Tips for speeding up duplicate merging?
1. Export your library as Zotero RDF without notes or file attachments.
2. Close Zotero.
3. Back up your current Zotero library: https://www.zotero.org/support/zotero_data#backing_up_your_zotero_library
4. Delete your existing Zotero library. Locate the library as per instructions in above link and delete the zotero.sqlite file along with the "storage" folder.
5. Modify the RDF translator so that it does not import tags. Find the RDF.js file inside your Zotero data directory, in the "translators" folder, and open it in a text editor. Find line 1156 (it should say "newItem.complete()"). Right _before_ this line, add "newItem.tags = [];"
6. Restart/Open Zotero. Delete the single item that should be in the library and empty trash.
7. Fill in your sync details in Zotero preferences, but _uncheck_ sync automatically.
8. Reset your library _to_ server under Preferences -> Sync -> Reset -> "Restore to Zotero Server" -> Reset...
9. Import your RDF file via Gear menu -> Import...
10. You can now enable automatic sync. It will help to sync periodically, instead of syncing a large number of changes at once.
11. Reset your translators via Preferences -> Advanced -> Files and Folders -> Reset Translators...
12. Restart Zotero.
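Step 3 can also be done with a short script. The following is a minimal Python sketch, not an official Zotero tool: it copies zotero.sqlite and the "storage" folder into a new timestamped backup folder. The function name and both paths are hypothetical, so adjust them to wherever your data directory actually lives, and run it only while Zotero is closed (step 2).

```python
import shutil
import time
from pathlib import Path


def backup_zotero_data(data_dir: Path, backup_root: Path) -> Path:
    """Copy zotero.sqlite and the storage folder into a new timestamped folder."""
    database = data_dir / "zotero.sqlite"
    if not database.is_file():
        raise FileNotFoundError(f"No zotero.sqlite found in {data_dir}")

    # A fresh folder per run, so an older backup is never overwritten.
    target = backup_root / f"zotero-backup-{time.strftime('%Y%m%d-%H%M%S')}"
    target.mkdir(parents=True, exist_ok=False)

    # copy2 keeps file timestamps along with the contents.
    shutil.copy2(database, target / "zotero.sqlite")

    # The storage folder holds attachments; it may be absent in some setups.
    storage = data_dir / "storage"
    if storage.is_dir():
        shutil.copytree(storage, target / "storage")

    return target


# Example call (both paths are placeholders, not real defaults):
# backup_zotero_data(Path("/path/to/zotero/data"), Path("/path/to/backups"))
```

Check that the backup folder contains both zotero.sqlite and "storage" before deleting anything in step 4.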
(Edited to speed up resetting to server)
Steps 7 and 8 are not technically necessary if you're not interested in syncing at this time. Even with the cleaned-up library, the sync may be too big to go through smoothly, so if the reset fails, just remove your username and password and keep working. Just remember that if you ever want to sync again in the future, the first thing you need to do is a Reset --> Restore to Server. Large syncs will work better in future Zotero versions.
However, I'm having trouble getting past step 1. When I try to export my library as Zotero RDF, I get a pop-up message that says "An error occurred while trying to export the selected file." It happens pretty quickly, and a file is created in the specified location, but it is empty. Any ideas?
Try exporting again. If it doesn't work, submit a Report ID: https://www.zotero.org/support/reporting_bugs
ReportID: 1989798613
How many collections do you have? Would it be feasible to export them one by one (top-level collections, that is)?
Yes, I've got 29 collections. I'll try exporting a single collection.
I'm not sure that I am even going to be able to export by collection as they are now. I was successful with one of the small collections from ERIC (also hosted by ProQuest), but just got the same error on one of the smaller PsycINFO collections (1076 parent items, 21180 total items). The Report ID for that error is 75680549; I think it is also an out-of-memory error.
Maybe I can split the larger collections up and export that way. Is the "out of memory" error connected to the size of the whole library or just to the collection you are exporting? If it is still being affected by the gigantic library, maybe I can delete collections as I export them (I do have several backups of this library)?
Thanks for your continued help with this.
Also, keep in mind that if you "Remove collection" instead of "Delete collection and Items...", the items will end up at the root of your library. So if you want to be deleting collections as you go, you may want to export the "unfiled items" pseudo-collection first.
Just to give a quick update: after exporting around 80 RDF files (I had to split up my collections into a few different files) and re-importing them as instructed above, the library is functioning so much better. Duplicates are taking about 5 seconds to merge now.
This might be something that you all should consider adding or making clearer for those of us unfortunate enough to be restricted to ProQuest for many of our main databases. The RIS files seemed like a good option for us because we could re-import them later if needed, and it seemed quicker than using the folder icon at the time. The library was gradually getting slower as I imported these files, but I thought that was just typical of larger libraries. Obviously, knowing what I know now, I should have inquired sooner to figure out what was going on. But it would be helpful to save others the pain of this in the future, especially those working with large libraries.
Thanks!
I had imported about a third of my original library (the first three years of my search) so that I could go ahead and work on the duplicates for those years. We are exporting to Filemaker for abstract screening, so my main goal was just to get the first couple years of my search into Filemaker first. That all worked fine.
Now that I am trying to import the rest of my RDF files, I am getting an error message on several of them, saying the file is not in a supported format. The first set I imported worked fine (as did others in this new batch), so I'm not sure what happened. Should I send one of these files, or is there something I should look for?
As for where to post information about this issue, I remember reading through the known translator issues for ProQuest and being concerned that if I tried to pull down a large number of files, it might stop working. This was part of the reason I chose to go with RIS. So I know that it's not a translator issue, but you might include a note there to caution people against using RIS with ProQuest as well.
It also might be helpful to have a general section about how to troubleshoot a slow library: things to check for, and how to know when something is wrong rather than just a product of a large library. This might be a good place for the information about tags and how they can slow things down as well.
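As one possible check along those lines, here is a rough Python sketch that counts items, tags, and item-tag links in a copy of zotero.sqlite. The table names `items`, `tags`, and `itemTags` are my assumption about the database schema and may differ between Zotero versions, and the function name is made up for illustration. Run it against a backup copy rather than the live database.

```python
import sqlite3
from pathlib import Path


def tag_summary(db_copy: Path) -> dict:
    """Return row counts for items, tags, and item-tag links."""
    # Open read-only so the script cannot modify the file.
    uri = db_copy.resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        counts = {}
        # Assumed table names; adjust if your schema differs.
        for label, table in (
            ("items", "items"),
            ("tags", "tags"),
            ("item_tag_links", "itemTags"),
        ):
            row = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
            counts[label] = row[0]
        return counts
    finally:
        conn.close()


# Example call (placeholder path to a backup copy):
# print(tag_summary(Path("/path/to/backups/zotero.sqlite")))
```

A very high number of item-tag links relative to items would be consistent with the tag-heavy imports discussed in this thread, though the counts alone do not prove that tags are the cause of the slowdown.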
If you don't care about the file being public, you can paste its contents (open it in a text editor to copy) to https://gist.github.com/ or you can put it in your Dropbox (or the like) and share the link here.
There is a syntax error in the abstract on l. 612 of the style, but removing that doesn't seem to help, either. I'm not quite sure what's going on, but I can reproduce the issue.
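For anyone who wants to look for syntax errors like this in their own exports, the sketch below parses each file with Python's standard XML parser and reports the line and column of the first parse error. It assumes the exported files are RDF/XML with an .rdf extension, and the function names are hypothetical. As noted above, fixing the syntax error did not resolve the import failure here, so a clean result does not guarantee that Zotero will accept the file.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def find_xml_error(path: Path):
    """Return (line, column, message) for the first parse error, or None."""
    try:
        ET.parse(path)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None


def check_folder(folder: Path) -> dict:
    """Map each .rdf file name in the folder to its first XML error, if any."""
    return {p.name: find_xml_error(p) for p in sorted(folder.glob("*.rdf"))}


# Example call (placeholder folder containing the exported files):
# for name, problem in check_folder(Path("/path/to/exports")).items():
#     print(name, "OK" if problem is None else problem)
```

This loads each file fully into memory, so it is better suited to the smaller split exports than to one very large file.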