Short-term solution for 'merge-all' in duplicates?

I have a large set of duplicate records (over 15'000) in a few collections within a very large Zotero database. I understand some sort of new 'automated' functionality is in the works that should be able to de-duplicate all of those at once (e.g. by automatically merging identical items). But I want to run papermachines on some of these collections, and the duplicates will of course bias its output towards those duplicated articles. So, does anybody know of a quicker way to solve this problem? Are there other bibliographic management programs I could export those collections to, deduplicate them there, and then re-import them into Zotero to run papermachines? Any help will be much appreciated.
  • In my experiments with inputting large numbers of items using bibtex markup and a spreadsheet (a workflow I hope to document soon in the Zotero documentation wiki), it has occurred to me that a quick way to mass-correct certain problems in Zotero entries would be to:
    1. select them (and probably move them into a temporary collection so they are easy to find in step 4 below).
    2. export them in bibtex format.
    3. use a high-quality text editor such as jEdit to search and replace, making the correction.
    4. delete the original items from Zotero.
    5. import the corrected bibtex file back into Zotero to restore the items, with corrections.
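    The text-editing part of steps 2-5 can also be scripted. Below is a rough, hypothetical sketch in Python (not a Zotero feature) that deduplicates an exported bibtex file by keeping only the first entry per normalized title. The simple regex parsing is an assumption that holds for typical Zotero bibtex exports but not for all bibtex; a real parser such as the bibtexparser library would be safer on messy files, and you should still keep a backup and spot-check the result before re-importing.

    ```python
    import re
    from collections import OrderedDict

    def dedupe_bibtex(text):
        """Keep the first BibTeX entry for each normalized title.

        Hypothetical helper: assumes each entry starts with '@' at the
        beginning of a line and has a one-per-line 'title = {...}' field,
        as in typical Zotero exports.
        """
        # Split the file into entries at lines beginning with '@'.
        entries = re.split(r'\n(?=@)', text.strip())
        seen = OrderedDict()
        for entry in entries:
            m = re.search(r'title\s*=\s*[{"](.+?)[}"]\s*,?\s*\n',
                          entry, re.IGNORECASE | re.DOTALL)
            if m:
                # Normalize: drop braces, collapse whitespace, lowercase,
                # so {{Duplicate} Paper} and {Duplicate Paper} compare equal.
                key = re.sub(r'[\s{}]+', ' ', m.group(1)).strip().lower()
            else:
                key = entry  # no title found: treat the entry as unique
            seen.setdefault(key, entry)  # first occurrence wins
        return '\n\n'.join(seen.values())

    sample = """@article{a1,
      title = {Duplicate Paper},
      author = {Doe, John},
    }
    @article{a2,
      title = {{Duplicate} Paper},
      author = {Doe, John},
    }
    @article{b1,
      title = {Unique Paper},
      author = {Roe, Jane},
    }"""

    print(dedupe_bibtex(sample).count('@article'))  # → 2 (a2 was dropped)
    ```

    In a real run you would read the exported .bib file, write the deduplicated text to a new file, and import that file in step 5. Deduplicating on title alone is deliberately crude; matching on author and year as well would reduce false merges.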

    This should work. However, I don't know what would happen with huge files, so make a backup, then run a pile of tests! Also, Zotero doesn't export all fields, and some fields it does export have no strict equivalents in bibtex - e.g., "composer" and "audio file" become "author" and "book" respectively. For me that's a serious problem.
  • Also, this method will wreak havoc on syncing and existing word documents, so I wouldn't recommend it unless you're at the very beginning of using Zotero.
    None of the flat data formats are, unfortunately, lossless. You can export to Zotero RDF, which is practically lossless, but it is XML and easier to break by editing it manually.