Short-term solution for 'merge-all' in duplicates?
I have a large set of duplicate records (over 15,000) in a few collections within a very large Zotero database. I understand that some sort of 'automated' functionality is in the works that should be able to de-duplicate all of these at once (e.g. by automatically merging identical items). But I want to run papermachines on some of these collections, and the duplicates will of course bias its output towards the duplicated articles. Does anybody know of a quicker way to solve this problem? Are there other bibliographic management programs I could export these collections to, deduplicate them there, and then re-import them into Zotero to run papermachines? Any help will be much appreciated.
1. Select them (and probably move them into a temporary collection so they are easy to find in step 4 below).
2. Export them in BibTeX format.
3. Use a high-quality text editor such as jEdit to search and replace, removing the duplicate entries (see the script sketch after this list).
4. Delete the original items from Zotero.
5. Import the corrected BibTeX file into Zotero to restore the items, with the corrections applied.
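If hand-editing 15,000 entries in step 3 sounds painful, something like the following quick-and-dirty Python script could do the deduplication automatically. This is only a sketch, not a robust BibTeX parser: it assumes one entry per @ block with single-line title fields, it treats two entries with the same normalized title as duplicates, and the file names are made up. Run it on a copy of the export first.

```python
#!/usr/bin/env python3
"""Rough BibTeX dedup sketch: keeps the first entry per normalized title."""
import re
import sys

def split_entries(text):
    """Split a .bib file into chunks on lines beginning with '@'."""
    entries, current = [], []
    for line in text.splitlines(keepends=True):
        if line.lstrip().startswith("@") and current:
            entries.append("".join(current))
            current = []
        current.append(line)
    if current:
        entries.append("".join(current))
    return entries

def title_key(entry):
    """Normalized title used as the duplicate key (lowercase, alphanumerics only)."""
    m = re.search(r'^\s*title\s*=\s*[{"](.+?)[}"]\s*,?\s*$', entry,
                  re.IGNORECASE | re.MULTILINE)
    if not m:
        return None  # entries without a parsable title are always kept
    return re.sub(r"[^a-z0-9]", "", m.group(1).lower())

def dedupe(text):
    """Drop every entry whose title key has already been seen."""
    seen, kept = set(), []
    for entry in split_entries(text):
        key = title_key(entry)
        if key is None or key not in seen:
            kept.append(entry)
            if key is not None:
                seen.add(key)
    return "".join(kept)

if __name__ == "__main__":
    with open(sys.argv[1], encoding="utf-8") as f:
        result = dedupe(f.read())
    with open(sys.argv[2], "w", encoding="utf-8") as f:
        f.write(result)
```

Run it as e.g. `python dedupe_bib.py exported.bib deduped.bib` and then import deduped.bib in step 5. Matching on the title alone is deliberately crude; if your duplicates have slightly different titles, or identical titles belong to genuinely different items, you'd want a stricter key (title plus year, or DOI), or a tool with a proper duplicate finder such as JabRef.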
This should work. However, I don't know what would happen with huge files, so make a backup and then run a pile of tests! Also, Zotero doesn't export all fields, and some that it does export have no strict equivalents in BibTeX - e.g., "composer" and "audio file" become "author" and "book" respectively. For me that's a serious problem.
None of the flat data formats are, unfortunately, lossless. You can export to Zotero RDF, which is practically lossless, but it's also XML, and easier to break by editing it manually.