@ckemere wrote: The problem I find is that Mendeley/PubMed often classify conference papers as journal articles for some reason, so I still have to do some manual clicking to normalize these entries.
I urge you to be careful. PubMed lists literally hundreds of journals whose titles are "Proceedings of ..." These are treated by the professional association as a serial, not as a conference proceedings book. I could give specific examples if requested.
Many of the PubMed proceedings are in serial publications that have an ISSN and not in proceedings books with an ISBN.
Often these proceedings appear in supplement issues of unaffiliated (or loosely affiliated) journals.
The auto-lookup result for IEEE conferences appears to be "conference paper" (PubMed lists them as "journal"). NIPS conferences, for some reason, end up as "book". In my case, I don't really think it matters, as long as the right DOI/PMID is associated and my library is consistent. In my field, I've never used the ISBN or ISSN to actually find a document or book.
Maybe there is a possibility to write a plugin for this? You could tell the plugin whether to prefer the older or the newer version as the master, which fields have to be 100% identical, and which fields should be taken from the master only; it would then merge all duplicate articles that are 100% identical in the fields you specified and skip all others for manual merging. Clicking more than 10 times on the merge button without looking at what you are merging makes zero sense. I may give it a go in November if there is demand for this plugin, or maybe someone comes up with something even better before then.
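To make that concrete, here is a rough sketch of the rule logic such a plugin could follow. This is plain Python to illustrate the decision, not actual Zotero plugin code; the field names, the ignore set, and the prefer-older switch are just assumptions.

```python
# Illustrative only: items are plain dicts of Zotero-style fields, and
# "dateAdded" decides which copy counts as older. Not Zotero plugin code.

MUST_MATCH = {"title", "DOI", "creators"}          # must be 100% identical
TAKE_FROM_MASTER = {"accessDate", "dateModified"}  # differences here are ignored
PREFER_OLDER = True                                # False keeps the newer copy as master

def can_auto_merge(a, b):
    """True if the two items differ only in fields we agreed to ignore."""
    if any(a.get(f) != b.get(f) for f in MUST_MATCH):
        return False
    differing = {f for f in set(a) | set(b) if a.get(f) != b.get(f)}
    return differing <= TAKE_FROM_MASTER

def merge(a, b):
    """Pick the master by age and let its values win on every conflict."""
    master, other = sorted((a, b), key=lambda i: i["dateAdded"],
                           reverse=not PREFER_OLDER)
    merged = dict(other)
    merged.update(master)
    return merged

def auto_merge_all(pairs):
    """Merge the unambiguous pairs; hand the rest back for manual review."""
    merged, manual = [], []
    for a, b in pairs:
        if can_auto_merge(a, b):
            merged.append(merge(a, b))
        else:
            manual.append((a, b))
    return merged, manual
```

Putting accessDate in the ignore set is exactly what would cover the "Accessed"-only differences mentioned in the next comment.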
I must say the current duplicate-items tool that auto-selects each bundle is very nicely done. But it would be really great to be able to automatically merge items whose only difference is the "Accessed" field. I'm not a programmer, but this sort of sounds like something that could be done with JavaScript...
I also have thousands of duplicates. Automatic merging is a MUST to implement. I get that there might be some mistakes made, but to be honest, when you have to merge 1000s of papers anyway, you don't review them one by one, you just click click click....
I have hundreds of duplicates from syncs years apart on Zotero; this is more than "15 minutes' work" (I just did about 15 minutes before coming here), and about 90% of them are identical except for a timestamp: one from 2004 (the original EndNote import) and one from 2014.
I do not care about the "edge cases"; I care about my time, and approaching 60 there is less of it. Surely computers are for automating the menial?
Other dealbreakers in this space are Nextcloud's lack of (and refusal to add) a one-way sync (no doubt a baked-in design "feature" from not expecting people to have decades of work on a PC; surely everyone is on the cloud anyway, which is sort of the opposite of the problem here), and Microsoft SharePoint's refusal to allow web-1.0-style site structures without paying $10K a year to some app: totally flat horizontal sharing, no curating allowed, too bad if you want to let people browse and discover material they don't know the name of to search for. And do not get me started on Adobe.
Zotero currently uses the title, DOI, and ISBN fields to determine duplicates. If these fields match (or are absent), Zotero also compares the years of publication (if they are within a year of each other) and author/creator lists (if at least one author last name plus first initial matches) to determine duplicates. The algorithm will be improved in the future to incorporate other fields.
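Read literally, that heuristic looks roughly like the sketch below. It is based only on the description above, not on Zotero's actual source, and the field handling is simplified.

```python
# Sketch of the detection heuristic described above: title/DOI/ISBN must not
# conflict, publication years must be within one year, and at least one
# author (last name + first initial) must be shared. Illustrative only.

def _compatible(a, b, field):
    """A field only rules a pair out when both items have it and the values differ."""
    va, vb = a.get(field), b.get(field)
    return not (va and vb and va.strip().lower() != vb.strip().lower())

def _years_close(a, b):
    ya, yb = a.get("year"), b.get("year")
    return ya is None or yb is None or abs(ya - yb) <= 1

def _share_author(a, b):
    def keys(item):
        return {(c["lastName"].lower(), c["firstName"][:1].lower())
                for c in item.get("creators", [])
                if c.get("lastName") and c.get("firstName")}
    ka, kb = keys(a), keys(b)
    return not ka or not kb or bool(ka & kb)

def looks_like_duplicate(a, b):
    return (all(_compatible(a, b, f) for f in ("title", "DOI", "ISBN"))
            and _years_close(a, b)
            and _share_author(a, b))
```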
So the removal of identical files that you mention is relevant only if the metadata of the items is good enough for them to be identified as duplicates (or if you have another way to identify them as duplicates). At the moment, I do not know of any tool that checks for duplicate attachments by checksum.
I think Zotero keeps the file attached to the master item selected during merging.
Thank you, the reason I was asking about the merge is because I noticed in the past that some items had identical attachments after merging. That was a while ago, great that this has been improved.
Okay, I've dealt with them. Some other suggestions: the Duplicates folder could have a subcollection for acknowledged false-positive candidates, and another for recent possible duplicates; the historically dealt-with ones were a bit awkward.
Also, some references to last-century newspapers on trove.nla.gov.au are all titled 'Advertising' (I had about 100) and came up as duplicate candidates, so an option to ignore candidates where some field contains e.g. 'trove.nla.gov.au' might be useful.
Alas, for many journals, when there are letters commenting on an article, the letters can have the same DOI as the article. Another problem is that the same identical article can have different DOIs.
I would rather lose some records than merge everything manually. I'd appreciate having this option.
https://github.com/frangoud/ZoteroDuplicatesMerger
The instructions refer to an .xpi file, but I can't find it listed and don't know what to download to install it. Is GitHub always this obtuse?
Is there any tool that checks for duplicate attachments by checksum?
And also: https://www.zotero.org/support/duplicate_detection#finding_duplicates
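As noted earlier in the thread, there seems to be nothing built in, but a small script can at least report attachments that are byte-for-byte identical. A minimal sketch, assuming the default data directory layout (the storage path is just an example; adjust it for your setup):

```python
# Group files under the Zotero storage folder by SHA-256 checksum and
# print any group containing more than one file. The path is an example.
import hashlib
from collections import defaultdict
from pathlib import Path

STORAGE = Path.home() / "Zotero" / "storage"    # adjust to your data directory

def sha256_of(path, chunk_size=1 << 20):
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

by_hash = defaultdict(list)
for f in STORAGE.rglob("*"):
    if f.is_file():
        by_hash[sha256_of(f)].append(f)

for digest, files in by_hash.items():
    if len(files) > 1:
        print(digest)
        for f in files:
            print("   ", f)
```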
Any time an article with the same DOI/ISSN is added, could it be auto-merged into the existing record?
(Could there also be an option to automatically update fields based on an external database like CrossRef?)
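On the lookup side this is at least technically straightforward: CrossRef's public REST API returns the current metadata for a DOI as JSON. A rough sketch follows (no error handling, the mapping back onto Zotero fields is left out, and the DOI and contact address are placeholders):

```python
# Fetch current metadata for a DOI from the CrossRef REST API.
import json
import urllib.request

def crossref_metadata(doi):
    req = urllib.request.Request(
        f"https://api.crossref.org/works/{doi}",
        headers={"User-Agent": "dedupe-sketch/0.1 (mailto:you@example.org)"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]

meta = crossref_metadata("10.1000/xyz123")      # placeholder DOI, substitute a real one
print(meta.get("title"), meta.get("container-title"), meta.get("issued"))
```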