DOI and duplicate
Hya
It is always nice to use such excellent software.
The documentation reads "Zotero currently uses the title, DOI, and ISBN fields to determine duplicates." under duplicate detection.
The rule for DOI reads, "Each DOI® name is a unique "number", assigned to identify only one entity. " http://www.doi.org/doi_handbook/2_Numbering.html#2.1
But some publishers DO NOT FOLLOW the rule. Many entries share the same DOI, e.g. 10.1056/NEJMc1804294. I have also seen all the abstracts of a meeting share one DOI.
I guess some sort of tweak may be necessary for duplicate detection.
cheers
Where are you seeing many entries sharing that DOI?
In a technical sense, neither of these is a case of “many entries sharing the same DOI”. They are single publications with several component parts. This is enough of an edge case that I don’t think it is necessary to change anything about the duplicate detection algorithm other than providing an option to manually mark items as non-duplicates.
1. One-to-Many: Yes, sometimes multiple items are included under the same general DOI (as bwiernik explained).
2. Many-to-One: Different websites give the same article different DOIs that point to their servers. This applies to journals available through multiple databases (and possibly directly at the publisher).
3. New DOIs: I've often found newly published articles, especially ahead-of-print, have DOIs that don't work. This is usually resolved quickly (weeks or months?) but given the (justifiable) bias toward citing the newest, state-of-the-art research in many fields, these papers will be cited more often than most. (These probably are unique most of the time, but I wouldn't be surprised if some are duplicates or errors.)
4. Duplicates (or just errors): some publishers (especially less prestigious/sophisticated ones) just mistakenly assign the same DOI multiple times and might (or might not) fix it later.
Ideally, DOIs should be unique identifiers, but that's not always the case in practice.
(Ideally none of these, with the possible exception of 2, should occur, obviously, but that's not our concern here.)
As @bwiernik said, having an option to manually remove items from the duplicates list would be fine. As a heuristic, DOIs are fine for identifying possible duplicates, but it's frustrating that there's no way to clear out the duplicates list after checking manually. I know this feature is complicated and in development, though.
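To make the "DOI as heuristic plus manual override" idea concrete, here is a minimal Python sketch. It is not Zotero's actual algorithm: the data layout (a dict of item id to DOI), the function names, and the item ids are all made up for illustration. Items sharing a DOI are flagged as candidate pairs, and any pair the user has already checked and cleared is left out.

```python
from collections import defaultdict
from itertools import combinations

# Common resolver URL forms that may be stored in front of the DOI itself.
RESOLVERS = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
)


def normalize_doi(doi):
    """Lower-case a DOI and strip a resolver URL (DOIs are case-insensitive)."""
    doi = doi.strip().lower()
    for resolver in RESOLVERS:
        if doi.startswith(resolver):
            return doi[len(resolver):]
    return doi


def candidate_pairs(items, not_duplicates):
    """Return pairs of item ids that share a DOI, minus pairs the user cleared.

    items: dict mapping item id -> DOI string (hypothetical layout)
    not_duplicates: set of frozenset({id_a, id_b}) marked as distinct by hand
    """
    by_doi = defaultdict(list)
    for item_id, doi in items.items():
        if doi:
            by_doi[normalize_doi(doi)].append(item_id)

    pairs = []
    for ids in by_doi.values():
        for a, b in combinations(sorted(ids), 2):
            if frozenset((a, b)) not in not_duplicates:
                pairs.append((a, b))
    return sorted(pairs)


# Three made-up entries that all carry the shared DOI mentioned in this thread.
library = {
    "letter-1": "10.1056/NEJMc1804294",
    "letter-2": "https://doi.org/10.1056/nejmc1804294",
    "reply": "10.1056/NEJMc1804294",
}
# The user has checked "letter-1" against "reply" and decided they differ.
cleared = {frozenset(("letter-1", "reply"))}
print(candidate_pairs(library, cleared))
```

Storing the cleared pairs rather than cleared items means a new entry with the same DOI would still be flagged against the old ones, which seems like the safer default.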
A year ago, the Transportation Research Record (TRR) journal of the Transportation Research Board (TRB) moved to Sage publications.
http://www.trb.org/Research/Blurbs/177011.aspx
And guess what, Sage has assigned different DOIs to all historical articles. I happened to add a few pre-2017 articles yesterday, and they seemed familiar, so I wondered whether I already had them in my database. But they did not show up in my duplicates list because they had different DOIs.
Can the user have more control over how duplicates are detected? Or, AT LEAST, be able to mark a group of articles as duplicates (which is different from marking them as non-duplicates, and maybe easier)? Or allow a "maybe duplicates" collection based on somewhat less stringent criteria, such as identical titles?
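As a rough illustration of what a "maybe duplicates" list based on titles could look like, here is a short Python sketch. It is only an assumption about one possible approach, not how Zotero works: the function names and the sample titles are invented, and "identical" is loosened slightly to ignore case, accents, and punctuation.

```python
import re
import unicodedata
from collections import defaultdict


def title_key(title):
    """Reduce a title to lower-case ASCII words so trivial variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", title)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    words = re.findall(r"[a-z0-9]+", no_accents.lower())
    return " ".join(words)


def maybe_duplicates(items):
    """Group item ids whose titles reduce to the same key.

    items: dict mapping item id -> title (hypothetical layout)
    Returns only groups with more than one member.
    """
    groups = defaultdict(list)
    for item_id, title in items.items():
        key = title_key(title)
        if key:  # skip items with empty or punctuation-only titles
            groups[key].append(item_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


# Made-up titles: the first two differ only in case and punctuation.
library = {
    "old-doi": "Travel Demand Modeling: A Review",
    "new-doi": "Travel demand modeling - a review",
    "other": "Another Paper Entirely",
}
print(maybe_duplicates(library))
```

A list like this would catch the same article under two DOIs, at the cost of more false positives (e.g. "Editorial" or "Introduction"), so it would need to be something the user reviews by hand.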
Ideally, I wish DOIs were strictly unique. I will write to TRR and Sage but doubt that's going to help.
Regarding TRR and Sage, my guess is that Sage is now using their own prefix. One problematic aspect of DOIs is that only the first part of the DOI is centrally assigned, and that just then points to a collection of DOIs defined by the second part at the host. That's why most DOIs follow a mostly numerical format but some publishers/journals decide to have mostly alphabetic DOIs, again just for that second part. In the end, it's probably better if Sage uses their own DOIs because they're now managing the material, but of course this just reveals the inherent weakness in DOIs: It's still up to publishers/distributors to manage their content and keep it online. If Sage or another publisher shuts down and disappears from the internet, the DOI is useless. Or if TRR changes distributors again, they might reset all the DOIs again as well. Unlikely, and still helpful in most cases for locating articles, but yet another example of why DOIs are better in theory than practice. Certainly we would never want to replace full citations with DOIs only!
Just in case this is helpful to anyone, here's an example of the same article with different DOIs:
https://doi.org/10.3141/1987-09
https://doi.org/10.1177/0361198106198700109
Interestingly, when I click the DOI of another article from 2013, it is redirecting to the SAGE page that has the same DOI.
http://dx.doi.org/10.3141/2340-02
So, SAGE is not assigning new DOIs to all articles but redirecting some of them? Confusing.
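The prefix guess above can be checked by splitting each DOI at its first slash. A small Python sketch follows; `split_doi` is a helper name made up here, and the only DOIs used are the three posted in this thread.

```python
# Resolver URL forms seen in this thread; stripped before splitting.
RESOLVERS = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
)


def split_doi(doi):
    """Split a DOI (optionally given as a resolver URL) into (prefix, suffix)."""
    doi = doi.strip()
    for resolver in RESOLVERS:
        if doi.lower().startswith(resolver):
            doi = doi[len(resolver):]
            break
    # The prefix is everything before the first slash; the suffix may contain more slashes.
    prefix, sep, suffix = doi.partition("/")
    if not sep or not suffix or not prefix.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


for url in (
    "https://doi.org/10.3141/1987-09",
    "https://doi.org/10.1177/0361198106198700109",
    "http://dx.doi.org/10.3141/2340-02",
):
    print(split_doi(url))
```

The two DOIs for the same article come out with different prefixes (10.3141 and 10.1177), while the 2013 article still carries 10.3141, which fits the impression that some articles received new DOIs and others are simply redirected.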
https://doi.org/10.1172/JCI18937
https://doi.org/10.1172/JCI200318937
These two DOIs point to the same article: two DOIs within a single publisher.
Now, I've learned duplicate-detection is not an easy task.
cheers