Duplicate Criteria

I could not find any info on how Zotero decides which items are duplicates. What are the criteria: exact same title and exact same author? Does the year have to be the same, or the ISBN, or are there other combinations that might trigger a match? Can the criteria be modified?
  • Thanks for the link, but I'm not good at reading scripts/code and couldn't figure out the criteria from it... what I could see was title plus one creator?
  • The criteria shown there are:
    - Titles must match, then:
    - If the DOI or ISBN differ, it’s not a match
    - Else, if there is one common creator and the year is within 1 year, it’s a match
  • ISBN and DOI also work the other way: items with the same DOI or ISBN are always duplicates.

    FWIW, when trying to understand code you'd always want to look for comments (starting with // in JavaScript), which describe what's going on. The actual code is much harder to understand.
  • Thanks so much, this was very helpful.
  • So if articles have the same DOI but wildly different titles and authors (e.g. for a conference abstract issue), they show up as duplicates? At least that's what I'm seeing. Items with wildly different titles (not all from conference issues) are showing up as duplicates.
  • That's correct, yes. DOIs are unique identifiers. Different articles should not have the same DOI. If they do, something is broken at that level.
  • Most of them are conference presentations, where the organisers have assigned a DOI to the compiled collection but not to individual presentations.
  • If the individual article doesn't have its own DOI, you should remove the DOI in Zotero (citing it would be incorrect and would potentially cause a mess with citation-based metrics).
  • There are hundreds of them. I'm afraid I don't have time to do that.
  • Up to you -- but the problem is still the metadata, not Zotero's behavior. There are several things that duplicate detection could do better, and letting users mark items as not duplicates is part of that, but the decision to treat items with the same DOI as duplicates by default makes a lot of sense.
  • I personally disagree with adamsmith that citing the DOI for the individual conference presentation items would be incorrect. That is the persistent identifier for locating the item. It's frankly stupid for publishers to bundle conference abstracts into a list like this (I also know some journals that do this with all of the commentaries to a target article). It will obviously potentially impact citation metrics, but that is the fault of the publisher, not of correct citation practices.
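
For anyone who finds code easier to follow than prose, here is a minimal Python sketch of the matching rules as they are described in this thread. It is not Zotero's actual code (which is JavaScript), and the names `Item`, `_norm` and `is_duplicate` are made up for illustration. Two things are assumptions rather than something stated above: values are compared case-insensitively with whitespace collapsed, and a missing year on either item counts as no match.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    """Hypothetical stand-in for a library item; not a Zotero data structure."""
    title: str
    creators: frozenset = frozenset()  # creator last names
    year: Optional[int] = None
    doi: Optional[str] = None
    isbn: Optional[str] = None


def _norm(value):
    # Assumption: compare case-insensitively and ignore extra whitespace.
    return " ".join(value.lower().split()) if value else None


def is_duplicate(a, b):
    # Rule from the thread: the same DOI or the same ISBN is always a match,
    # even when titles and authors differ.
    for name in ("doi", "isbn"):
        x, y = _norm(getattr(a, name)), _norm(getattr(b, name))
        if x and y and x == y:
            return True

    # Otherwise the titles must match.
    title_a = _norm(a.title)
    if not title_a or title_a != _norm(b.title):
        return False

    # Titles match, but a differing DOI or ISBN rules the pair out.
    for name in ("doi", "isbn"):
        x, y = _norm(getattr(a, name)), _norm(getattr(b, name))
        if x and y and x != y:
            return False

    # Finally: at least one common creator and years within one year.
    shared = {_norm(c) for c in a.creators} & {_norm(c) for c in b.creators}
    if not shared:
        return False
    if a.year is None or b.year is None:
        return False  # assumption: a missing year is treated as no match
    return abs(a.year - b.year) <= 1
```

The first loop is what produces the behaviour discussed above: conference abstracts that all carry the DOI of the compiled collection match each other before titles or authors are ever compared.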