Identifying duplicates incorrectly

Hi, I'm working on a project, and Zotero is flagging "duplicate items" that are clearly not duplicates. For example, I have a couple of items for which it detects 53 potential duplicates (but the citations are very different). I didn't have this problem in the past. Is it a glitch?
  • Can you take a few screenshots that show what you're seeing, including some of the item details, upload them somewhere (e.g., Dropbox), and provide a link here?
  • https://uofh-my.sharepoint.com/:i:/g/personal/bdasias_cougarnet_uh_edu/EVNBQFCm-PdFt-w-2wI2VpUBQUV5Pwdni0rTOsjqilGpmQ?e=t17dar

    For example, when I click on "a pooled analysis of efficacy and safety of ertugliflozin as add on therapy to metformin" (see screenshot), to the right it says there are 53 items we could merge, even though there are really only 2 with the same title.
  • Do those 53 items all have the same DOI?
  • They do! I didn't notice that.
  • That’s the cause. DOI is the first step in the duplicate detection.
  • And what is the second? Is there any way we can temporarily avoid duplicate detection by DOI and use other parameters?
  • Zotero first matches on unique identifiers -- items with the same DOI are marked as duplicates, as are books with the same ISBN. After that, it uses fuzzy matching on author, title, and date, I believe.

    No way to change this without modifying the source code and recompiling, sorry.
  • edited December 12, 2020
    Hi, everybody.
    Hi developers.

    I have a similar problem with duplicate items.
    I have a huge number of items flagged as duplicates in the folder. However, only 3-5 of them can be considered duplicates. All the others differ from each other.
    Many of them appear only once, and there is no other record that can be selected.
    Others are marked as duplicates but are definitely different. You can see a screenshot of the last example: these two records share only the paper type. All other information is completely different.
    http://www.triacon.org/mycloud/dups.jpg
    I deliberately put 2 screenshots next to each other, with a different version selected as master in each.
    You can also look at the list of duplicate items sorted by Title. There are only a couple of similar (not identical) titles, while all the others differ from each other.
    I think the best and simplest solution at the moment would be to add a way to unmark items in the duplicates list.

    *************
    Updated: 2 hours later

    Hi, dear developers!
    @adamsmith @dstillman

    It seems I have found the problem.
    The selected items have an IDENTICAL itemID!

    http://www.triacon.org/mycloud/itemid3604.jpg

    As you can see, the value 3604 is assigned to 4 items in the database.
    The last 2 are similar to each other, but the first 2 are different.
    In the duplicates list, only 2 of them are shown: the first one in the list in the picture and the last one.
    I tried to search for the second record (ISO 13.020) in Zotero, and you can see it was found, but it refers to the last record. I think this is because the ISO 13.020 record was deleted.

    So now it is your turn.

    Regards
    Andrey
  • As I said above:
    Zotero first matches on unique identifiers -- items with the same DOI are marked as duplicates, as are books with the same ISBN. After that, it uses fuzzy matching on author, title, and date, I believe.

    No way to change this without modifying the source code and recompiling, sorry.
  • My "Duplicate Items" view is flagging a book and a journal article as duplicates because they have the same title and one overlapping author (not identical author lists, though!). The article has a DOI but no ISBN, and the book has an ISBN but no DOI.

    This seems like an easy improvement: certain item types, like books and journal articles, cannot be duplicates of each other.
  • I'm having the same problem. Items are marked as duplicates based on ISSN or DOI but are not actually duplicates.

    Can you elaborate on what you mean by "no way to change this without modifying the source code and recompiling"? How would this work?
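
For illustration only, the two-stage matching described in the replies above (shared identifiers first, then a fuzzy comparison of author, title, and date) can be sketched in Python. This is not Zotero's source code: the function names, the dictionary layout of the items, and the exact fuzzy rule (same normalized title, at least one shared author surname, same year) are all assumptions made for the example.

```python
import re
from collections import defaultdict
from itertools import combinations


def normalize_title(title):
    """Lowercase a title and collapse punctuation and whitespace (assumed rule)."""
    cleaned = re.sub(r"[^a-z0-9]+", " ", title.lower())
    return " ".join(cleaned.split())


def find_duplicate_pairs(items):
    """Return a set of (id_a, id_b) pairs flagged as potential duplicates.

    Hypothetical sketch of the behavior described in the thread,
    not Zotero's actual implementation.
    """
    pairs = set()

    # Stage 1: a shared identifier flags a pair, whatever the other fields say.
    for field in ("doi", "isbn"):
        by_identifier = defaultdict(list)
        for item in items:
            value = (item.get(field) or "").strip().lower()
            if value:
                by_identifier[value].append(item["id"])
        for ids in by_identifier.values():
            pairs.update(combinations(sorted(ids), 2))

    # Stage 2 (assumed rule): same normalized title, a shared author, same year.
    by_title = defaultdict(list)
    for item in items:
        by_title[normalize_title(item.get("title") or "")].append(item)
    for title, group in by_title.items():
        if not title:
            continue
        for a, b in combinations(group, 2):
            shared_author = set(a.get("authors", [])) & set(b.get("authors", []))
            if shared_author and a.get("year") == b.get("year"):
                pairs.add(tuple(sorted((a["id"], b["id"]))))

    return pairs


# Made-up sample data: items 1 and 2 share a DOI but nothing else.
items = [
    {"id": 1, "title": "Study A", "authors": ["Smith"], "year": 2019, "doi": "10.1000/x"},
    {"id": 2, "title": "A completely different paper", "authors": ["Jones"], "year": 2020, "doi": "10.1000/x"},
    {"id": 3, "title": "Study A!", "authors": ["Smith", "Lee"], "year": 2019},
]
print(sorted(find_duplicate_pairs(items)))  # [(1, 2), (1, 3)]
```

In this sketch, items 1 and 2 are paired solely because they share a DOI, which mirrors the 53-item case at the top of the thread. Under these assumptions, skipping DOI matching would mean changing the first stage, which is the kind of change the reply above says would require editing the source.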