False duplicates in "Duplicate Items"; these clearly are not duplicates...

I have these items, which are clearly not the same; however, Zotero's "Duplicate Items" view marks them as duplicates:
http://hdl.handle.net/1721.1/6337
http://dx.doi.org/10.1016/0004-3702(81)90024-2

The same happens with the following items:
http://hdl.handle.net/1721.1/6030
https://doi.org/10.1007/BF00056771

And also with these two items:
http://hdl.handle.net/1721.1/6419
http://dx.doi.org/10.1016/0734-189X(86)90114-3

What's happening? Does Zotero have something against the author? Haha
  • Only looked at the first one, but same authors, same title, and almost the same publication year seems like a reasonable heuristic for identifying duplicates, so I'm not surprised it's putting those into duplicates. An option to mark items as non-duplicate is planned.
  • edited June 22, 2023
    Almost ≠ same year of publication.

    In each case the first of the pair is a pre-print, report, or conference paper while the second item in each case is a journal article. Volume, issue and pagination field metadata are different for each pair. The second item of each pair has a journal name while the first has no journal name when brought into Zotero. @riegarda If you have the first item in each pair labeled in Zotero with the Journal Article type, you have contributed to the problem.

    In the third example, the first item has one author and the second item has two authors.

    These are clearly similar but not duplicates. In my work I will sometimes cite both the pre-print version and the journal version to compare and contrast them. Merging in this case would mask the reality that the items are distinct, although similar.

    I hope that the capacity to mark items as non-duplicates can be implemented soon. I was anticipating that this would possibly be a feature of Zotero 7. (Please make it so as the program moves out of beta.)

    TMI:
    I don't fully understand the Zotero database structure and I've intentionally avoided looking at it lest I be tempted to do something I'll regret.

    That said, in my own online MySQL database I have a field in each relevant table that holds the record numbers for items that are similar (potential duplicates) but which I've determined are not duplicates. The "not duplicate" determination has to be made by a human, but after that the records no longer appear in the potential-duplicates list. (The test-for-potential-duplicates utility looks at the record numbers in the not-duplicates field and ignores the records in that list.) However, if a new record is added that is similar to the old records, it is flagged as a potential match for both old records, because the new record number does not match the not-duplicates record numbers of either of the older records.
  • > Almost ≠ same year of publication.

    Right, but it's still a reasonable heuristic (!) because online-early and 'final' publications (which are true duplicates) are also frequently a year apart. But yeah, Zotero's duplicate detection isn't very feature-rich (other tools, e.g. EndNote and RefWorks, also let you customize the detection).
  • These are the reports for the first two items:

    Determining optical flow
    Item Type Journal Article
    Author Berthold K. P. Horn
    Author Brian G. Schunck
    Date 1981-08
    URL http://dx.doi.org/10.1016/0004-3702(81)90024-2
    Volume 17
    Pages 185-203
    Publication Artificial Intelligence
    DOI 10.1016/0004-3702(81)90024-2
    Issue 1-3
    ISSN 00043702
    Date Added 1/19/2015, 10:54:50 AM
    Modified 10/24/2022, 3:39:22 PM
    Attachments
    Horn and Schunck - 1981 - Determining optical flow.pdf

    Determining optical flow
    Item Type Report
    Author Berthold K. P. Horn
    Author Brian G. Schunck
    Date 1980
    URL http://hdl.handle.net/1721.1/6337
    Accessed 1/1/2013, 6:00:00 PM
    Extra Code: AIM-572
    Pages 28
    Institution Artificial Intelligence Lab, MIT
    Date Added 1/19/2015, 10:54:51 AM
    Modified 10/24/2022, 3:39:19 PM
    Attachments
    Horn and Schunck - 1980 - Determining optical flow.pdf
  • edited May 6, 2024
    Okay. Zotero's duplicate detection utility is by design quite sensitive. For many reasons that is a very good philosophy. You really want to identify all potential duplicates and make decisions about how they should be handled.

    Nearly identical records can be merged so that the result has the most complete and up-to-date metadata. This helps you avoid citing two different versions of the same publication.

    In your case (as well as with my own non-Zotero database) those would and should be flagged as potential duplicates. You may elect to keep only the record that is the (later) journal article. If the two records are a pre-print and an identical published article in the same journal, you will probably want only the published article. In my own case, I want both the journal version and the report version because I am likely to cite both when comparing and contrasting the two versions.

    That Zotero identifies potential duplicates well is a really good thing. What you and I want is a way to reciprocally mark records as not-a-duplicate-with other(s).
  • Being able to mark two items as not being duplicates of each other is a feature that I would find very useful as well. I was wondering whether there was any update on the implementation of this feature.
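A minimal Python sketch of the two ideas discussed in this thread: a deliberately sensitive match heuristic (same title, shared authors, publication years at most one apart) and a reciprocal "not a duplicate" list that suppresses pairs a human has already reviewed, similar to the MySQL setup described earlier. This is an illustration under stated assumptions, not Zotero's actual duplicate-detection code; the `Item` class, the record keys "A"/"B"/"C", the `max_year_gap` threshold, and the third record are all hypothetical.

```python
import re
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Item:
    key: str        # hypothetical record ID, not a real Zotero item key
    title: str
    authors: tuple  # author last names
    year: int


def normalize_title(title: str) -> str:
    # Lowercase, turn punctuation into spaces, collapse runs of whitespace.
    return " ".join(re.sub(r"[^a-z0-9]+", " ", title.lower()).split())


def looks_like_duplicate(a: Item, b: Item, max_year_gap: int = 1) -> bool:
    # Hypothetical heuristic: same normalized title, at least one shared
    # author, and publication years at most max_year_gap apart.
    return (
        normalize_title(a.title) == normalize_title(b.title)
        and bool(set(a.authors) & set(b.authors))
        and abs(a.year - b.year) <= max_year_gap
    )


class DuplicateFinder:
    def __init__(self):
        # Unordered pairs that a human has confirmed are NOT duplicates.
        self.not_duplicates = set()

    def mark_not_duplicate(self, key_a: str, key_b: str) -> None:
        # A frozenset makes the marking reciprocal: {a, b} == {b, a}.
        self.not_duplicates.add(frozenset((key_a, key_b)))

    def potential_duplicates(self, items):
        pairs = []
        for a, b in combinations(items, 2):
            if frozenset((a.key, b.key)) in self.not_duplicates:
                continue  # already judged distinct; don't flag again
            if looks_like_duplicate(a, b):
                pairs.append((a.key, b.key))
        return pairs


journal = Item("A", "Determining optical flow", ("Horn", "Schunck"), 1981)
report = Item("B", "Determining optical flow", ("Horn", "Schunck"), 1980)
finder = DuplicateFinder()
print(finder.potential_duplicates([journal, report]))  # [('A', 'B')]

finder.mark_not_duplicate("B", "A")
print(finder.potential_duplicates([journal, report]))  # []

# A new, similar (hypothetical) record is still flagged against both older
# ones, because only the A-B pair was marked as not-a-duplicate.
later = Item("C", "Determining Optical Flow.", ("Horn",), 1981)
print(finder.potential_duplicates([journal, report, later]))
# [('A', 'C'), ('B', 'C')]
```

Storing unordered pairs rather than a per-record list makes the marking reciprocal by construction, while still reproducing the behaviour described above: a newly added similar record is flagged against every older record it hasn't been reviewed against.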