Duplicate detection?

Zotero adds duplicates of existing items without complaint. This may make sense for some item types (e.g., web page snapshots), but not for others (books, journal articles).
  • We're definitely going to have duplicate detection in a (near) future release. There are, however, some complexities to doing this right--e.g., if there are minor variations in the metadata for two books that are really one book, which one is canonical? Anyway, we'll work this out and have some options for handling duplicates well (or deleting them).
  • I was just getting ready to post on this topic, but thought I'd better read through the forum first. And sure enough, here it is.

    I hope this is coming in the VERY near future. I understand there are complexities involved, and in some cases one might WANT an apparent duplicate (to track web page changes, for example).

    I do a lot of searching using different search terms, and the same item often appears multiple times, but my memory isn't able to retain everything that I've already entered. (If it could, I wouldn't need Zotero! :))

    I'm hoping for at least a "DUP ALERT!- continue?" notice.

    Soon? Please?
  • Hi,
    a suggestion for the duplicate detection in the making:
    in addition to the standard duplicate detection features, it would be great to have the duplicate detection integrated with the reference detection on the web in such a way that the folder/icon in the address bar indicates which detected references are possible duplicates. A feature to view the corresponding items in the database without importing the new ones would then be very helpful (e.g., an extra little icon beside each detected reference, or whatever...). Even nicer (although I'm not sure whether it is technically doable) would be to have Zotero mark suspected duplicates within the browser window itself, e.g., mark PubMed entries corresponding to references already in the DB with a different background color.
  • One problem is that some sites modify the PDF file you download so that it mentions when it was downloaded. This is rather annoying, as two copies of the same PDF file will have different checksums.
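To illustrate the checksum problem described above, here is a minimal Python sketch. The byte strings and the "Downloaded on" stamp are made-up stand-ins for real PDF content, not anything a particular site is known to produce.

```python
import hashlib

def checksum(data: bytes) -> str:
    """Return the SHA-256 hex digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical article body shared by both downloads.
article_body = b"%PDF-1.4 ... identical article content ..."

# Hypothetical download stamps a site might append to each copy.
copy_monday = article_body + b"\nDownloaded on 2008-03-10"
copy_tuesday = article_body + b"\nDownloaded on 2008-03-11"

# The same bytes always hash to the same value...
same_file_matches = checksum(copy_monday) == checksum(copy_monday)

# ...but a one-character difference in the stamp changes the whole digest,
# so a checksum comparison alone would not flag these copies as duplicates.
stamped_copies_match = checksum(copy_monday) == checksum(copy_tuesday)
```

This is why a checksum of the attachment is probably not enough on its own, and why matching on metadata would likely be needed as well.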
  • I'd like to second yeti's suggestion for visual notification of duplicates in the address bar.
  • I agree that duplicate handling is important. Thanks for the great program.
  • Any news on this essential feature?
  • Greetings,
    I would like to know if anyone is working on this feature, because I am willing to put in some time to do it.

    Looking forward to more information.
  • I'd also like to add a request for this feature.
  • Yes, the ability to recognize duplicates and merge entries is crucial!
  • Even a simple duplicate entry detection would be valuable. Suppose one goes to Google Scholar and fetches interesting items, then goes back a month later to fetch more. Unless one is careful to examine each new entry by hand, exact duplicates will enter the database. I haven't done this yet, but I expect to be asked to update my bibliography of 700 items, and this looms as a distressing prospect.
  • Yes, this is a crucial feature for maintaining an article database. I am primarily collecting PubMed entries and it would be great to be able to do a duplicate search based on user-defined criteria (such as PMID) that would uniquely identify each article. The suspected duplicates would be listed based on these criteria and the user could keep/discard entries on the short list. It would also be great to be alerted when importing a suspected duplicate based on the same user-defined criteria.
    This would also avoid the problem of collecting duplicate entries due to corrections/changes to the abstract or checksum issues with downloaded PDFs.
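The identifier-based search suggested above could be sketched roughly as follows. This is only an illustration in Python, not how Zotero works: the item dictionaries, the `find_duplicates` helper, and the `PMID` field name are all assumptions made for the example.

```python
from collections import defaultdict

def find_duplicates(items, key="PMID"):
    """Group items by a user-chosen identifier field and return only
    the groups that contain more than one item."""
    groups = defaultdict(list)
    for item in items:
        value = item.get(key)
        if value:  # items without the identifier cannot be matched this way
            groups[str(value).strip()].append(item)
    return {k: v for k, v in groups.items() if len(v) > 1}

# Hypothetical library: two records share a PMID even though the
# title of one was changed by a later correction.
library = [
    {"title": "Article A", "PMID": "111"},
    {"title": "Article A (corrected abstract)", "PMID": "111"},
    {"title": "Article B", "PMID": "222"},
    {"title": "Article C"},  # no PMID recorded
]

suspected = find_duplicates(library)
for pmid, group in suspected.items():
    print(pmid, [item["title"] for item in group])
```

Because the match is on the identifier alone, differences in the abstract or in the attached file do not matter; the user would still decide which entry on the short list to keep.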
  • Hi,
    I'd like to add my vote to this request! This is the main reason why I would still consider using RefMan instead of Zotero.
  • Basic duplicate detection has recently been added to the trunk, and we will continue to refine its algorithm and user interface over the next few weeks.
  • I was going to do my open-source citizen duty and report on that shortcoming, but you guys are on the issue already. Awesome! Good job!
  • I have version 1.5b2.r4280 but cannot find this feature.
    Where is it?
  • Does the feature now apparently in the trunk allow merging duplicate entries?
    Not yet. This is part of the reason it's not yet enabled.
  • edited March 31, 2009
    It would be good to compute a similarity measure between two items. If it is 100%, the items are duplicates. If it is close to 100%, the items differ slightly, perhaps because of spelling errors. If it is only 10-20%, some fields are identical (author, title), but most of the properties are not.
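One possible way to sketch such a similarity measure, in Python, is to average a per-field string similarity. The field list and the item dictionaries are assumptions made for the example, and `difflib.SequenceMatcher` from the standard library is just one convenient choice of string comparison, not the algorithm Zotero uses.

```python
from difflib import SequenceMatcher

# Hypothetical set of fields to compare; a real tool would let the user choose.
FIELDS = ("author", "title", "year")

def field_similarity(a: str, b: str) -> float:
    """Similarity of two strings in the range 0.0 to 1.0, ignoring case
    and surrounding whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()

def item_similarity(x: dict, y: dict, fields=FIELDS) -> float:
    """Average field similarity as a percentage (0 to 100).
    Fields missing from both items are skipped."""
    scores = [
        field_similarity(x.get(f, ""), y.get(f, ""))
        for f in fields
        if x.get(f) or y.get(f)
    ]
    if not scores:
        return 0.0
    return 100.0 * sum(scores) / len(scores)

original = {"author": "Smith, John",
            "title": "Duplicate detection in reference managers",
            "year": "2009"}
with_typos = {"author": "Smith, John",
              "title": "Duplicate detecton in refrence managers",
              "year": "2009"}

exact_score = item_similarity(original, original)
typo_score = item_similarity(original, with_typos)
```

Identical items score 100, and the copy with spelling errors scores slightly below that. Where to put the threshold for "probable duplicate" would be a judgment call.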
  • Hi,
    I use Zotero 1.0.9, the current stable version, and I would like to search through the references in my library to detect duplicates. Does anyone know if this feature is available in Zotero and how I can use it? Thanks.
  • As stated above in this thread, basic duplicate detection is now included in the most current development versions. It will be part of the next beta release, but will not become part of old versions of the software.
  • Hi. Can I ask for *restoration* of the duplicate item function absent from the latest beta release? I often find myself citing a 'book section' from an existing book, and the quickest, most elegant way I've found to do this is to duplicate the 'master' book entry, then change its type to book section, so cloning all the book details across. Any views on this?
  • svetlovska: That's not what this thread is about. The feature you're referring to will return in a future version. It's only not working for items with tags.
  • Is duplicate detection part of 1.5b2.1? I am not sure what features were included in this update, thanks!
  • Oh okay, thanks!
  • is this feature working in 1.5b2.1? i.e. no duplicate links

    When I use "save link to current page" for a particular web page, Zotero keeps adding new links to the page. Is this a bug? It's hard to imagine this is a feature.
  • is this feature working in 1.5b2.1? i.e. no duplicate links
    No, per Dan's comment, just two posts above: 1.5b2.1 contains no changes from 1.5b2.
  • So, what about the 2.0 release? Does it have the dup detection functionality?
    Yes, but it needs to be enabled by a hidden about:config setting. The algorithm & features surrounding it will likely be improved before it is enabled by default.
