Barrier to entry: no duplicate detection

I'm migrating a group of 10-15 people onto Zotero and, in the process, importing a highly redundant old system of files and folders containing PDFs. As we bring in old folders and tag them, we're discovering lots of duplicates. One of the great things about Zotero is that tags and collections are non-exclusive, so one paper can live in multiple collections. The problem is that as the library grows, manually pruning duplicates becomes impossible, and more so as individual users bring in their personal collections. Even a rudimentary way to find duplicates for manual merging or deletion would be a huge help. My understanding from the four-year-old thread linked below is that such a function might exist as a hidden preference, but I can't find it. Any advice on how to activate this feature, or any information on when it might become available, would be great. This is obviously a crucial feature for many.

http://forums.zotero.org/discussion/42/2/duplicate-detection/
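A rudimentary duplicate finder of the kind asked for above can be sketched by normalizing titles and comparing every pair against a similarity threshold. This is a minimal Python sketch, not Zotero's actual algorithm: the item dictionaries, their "title" field, and the helper names normalize_title, title_similarity and find_duplicate_pairs are all hypothetical, and difflib.SequenceMatcher is only one possible similarity measure. The threshold parameter is one way to make such a check "tunable": 1.0 treats only titles that are identical after normalization as duplicates.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def normalize_title(title: str) -> str:
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    title = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(title.split())


def title_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalized titles (1.0 = identical)."""
    return SequenceMatcher(None, normalize_title(a), normalize_title(b)).ratio()


def find_duplicate_pairs(items: list[dict], threshold: float = 0.9) -> list[tuple[int, int]]:
    """Return index pairs (i, j) whose titles score at or above `threshold`.

    threshold=1.0 flags only titles identical after normalization; lower
    values also flag near-identical titles, at the cost of more false
    positives. Every pair is compared, so cost grows quadratically.
    """
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(items), 2):
        title_a, title_b = a.get("title", ""), b.get("title", "")
        # Skip untitled items: two empty strings would otherwise score 1.0.
        if not normalize_title(title_a) or not normalize_title(title_b):
            continue
        if title_similarity(title_a, title_b) >= threshold:
            pairs.append((i, j))
    return pairs


# Hypothetical sample records; a real library export would supply these.
items = [
    {"title": "Duplicate Detection in Reference Managers"},
    {"title": "Duplicate detection in reference managers."},
    {"title": "Duplicate Detection in Citation Managers"},
    {"title": "An Unrelated Paper"},
]

for i, j in find_duplicate_pairs(items, threshold=1.0):
    print(f"possible duplicate: {items[i]['title']!r} / {items[j]['title']!r}")
```

With threshold=1.0 only the first two records are reported. Lowering it to around 0.75 also pairs the "reference managers" and "citation managers" titles, which is exactly the kind of near-identical-title false positive described later in the thread.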
  • edited July 28, 2010
    It's very well hidden; you actually need to create the option yourself in Firefox. Open the URL "about:config", right-click in the listing, select New -> Boolean, and create an option entry like this:

    extensions.zotero.debugShowDuplicates

    Set it to "true", and you should find the Show Duplicates option in the gear menu. It's said to be slow to run, but it should narrow the listing down to mostly duplicate items.
  • Wow, I thought going into about:config was pretty wonky. I didn't know you could create your own entries in about:config. Cool.

    I ran it on my duplicate-free library of 826 items and it came up with 26 false positives, a specificity of 97%, which is perfectly acceptable for then pruning the duplicates manually. For each of the 26 false positives, the heuristic (if that's the word) behind it was clear: patents with nearly identical titles, papers with the same first author or nearly identical titles, etc.

    As rudimentary as it is, it's still a huge help to us in importing our libraries.

    Many, many thanks for this.
  • edited July 28, 2010
    The code was checked in by Dan Stillman, based on a submission from another contributor (first name Ben). I was only the messenger; very glad to hear it's working for you, though!
  • Thanks, fbennet. I've been wanting this feature forever and couldn't follow it in the previous threads. This is great!
  • The "show duplicates" option appears in the gear menu after doing the about:config procedure, but nothing happens when I click on it. Can anyone tell me how it works? When I click, the action button stays pressed for a while, then nothing else happens, and all the duplicates are still there (after syncing I got three or four copies of each item). Thanks!
  • Hi, I have a problem similar to juanalp's. The "show duplicates" option revealed only three duplicates in a list of several thousand items. In reality, there are hundreds of duplicates. I had synced my Zotero library across a couple of computers and it duplicated every entry many, many times over. I now have about 10 copies of everything. The fact that the "show duplicates" option does not even reveal them is almost incomprehensible... most are in fact completely IDENTICAL, including the creation date down to the second.

    I'm really at my wits' end on this one. I've patiently waited about two years now for Zotero to implement some sort of duplicate detection, and hopefully deletion, and still nothing functional has been released. I can't even use my library anymore.

    Sure, I could spend 6 hours deleting everything manually, and I've tried that. Then I discovered that not every entry could find its attached snapshot, so really I'd need to go through each entry and open the snapshot to make sure the link hasn't been broken.

    Please do something about this duplicates problem!
  • edited October 2, 2010
    The fact that the "show duplicates" option does not even reveal them is almost incomprehensible
    It's a nonexistent setting with "debug" in the name that we've never suggested using. Is it really that incomprehensible? It's not that nothing "functional" has been released; duplicate detection support simply hasn't been released.

    But we're aware of the demand for this feature.
    I had synced my zotero across a couple of computers and it duplicated every entry many many times over.
    It doesn't really help you now, but for what it's worth, this only happens if you sync separate libraries that were upgraded from 1.0. Zotero sync never creates duplicates by itself; in fact, it can't.
  • Hi Dan,

    I really appreciate you responding to my comment. I know you guys have put a lot of work into this and are very busy.

    I guess my point is, it's been three years since people started asking for duplicate detection, and so far there's still no solution. I've read the forums, and I understand you don't want to break links, etc., but frankly, without this feature Zotero is severely crippled.
  • Sorry for the ignorant question, but where is the 'gears' menu?

    I created the option, but I cannot find the menu.
  • The symbol to the left of the green plus sign is a gear. You might not see it if you have collapsed the left (collection) pane.
  • This duplicate detection is really a big improvement. Thanks a lot! Are there plans to make it 'tunable'? I noticed it flags journal articles with similar titles as possible duplicates, whereas I would only consider them duplicates if the titles were identical. I can imagine this is not the case for all users (I'm importing 99% of my references via PubMed).

    However, at the top of my wish list is having the duplicate check performed at import time. I noticed that I can still import references that are already in the database. Hopefully that will be included in the released version.

    Again, thanks for sharing the hidden feature! Even in this not-yet-perfect state it's helping me a lot.
  • I just saw this and tried it, but nothing shows up under the gear menu.
    Using FF 3.6.13, Mac Snow Leopard.
    Tried restarting, etc.
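The 97% specificity reported earlier in the thread follows from the usual definition, specificity = TN / (TN + FP). This short Python check assumes the 26 false positives are counted per item (not per flagged pair) and that every unflagged item in the duplicate-free 826-item library counts as a true negative; the specificity helper is only an illustration.

```python
def specificity(true_negatives: int, false_positives: int) -> float:
    """Fraction of non-duplicates that the check correctly leaves unflagged."""
    return true_negatives / (true_negatives + false_positives)


library_size = 826      # items in the duplicate-free library
false_positives = 26    # items wrongly flagged by Show Duplicates
# Assumption: with no real duplicates present, every item that is not
# flagged is a true negative.
true_negatives = library_size - false_positives

print(f"specificity = {specificity(true_negatives, false_positives):.1%}")
```

This prints 96.9% (800 / 826), which rounds to the 97% quoted.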