Duplicate detection?

  • Actually, while I wasn't looking, Zotero 2.0 seems to have identified and jettisoned duplicates... The choice seems to have been a bit arbitrary, and I definitely lost the "better" version of a reference in at least one case that I've tracked down.

    Very strange behavior... and potentially dangerous!

    How do I turn this off?

    I don't see anything remotely related when I search for zotero in about:config.
  • Actually, while I wasn't looking, zotero 2.0 seems to have identified and jettisoned duplicates...
    There is no such functionality in Zotero 2.0...
  • OK, I'm not sure that this actually happened. I did a search once for an author, and only one copy came up of a paper that I know has duplicates. I tried again just now, and the duplicate is still there...

    Still a little troubling.
  • Actually, while I wasn't looking, zotero 2.0 seems to have identified and jettisoned duplicates
    How did you arrive at that conclusion? I don't think there is any client-side duplicate detection or removal enabled (I could be wrong; I'm not a Zotero dev). If anything, I'd expect this to happen through a sync (where you may have updated a record on one machine and those updates didn't make it to the server).
  • "it needs to be enabled by a hidden about:config setting"

    Exactly how?
  • Yes, please, how????
  • Is this new feature just going to warn about adding new duplicates, or is there any hope of being able to prune old libraries?
  • It will do the latter before it does the former. Warning on save is rather complicated in the context of translator-based saving. (When would it warn? What kind of user intervention would be required? What if you saved multiple items at once and only some were (potential) duplicates?)
  • Thanks, Dan. The latter is more interesting to me anyway, as it happens. I have lots of dupes from early Zotero testing, multiple imports, etc. I look forward to it arriving.
  • Another field that deals with managing duplicate metadata is contact management. You can see a good example of a clean user interface for finding and then managing duplicates in Spanning Tools, which is currently free to download, here:

    http://blog.spanningsync.com/

    thanks for working on the dup-detector!
  • I would like to manually merge two items. Does this feature exist? This isn't as complicated as automatic detection of duplicates, but it would be useful. I think that the 'conflict resolution' logic from sync would be perfectly suited for this. I picture doing a multiple-select in the middle pane then right-clicking, and using a 'merge items' on the context menu.
  • I'm really keen to find out how to access the "hidden" setting to detect duplicates. Importing my current PDF library has shown me how many duplicates I've created over many years, and sorting them all out manually will be very time-consuming. If Zotero is likely to be able to do it for me pretty soon (ETA?), then I'll wait. I'm running Zotero 2.0b6.3 (with NeoOffice). A "merge duplicates" capability, where Zotero prompts you to merge items that it flags as duplicates, would be perfect. So I strongly agree with all the comments above.
  • Ditto; merging duplicates would be great. Ideally, Zotero would automatically recognize when enough of two items is the same, and then let you choose which of the differing fields are correct. Notes would ideally be aggregated, unless they are exact duplicates.

    Here's another thought: duplicates between a group library and a personal library. What if we want to bring notes and changes made to a group library entry back into our own personal library? Is there a way to do that sort of merge?
  • I'd like to add my support for this function! After importing various libraries from other sources, I have so many duplicates that it's very hard to go through and clean them up.
  • coming soon... it's already in Zotero, apparently just not activated because it's not working properly...
  • Is it possible to find out which collection your items are in?

    e.g. if you do a search for "Obama" in the overall "My Library", and three duplicate papers by Obama come up, can I find out which Collection they are hiding in?
    (before I go ahead and delete two of them!)

    I'd rather not search each of 30+ collections one by one...
  • Hey, sorry for not RTFM. I just noticed Tips and Tricks no. 3.
    (Hold down Ctrl.)
  • Just started working with Zotero today. Love it! And, yes, the missing duplicate remover is the only flaw that stops me from switching seamlessly.
  • Any news about detecting/warning for duplicates? It's a really sought-after feature!
  • Even if the complete duplicate detection functionality is not ready yet, would it be possible to enable detection of exact duplicates? This doesn't seem difficult to implement, and if you use one resource for all reference information (e.g. PubMed), all duplicates will be exact duplicates.
  • As Dan has explained elsewhere, the problem isn't the detection but what to do with duplicates, e.g. how to deal with items that have been cited in documents in the past. I understand the milestone for this is 2.1.
  • Any detection at all would be helpful. Just listing duplicates so I can decide what to do with them would be a start. I've just started using Zotero, so dealing with duplicates that have already been cited is not a big problem. (A possible solution would be to replace duplicates with a "redirect", but I guess that'd require quite a bit of recoding of Zotero.)
  • I am trying to figure out what is currently implemented. There have been multiple posts that say this is in the beta (I am using 2.0b7.6) but no post about which about:config preference controls this. Why won't anyone tell us how to enable it? We are beta testers, after all.

    Personally, I'd just like something that shows all the duplicates so I can choose to delete the newest one. That way, when I add citations I can check for duplicates and immediately remove them. This is a show-stopper, in my opinion. How is it possible to remember whether I've already added a citation? The whole purpose of reference management software is to make it easier to reference material. This makes it more difficult than doing it manually.

    If I can see a barely functional utility for removing exact duplicates, with occasional progress, I will continue with Zotero. Otherwise I'll have to look for another solution. Perhaps using Perl with curl or wget from PubMed and the OpenOffice bibliography.
  • Hi,

    I just want to add my vote for this. I've been using Zotero for over a year now, and it's getting more and more important to have at least a simple duplicate detection!
    A great help would be detection during import, before the problems with already-cited duplicates arise. I guess it can't be too hard to implement something like "Warning: possible duplicate. Do you want to continue importing?", and it would be a GREAT help, at least to us life science people (dealing with PubMed).

    You do an awfully good job with Zotero, and I guess many more people would use it without this little flaw!

    Best and thanks,
    Jan
  • I agree. I do not use Zotero because of this. It is the only reason I haven't used it: without this it is not useful for me, and I can't take the time to weed out duplicates manually. This is supposed to save me time!

    bpeyser
  • Just as a general note: 2.0 is feature-frozen, so if this is going to happen it's going to be in 2.1. I think that's definitely the sense. I agree with gebauer: if this turns out to be too hard to detect and delete, some type of check for duplicates while adding would make a lot of sense.
  • Yes, a way to detect duplicates is badly needed. I just looked through my library and I have 2, 3, or 4 copies of the same item.
  • I think a way to merge items would be the more significant feature, and possibly simpler to accomplish than duplicate detection, which could be more of a nuisance than a help if it is not implemented really well. I can find dups fairly easily, but then I have to figure out which one I have cited in documents, which one has more or better-formatted data, manually move any attachments that need it, etc.

    Another thing that could really help with "pruning" a database would be a lossless export that could be re-imported without losing anything (why not CSV?).
  • For what it's worth, I put up some thoughts about this on zotero-dev.

    In legal research, we trawl over the same online material repeatedly, so the accidental accumulation of duplicates is inevitable and likely to be annoying. As this affects other researchers as well, I would vote for making this a development priority in version 2.1.
  • Even detection of only exact duplicates would be a vast improvement. I suspect many users, like me, primarily use one database for any given research area. In this case, exact duplicates are actually a majority of the duplicates that arise.
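The "just list the exact duplicates" request that recurs in several posts above can be sketched as follows. This is a minimal sketch, not Zotero code: it assumes items have already been exported to plain dictionaries with hypothetical `key`, `title`, `year`, and `creators` fields, and it treats two items as exact duplicates when their title, year, and first creator match after light normalization (case, accents, punctuation, whitespace).

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", str(text or ""))
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def duplicate_key(item):
    """Match key from title, year, and first creator (hypothetical field names)."""
    creators = item.get("creators") or [""]
    return (normalize(item.get("title")), normalize(item.get("year")), normalize(creators[0]))


def find_duplicate_groups(items):
    """Return lists of item keys that share a match key; nothing is deleted."""
    groups = defaultdict(list)
    for item in items:
        groups[duplicate_key(item)].append(item["key"])
    return [keys for keys in groups.values() if len(keys) > 1]


if __name__ == "__main__":
    # Hypothetical sample library; real Zotero items are structured differently.
    library = [
        {"key": "A1", "title": "The Audacity of Hope", "year": 2006, "creators": ["Obama"]},
        {"key": "B2", "title": "the audacity of hope.", "year": "2006", "creators": ["OBAMA"]},
        {"key": "C3", "title": "Dreams from My Father", "year": 1995, "creators": ["Obama"]},
    ]
    print(find_duplicate_groups(library))  # [['A1', 'B2']]
```

The sketch only lists candidate groups and leaves the choice of what to delete or merge to the user, which sidesteps the concern raised above about items already cited in documents. A stricter key (for example, a DOI or PubMed ID when present) could be swapped into `duplicate_key` without changing the grouping logic.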
This discussion has been closed.