Actually, while I wasn't looking, zotero 2.0 seems to have identified and jettisoned duplicates... the choice seems to have been a bit arbitrary, and I definitely lost the "better" version of a reference in at least one case that I've tracked down.
Very strange behavior... and potentially dangerous!
How do I turn this off?
I don't see anything that seems remotely close searching for zotero in about:config
OK - I'm not sure that this actually happened. I did a search once for an author and only one copy of a paper came up (which had duplicates). Tried again just now, and the duplicate is still there...
Actually, while I wasn't looking, zotero 2.0 seems to have identified and jettisoned duplicates
How did you arrive at that conclusion? I don't think there is any client side duplication detection or removal enabled (I could be wrong--I'm not a Zotero dev). If anything, I'd expect this to happen through a sync (where you may have updated a record on one machine & those updates didn't make it to the server).
It will do the latter before it does the former. Warning on save is rather complicated in the context of translator-based saving. (When would it warn? What kind of user intervention would be required? What if you saved multiple items at once and only some were (potential) duplicates?)
Thanks Dan. The latter's more interesting to me as it happens anyway. I have lots of dupes due to earlyish zotero testing, multiple imports etc. Look forward to it arriving.
Another field that deals with managing duplicate meta data is the contacts management field. You can get a good example of a clean user interface for finding and then managing duplicates with the currently free to download Spanning Tools here:
I would like to manually merge two items. Does this feature exist? This isn't as complicated as automatic detection of duplicates, but it would be useful. I think that the 'conflict resolution' logic from sync would be perfectly suited for this. I picture doing a multiple-select in the middle pane then right-clicking, and using a 'merge items' on the context menu.
I'm really keen to find out how to access the 'hidden' setting to detect duplicates. Importing my current pdf library shows me how many duplicates I've created over many years, and sorting it all manually will be very time-consuming. If Zotero is likely to be able to do it for me pretty soon (eta??) then I'll wait. I'm running Zotero 2.0b6.3 (using it with NeoOffice). A 'merge duplicates' capability - where Zotero prompts to merge items that it flags as duplicates - would be perfect. So I'm very strongly agreeing with all the comments above.
ditto; merging duplicates would be great -- Ideally, autorecognize if -enough- is the same, and then let you select which of the fields that are different are correct. Notes would ideally be aggregated, unless exact duplicates.
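For what it's worth, here is a rough sketch of how that kind of merge could work. This is not Zotero code: items are plain dicts, and the names `similarity`, `merge_items` and the `choose` callback are all made up for illustration. Two items count as "similar enough" by the share of fields that match exactly, conflicting fields are handed to the user via `choose`, and notes are aggregated with exact duplicates dropped.

```python
def similarity(a, b, fields):
    """Fraction of the given fields that are non-empty and identical in both items."""
    same = sum(1 for f in fields if a.get(f) and a.get(f) == b.get(f))
    return same / len(fields)


def merge_items(a, b, fields, choose):
    """Merge two items (hypothetical dict layout, not Zotero's data model).

    choose(field, value_a, value_b) is called only when both items have a
    value for the field and the values differ; it returns the one to keep.
    """
    merged = {}
    for f in fields:
        va, vb = a.get(f), b.get(f)
        if va == vb or not vb:
            merged[f] = va          # identical, or only a has a value
        elif not va:
            merged[f] = vb          # only b has a value
        else:
            merged[f] = choose(f, va, vb)  # real conflict: ask the user

    # Aggregate notes from both items, keeping order and skipping exact repeats.
    notes = []
    for note in a.get("notes", []) + b.get("notes", []):
        if note not in notes:
            notes.append(note)
    merged["notes"] = notes
    return merged
```

In a real UI, `choose` would be the dialog where you pick the correct value; a threshold on `similarity` (say 0.8, an arbitrary number) would decide which pairs get offered for merging in the first place.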
Here's another thought -- duplicates between group library and personal library. What if we want to bring notes and changes made to a group library entry back into our own personal library? Is there a way to do that sort of merge?
I'd like to add my support for this function! After importing various libraries from other sources, I have so many duplicates it's very hard to go through and clean the files.
Is it possible to find out which collection your items are in?
e.g. if you do a search for "Obama" in the overall "My Library", and three duplicate papers by Obama come up, can I find out which Collection they are hiding in? (before I go ahead and delete two of them!)
I'd rather not search each of 30+ collections one by one...
Even if the complete duplication detection functionality is not ready yet, would it be possible to enable detection of exact duplicates?? This doesn't seem difficult to implement, and if you use one resource for all reference information (e.g. pubmed) all duplicates will be exact duplicates.
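To make the "exact duplicates" idea concrete, here is a minimal sketch, assuming the library has been exported to a list of dicts. The field names and the function `find_exact_duplicates` are hypothetical and have nothing to do with Zotero's internals. It only groups items whose chosen fields match after lowercasing and collapsing whitespace; it does not decide what to do with them.

```python
from collections import defaultdict


def find_exact_duplicates(items, fields=("title", "authors", "year", "journal")):
    """Return groups of items whose chosen fields are identical.

    Values are compared after lowercasing and collapsing whitespace, so
    records imported twice from the same source (e.g. PubMed) end up
    in the same group.
    """
    groups = defaultdict(list)
    for item in items:
        key = tuple(
            " ".join(str(item.get(f, "")).lower().split()) for f in fields
        )
        groups[key].append(item)
    # Only keys shared by more than one item are duplicates.
    return [group for group in groups.values() if len(group) > 1]
```

Just listing the groups would already let you pick which copy to keep by hand. The hard part raised elsewhere in this thread, what happens to copies that are already cited in documents, is not addressed by this at all.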
as Dan has explained elsewhere the problem isn't the detection, but what to do with duplicates - e.g. how to deal with items that have been cited in documents in the past. I understand the milestone for this is 2.1
Any detection at all would be helpful. Just listing duplicates so I can decide what to do with them would be a start. I've just started using Zotero, so dealing with duplicates that have already been cited is not a big problem. (A possible solution would be to replace duplicates with a "redirect", but I guess that'd require quite a bit of recoding of Zotero.)
I am trying to figure out what is currently implemented. There have been multiple posts that say this is in the beta (I am using 2.0b7.6) but no post about which about:config preference controls this. Why won't anyone tell us how to enable it? We are beta testers, after all.
Personally, I'd just like one to show all the duplicates so I can choose to delete the newest one. That way, when I add citations I can check for any duplicates and immediately remove them. This is a show-stopper, in my opinion. How is it possible to remember whether I've already added a citation? The whole purpose of reference management software is to make it easier to reference material. This makes it more difficult than doing it manually.
If I can see a barely functional utility for removing exact duplicates, with occasional progress, I will continue with Zotero. Otherwise I'll have to look for another solution. Perhaps using Perl with curl or wget from PubMed and the OpenOffice bibliography.
I just want to add my vote for this. I've been using Zotero for over a year now, and it gets more and more important to have at least simple duplicate detection!
A great help would be detection during import, before the problems with already-cited duplicates arise. I guess it can't be too hard to implement something like "Warning: possible duplicate. Do you want to continue importing?" and it would be a GREAT help, at least to us life science people (dealing with PubMed).
You do an awfully good job with Zotero, and I guess so many people would use it without this little flaw!
I agree--I do not use Zotero because of this. It is the only reason I haven't used it--without this it is not useful for me--I can't bother to take the time to weed out duplicates manually. This is supposed to save me time!
just as a general note - 2.0 is feature frozen. So if this is going to happen it's going to be in 2.1 - I think that's definitely the sense. I agree with gebauer - if this turns out to be too hard to detect and delete, some type of check for duplicates while adding would make a lot of sense.
I think a way to merge items would be the more significant feature, and possibly simpler to accomplish than duplicate detection which could be more of a nuisance than a help if it is not really well implemented. I can find dups fairly easily, but then I have to figure out which one I have cited in documents, which one has more/better-formatted data, manually move any attachments that need it, etc.
Another thing that could really help for "pruning" a database would be a perfect export that could be re-imported without any loss (why not CSV?).
In legal research, we trawl over the same online material repeatedly, and the accidental accumulation of duplicates is inevitable, and likely to be annoying. As this affects other researchers as well, I would vote for making this one a development priority in ver 2.1.
Even detection of only exact duplicates would be a vast improvement. I suspect many users, like me, primarily use one database for any given research area. In this case, exact duplicates are actually a majority of the duplicates that arise.
Still a little troubling.
Exactly how?
http://blog.spanningsync.com/
thanks for working on the dup-detector!
(Hold down Ctrl.)
Best and thanks,
Jan
bpeyser