Better duplicate management

kirk86 · October 3, 2019

Hey folks,

I was wondering if we could have better duplicate management. Let me elaborate a bit. When I open the Duplicate Items view it shows all my duplicates, but I would prefer the following:

1) Since Zotero can extract text from PDFs, it could compare not only the title and authors but also the text of the PDFs. At the moment it seems to compare only the titles. For instance, I have two items with the same title that show up in my Duplicate Items view, but they have different authors and are of different item types as well.

2) Can we edit duplicates directly from the Duplicate Items view, instead of going back and searching for where the items are filed in the collections? For instance, if I select one item from the duplicates, it automatically selects the other n-1 items with the same title so they can be merged, but since some items might have different types, the merge option is not available. The other problem is that we can't see from the Duplicate Items view which collection an item is filed in, since it selects all of them and there's no option of using the Alt key as in the other collections or in the root collection.
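As a rough illustration of point (1), here is a minimal Python sketch of comparing two items by their extracted text rather than by title alone. It is not how Zotero works: the function names, the shingle size and the 0.8 threshold are all assumptions chosen for illustration, and the input is assumed to be plain text already extracted from each PDF.

```python
import re


def word_shingles(text, size=3):
    """Return the set of overlapping word n-grams (shingles) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        # Too short for a full shingle: treat the whole text as one unit.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def looks_like_same_document(text_a, text_b, threshold=0.8):
    """Hypothetical check; the threshold is an arbitrary illustrative value."""
    return jaccard(word_shingles(text_a), word_shingles(text_b)) >= threshold
```

Two items with the same title but unrelated full text would score close to 0 here, which is the case described above. Real PDFs would need more care (OCR noise, headers, different editions), so this is only a starting point.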

dstillman · October 3, 2019

For (2), see the documentation:

You can select a single item in the “Duplicate Items” view by holding down Alt/Option while clicking. You can de-select an item from a set of duplicates by holding down Ctrl (Windows/Linux) or Cmd (Mac) while clicking.

kirk86 · October 3, 2019

@dstillman thanks, that's great to know.

mjenny46 · March 6, 2020

Hi,

I need to do large-scale merging of duplicates (pairs) or large-scale deletion of the latest entries. Is there a way to do that?

clemon · March 11, 2020

Good info here.

I'd also like the option to delete all others and keep only one of the duplicates. This would help with the duplicate attachments issue noted elsewhere.

tomer.czaczkes · May 12, 2020

I also would like to do a large-scale merging of duplicates (all my entries were duplicated when backing up from my online account). Any way of doing this?

dstillman · May 12, 2020

"all my entries were duplicated when backing up from my online account"

I'm not sure what you mean by "backing up from my online account", but syncing doesn't create duplicates. The only way to get mass duplicates would be by exporting to a file and reimporting, in which case the best way to remove the duplicates would be by sorting your library by Date Added and deleting the entire block of items from that import.
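The same idea can be sketched outside the Zotero UI, for example on an exported list of items. This is a minimal Python sketch under stated assumptions: items are plain dicts with a hypothetical `date_added` field holding a `datetime`, and the import is assumed to have finished within a short window (10 minutes here, an arbitrary value).

```python
from datetime import datetime, timedelta


def items_from_import(items, import_start, window=timedelta(minutes=10)):
    """Return items added within [import_start, import_start + window], oldest first.

    `items` is a list of dicts with a hypothetical "date_added" key;
    this is not Zotero's own data model.
    """
    end = import_start + window
    block = [it for it in items if import_start <= it["date_added"] <= end]
    return sorted(block, key=lambda it: it["date_added"])
```

The returned block corresponds to the contiguous run of rows you would select after sorting by Date Added. It is worth reviewing the block before deleting anything, since items added by hand during the same window would be included too.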

tomer.czaczkes · May 12, 2020

yep, well spotted - that's exactly what happened. And the solution with sorting by date added was exactly the right thing to do - many thanks!

smatthie · May 27, 2020

I second the original suggestions from @kirk86 .

I have quite a number of items that are individual volumes of a multi-volume book that all have the same title and author, only the "volume" field (and some of the years) is different. Particularly before the 20th century scholars tended to write these enormous tomes with up to a dozen volumes or so. Of course they don't have an ISBN or other unique identifier either.

This is annoying because my duplicate list has so many wrong entries that it becomes difficult to see the ones that are duplicates.

Not sure what the best way is - we probably need some flexibility that true duplicates are recognised even with some differences (no database is perfect), but perhaps a way of marking items to tell Zotero that these are really different?

estherkoehring · May 31, 2020

I have similar needs to smatthie: in my group library we have a large number of wrongly detected duplicates from the 18th and 19th centuries. It basically makes the duplicate feature unusable for us.

A rather simple fix that would help people working with pre-ISBN books would be a small tweak to the detection: compare not only title, author and year, but also the volume number! A book with identical author, title and even year is not a duplicate if one of them is volume 1 and the other volume 2 or 42 and so on.

Of course I would prefer a way to mark false positives, but I gather that is complicated. My suggested fix should be easier to implement and should not mess with other people's databases. Pretty please!
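To make the suggestion concrete, here is a minimal Python sketch of a duplicate check that includes the volume in the comparison key. It is an illustration only, not Zotero's actual detection logic: the dict fields (`title`, `author`, `year`, `volume`) and the function names are hypothetical.

```python
def normalize(value):
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(str(value or "").lower().split())


def duplicate_key(item):
    """Build a comparison key; the field names are hypothetical, not Zotero's schema."""
    return (
        normalize(item.get("title")),
        normalize(item.get("author")),
        normalize(item.get("year")),
        normalize(item.get("volume")),  # the proposed extra criterion
    )


def find_duplicates(items):
    """Group items by key and return only groups with more than one member."""
    groups = {}
    for item in items:
        groups.setdefault(duplicate_key(item), []).append(item)
    return [group for group in groups.values() if len(group) > 1]
```

With this key, volume 1 and volume 2 of the same work fall into different groups, while two copies of volume 1 are still reported together. One caveat of this simple version: an item with an empty volume field would no longer match a copy that has the volume filled in.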

dstillman · June 6, 2020

@smatthie, @estherkoehring: I've added an issue for including Volume in the duplicate check.

iibewegung · June 9, 2020

I came to this post after searching for {duplicate check}.
I think the author criterion can be refined a bit, as it detects two entries (one book, one journal article) with the same title and 1 common author as duplicates.
E.g., in my case: "Deep Learning" book (deeplearningbook.org) vs. article (10.1038/nature14539), each with 3 authors (1 in common).

Would it be possible to change the logical clause to "if one group of authors is a subset of the other"? This would rule out the possible case where one entry is a later edition of the other, with more authors joining.
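A minimal Python sketch of the proposed subset rule, using the "Deep Learning" example above. The function name is hypothetical and this is not Zotero's implementation; the surnames are filled in from the two publications mentioned (the book by Goodfellow, Bengio and Courville, and the article by LeCun, Bengio and Hinton).

```python
def authors_compatible(authors_a, authors_b):
    """True if one author set is a subset of the other (equal sets included)."""
    a = {name.strip().lower() for name in authors_a}
    b = {name.strip().lower() for name in authors_b}
    return a <= b or b <= a


book = ["Goodfellow", "Bengio", "Courville"]
article = ["LeCun", "Bengio", "Hinton"]

print(authors_compatible(book, article))         # False: only one author in common
print(authors_compatible(["Bengio"], book))      # True: a subset of the book's authors
```

Note that an empty author list is a subset of every set, so an item with no authors would be compatible with anything under this rule. A real check would probably want to handle that case separately.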

archaeo_arizona · October 30, 2020

Regarding historical books with many volumes appearing as Duplicate Items: if Zotero was able to resolve those duplicates, that would be great.

It is not a problem when adding one of these items into a document, because you can see the volume number.

I have been dealing with this by sorting the duplicates by year. The historic volumes all drop to the bottom, where at least I can ignore them!