Better duplicate management
Hey folks,
I was wondering if we could have better duplicate management or solutions. Let me elaborate a bit. If I visit Duplicate Items, it shows all my duplicates, but I would prefer the following:
1) Since Zotero can extract text from PDFs, it could evaluate and compare not only title and authors but also the text of the PDFs. At the moment it seems to compare only the titles. For instance, I have two items with the same title that show up in my Duplicate Items section but are by different authors and of different item types as well.
2) Can we edit duplicates from within Duplicate Items instead of going back and searching for where in the collections the items are filed? For instance, if I select one item from the duplicates, it automatically selects the other n-1 items with the same title in order to merge them, but since some items might have different types, the merge option is not available. The other thing is that from Duplicate Items we can't see which collection an item is filed in, since it selects all of them and there's no option of using the ALT key as in the other collections or in the root collection.
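To make point 1 a bit more concrete, here is a rough sketch of how the extracted text of two PDFs could be compared. It is not how Zotero works internally; the function names and the choice of word 3-gram Jaccard similarity are my own assumptions, and the input is assumed to be plain text that has already been extracted.

```python
from __future__ import annotations

import re


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping word n-grams ("shingles") in the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        # Very short texts become a single shingle (or nothing if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def text_similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two shingle sets, between 0.0 and 1.0."""
    sa, sb = shingles(a), shingles(b)
    if not sa or not sb:
        return 0.0  # assumption: missing text gives no evidence of a match
    return len(sa & sb) / len(sa | sb)
```

Two items with the same title but a low text similarity (say, a book and an article) could then be left out of the duplicates list, with the threshold being something to tune.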
I need to do large-scale merging of duplicates (pairs) or large-scale deletion of the most recent entries. Is there a way to do that?
I'd also like the option to delete all others and keep only one of the duplicates. This would help with the duplicate attachments issue noted elsewhere.
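A minimal sketch of what "keep one, delete the rest" could look like, assuming each group of duplicates is already known. The `Item` class and its fields are hypothetical stand-ins, not Zotero's data model; the rule here keeps the oldest entry, which matches deleting the latest ones.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Item:
    item_id: int      # hypothetical identifier
    title: str
    date_added: str   # ISO 8601 timestamp, so string order is chronological


def split_keep_and_delete(group: list[Item]) -> tuple[Item, list[Item]]:
    """Keep the earliest-added item of a duplicate group; return the rest."""
    if not group:
        raise ValueError("duplicate group must not be empty")
    # Ties on date_added are broken by the lower item_id.
    ordered = sorted(group, key=lambda it: (it.date_added, it.item_id))
    return ordered[0], ordered[1:]
```

Running this over every group would give one list of items to keep and one list to delete, which could be reviewed before anything is actually removed.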
I have quite a number of items that are individual volumes of a multi-volume book that all have the same title and author; only the "volume" field (and some of the years) is different. Particularly before the 20th century, scholars tended to write these enormous tomes with up to a dozen volumes or so. Of course they don't have an ISBN or other unique identifier either.
This is annoying because my duplicate list has so many wrong entries that it becomes difficult to see the ones that really are duplicates.
Not sure what the best way is - we probably need some flexibility so that true duplicates are recognised even with some differences (no database is perfect), but perhaps also a way of marking items to tell Zotero that these are really different?
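One way the "mark as really different" idea could work, sketched under my own assumptions: store the pairs a user has marked as not duplicates and filter them out of the detected candidates. The names below are hypothetical and nothing here reflects an existing Zotero feature.

```python
from __future__ import annotations


def pair_key(id_a: int, id_b: int) -> frozenset[int]:
    """Order-independent key for a pair of item IDs."""
    return frozenset((id_a, id_b))


def filter_candidates(
    candidates: list[tuple[int, int]],
    not_duplicates: set[frozenset[int]],
) -> list[tuple[int, int]]:
    """Drop candidate pairs that the user has marked as distinct items."""
    return [pair for pair in candidates if frozenset(pair) not in not_duplicates]
```

Because the key ignores order, marking (1, 2) also suppresses (2, 1), and detection for all other items stays unchanged.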
A rather simple fix that would help people working with pre-ISBN books would be a small tweak to the detection: compare not only title, author and year, but also the volume number! A book with identical author, title and even year is not a duplicate if one of them is volume 1 and the other is volume 2 or 42 and so on.
Of course I would prefer a way to mark false positives, but I gather that is complicated. My suggested fix should be easier to implement and should not mess with other people's databases. Pretty please!
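The volume tweak could look roughly like this. It is only a sketch: the `Book` class is hypothetical, and the baseline of matching on normalised title and author is an assumption, not a description of Zotero's actual detection. The new part is the volume veto at the end.

```python
from dataclasses import dataclass


@dataclass
class Book:
    title: str
    author: str
    volume: str = ""  # empty string means the field is not filled in


def normalise(value: str) -> str:
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(value.lower().split())


def is_candidate_duplicate(a: Book, b: Book) -> bool:
    """Assumed baseline match on title and author, vetoed by differing volumes."""
    if normalise(a.title) != normalise(b.title):
        return False
    if normalise(a.author) != normalise(b.author):
        return False
    va, vb = normalise(a.volume), normalise(b.volume)
    # Only veto when both volumes are present and differ; an empty volume
    # field on one side still leaves the pair as a candidate.
    if va and vb and va != vb:
        return False
    return True
```

Items without a volume are treated exactly as before, so databases that never fill in the field would see no change.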
I think the author criterion can be refined a bit, as it detects two entries (one book, one journal article) with the same title and one author in common as duplicates.
E.g., in my case: the "Deep Learning" book (deeplearningbook.org) vs. the article (10.1038/nature14539), each with 3 authors (1 in common).
Would it be possible to change the logical clause to "if one group of authors is a subset of the other"? This would still allow for the case where one entry is a later edition of the other, with more authors joining.
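The subset rule can be written down in a few lines. This is a sketch of the proposed clause only, with a hypothetical function name; treating missing author data as compatible is my own assumption.

```python
from __future__ import annotations


def authors_compatible(authors_a: set[str], authors_b: set[str]) -> bool:
    """True if one author group is a subset of the other (case-insensitive)."""
    a = {name.strip().lower() for name in authors_a}
    b = {name.strip().lower() for name in authors_b}
    if not a or not b:
        # Assumption: an empty author list cannot rule a pair out.
        return True
    return a <= b or b <= a


# The "Deep Learning" case: one shared author, but neither group
# contains the other, so the pair would no longer be flagged.
book = {"Goodfellow", "Bengio", "Courville"}
article = {"LeCun", "Bengio", "Hinton"}
print(authors_compatible(book, article))
```

A later edition that adds an author would still match, since the earlier author list is a subset of the later one.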
It is not a problem when adding one of these items into a document, because you can see the volume number.
I have been dealing with this by sorting the duplicates by year. The historic volumes all drop to the bottom, where at least I can ignore them!