Delete Duplicates

I wanted to throw my hat in for being able to delete duplicates from the "Duplicate Items" section. Detecting the duplicates is really great, but the only option seems to be to "Merge 2 Items." I'd like to be able to just select the one I'd like to keep and delete the other. At the moment, I'm detecting the duplicate, then going back to the original list and deleting each duplicate individually.

Thanks!
Paras.
  • Why is merging the items undesirable?
  • Merging the files is undesirable for me because most times the second file is a duplicate of the first. Merging puts two of the same documents under the same heading, but I still have two copies. I really want to delete the second copy and save 50% of the disk space.

    Hope this is clear, but if not, please let me know and I can expound further. Thx.
  • A better process would be to drag the duplicates to a temporary collection, merge them, and then go through the temporary collection and delete the child items you don't want.

    A future version of Zotero will likely delete identical attachments automatically.
  • Pretty amazing that it is so complicated to delete duplicates. That's a basic Endnote function.
  • Scott: Not sure what you're referring to. Zotero 3.0 already has duplicate detection and merging. This thread is about deletion of identical child attachments.
  • +1 for wanting to auto-delete identical child attachments. It would also be nice to have the option to auto-delete all but the most recent child attachment if they otherwise match. I am getting a lot of duplicate PubMed links, where the access date is different but everything else is the same. Tougher is the case where I have auto-downloaded the PDF twice: I may have gone into one PDF and highlighted it while leaving the other untouched. You could have an option to delete all but the most recent child attachment if they have the same name, which would solve this duplicate-PDF problem, assuming the edited PDF was more recently modified.
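
The "keep only the most recent attachment with a given name" rule suggested above could be sketched roughly as follows. This is an illustration only, not Zotero code: the `ChildAttachment` record and its `name` and `modified` fields are hypothetical, and the sketch assumes the edited PDF really is the most recently modified copy.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ChildAttachment:
    # Hypothetical record for illustration; not Zotero's data model.
    name: str
    modified: datetime


def split_keep_and_delete(attachments):
    """Keep the newest attachment per name; return (keep, delete)."""
    newest = {}
    for att in attachments:
        current = newest.get(att.name)
        # Strict comparison: on a tie, the first copy seen is kept.
        if current is None or att.modified > current.modified:
            newest[att.name] = att
    keep = list(newest.values())
    delete = [att for att in attachments if newest[att.name] is not att]
    return keep, delete
```

Returning the deletion candidates rather than deleting them leaves room for a manual review step, which matters when the "newest wins" assumption does not hold.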
  • Here's the relevant issue, for reference: https://github.com/zotero/zotero/issues/61

    Removing completely identical files, and identical linked files and URLs, would be a good first step, though we still need to decide which of the titles/filenames to keep. (Also need to merge tags and related items and maybe concatenate embedded (right-pane) notes.)
  • I might have a similar problem.
    I realised that I have a massive number of duplicated items; in fact, all my items exist in either two or six versions. The merge function seems to work fine, except that the attachments (for example, a library bookmark or a short commentary) and notes are not deleted but remain in 2 or 6 copies in my main library. I guess it is possible to delete them individually, but that would take a very large amount of time.
    Could you confirm that I have understood what was written above and that there is no other option available?
    Thank you in advance.
  • edited June 10, 2013
    Egochan: Depending on how those duplicate items came to be, you may be able to sort your library by Date Added and just delete the batches that you don't want (if, say, they came from multiple import attempts). Start a new thread if you need more help with that.

    Beyond that, yes, right now when you merge items it keeps both sets of child items.
  • Hello, Dan,

    The duplicate detection feature has been working out very well for me overall.
    However, I have run into a seemingly impassable problem:
    I have two items with the same title and year, but one is a presentation and the other is a journal article. The name of the (sole) presenter is the same as the name of the lead author of the journal article.
    They show up paired on my duplicate items list. Merging is not offered as an option since (naturally) "Merged items must all be of the same item type".
    I have a feeling that this (items of a different type) could be a case where an option other than merging should be offered, e.g. "these are not duplicates, but different items", thus removing them from the duplicate items list.
    Is this something that you would consider working on?
    Thanks in advance for all your hard work. Greatly appreciated!
  • +1 to ahannides' idea of an option to say "these are not duplicates, but different items".

    I have nine volumes of a set entered in my collection. Each is distinct, with different sub-titles, authors, and linked files. Three of these are showing up as duplicates, although each of them has a different sub-title and different linked files, and only two of the three volumes have the same authors.

    I don't know why these three are showing up as duplicates and the other six are not.
  • I have a slightly different problem. Zotero detects too many entries as duplicates when they certainly aren't (even the titles are different). The only option I see is to merge the items. I would like to be able to tell Zotero to ignore these entries.
  • That's quite odd. While false duplicates do happen, they should be rare. Do the items have the same DOI?
  • Hmm, you are right, it seems they have the same DOI, which was my mistake (not sure how it happened, I was trying to import them from Mendeley).
  • If you still have them in Mendeley, could you check whether they have the same DOI there, too? If not, could you check the export from Mendeley you used?
  • ahannides: a workaround

    1. Send both "duplicates" to trash
    2. Return one to the library, change something about it to distinguish it.
    3. Go find the other in the trash and return it to the library. It should no longer be considered a duplicate.

    I hope this helps
  • Nick: Moving items to the trash and back wouldn't be a relevant step here. Duplicate detection isn't currently stateful in any way. If the items pass the tests, they're duplicates. Otherwise, they're not. The only thing that would cause items not to be detected as duplicates currently would be to change the relevant metadata such that the items no longer passed Zotero's duplicate checks.
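
To illustrate what "not stateful" means here: detection behaves like a pure function of the items' current metadata, so the same inputs always give the same answer and a round trip through the trash changes nothing. The sketch below is hypothetical. The `Item` fields and the two tests (matching DOI, or matching title and year) are assumptions loosely based on cases mentioned in this thread, not Zotero's actual checks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    # Hypothetical metadata record; not Zotero's data model.
    title: str
    year: Optional[int] = None
    doi: Optional[str] = None


def looks_like_duplicate(a: Item, b: Item) -> bool:
    """Pure predicate: depends only on current metadata, no stored state."""
    # Assumed test 1: identical DOIs, ignoring case and surrounding spaces.
    if a.doi and b.doi and a.doi.strip().lower() == b.doi.strip().lower():
        return True
    # Assumed test 2: identical title (case-insensitive) and identical year.
    same_title = a.title.strip().casefold() == b.title.strip().casefold()
    return same_title and a.year is not None and a.year == b.year
```

Under these assumed rules, two items with different titles but a shared DOI are still flagged, and correcting the DOI on one of them is enough to stop the match.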
  • Right, but the only way to select one (as opposed to both) from the duplicate page in order to edit its metadata is to send them both to the trash and return only one. I now realize that this can be done from the main library page. Sorry for the redundancy.
  • You can right click items (if they're not already selected) to select individuals in the Duplicate collection
  • Also ctrl/cmd clicking can be used to (de)select items
  • Dan - in the 12/30/2011 post above, you mentioned that a future version will likely delete identical attachments. Just curious how (or if) this will be implemented?
    I am considering adding duplicate attachments on purpose, but they will be stored in different locations. One location will be OneDrive and one location will be my local hard drive. To differentiate them, I was thinking of adding a tag ("local", "OneDrive") to each entry. Will the future version interfere with this, or does the tag and storage location mean they are not identical and therefore would remain untouched?
  • minicht: I think we'd only merge two linked files with the same path or two (identical) stored files, not a linked file and a stored file or linked files with different paths.
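
The rule described above (merge only two linked files with the same path, or two identical stored files) might look something like this in code. It is a sketch under assumptions, not an actual or planned implementation: the `FileAttachment` record, the `kind` labels, and the use of a content hash to decide that stored files are identical are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileAttachment:
    # Hypothetical record; field names are assumptions for illustration.
    kind: str                           # "linked" or "stored"
    path: Optional[str] = None          # used for linked files
    content_hash: Optional[str] = None  # used for stored files


def could_be_merged(a: FileAttachment, b: FileAttachment) -> bool:
    """Return True only for the two cases named in the post above."""
    if a.kind != b.kind:
        # A linked file and a stored file are never treated as identical.
        return False
    if a.kind == "linked":
        return a.path is not None and a.path == b.path
    if a.kind == "stored":
        return a.content_hash is not None and a.content_hash == b.content_hash
    return False
```

With this rule, a copy stored in OneDrive and a copy on the local drive would be left alone, since they are either of different kinds or linked from different paths.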
  • +1 for manual duplicate deletion (from the "duplicate items" bin). I've yet to encounter a situation where merging the duplicates was useful (although I'm sure there are some).

    What happens is that I will carelessly add an item twice, and all I want to do is delete one of them.
  • The major reasons to merge rather than delete are to preserve tags and collections that one item belongs to but not the other (the merged item will belong to all of them) and to maintain active citations in the word processor plugins. If you don't use either of these, then I would guess that merge duplicates isn't too useful to you, as you say.
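
As a rough illustration of why merging preserves more than deleting: the merged item belongs to every tag and collection that either duplicate had. The sketch below assumes a hypothetical `LibraryItem` record and does not model word processor citations; it is not how Zotero implements merging.

```python
from dataclasses import dataclass, field


@dataclass
class LibraryItem:
    # Hypothetical record; not Zotero's data model.
    title: str
    tags: set = field(default_factory=set)
    collections: set = field(default_factory=set)


def merge_items(master: LibraryItem, other: LibraryItem) -> LibraryItem:
    """Keep the master's metadata; take the union of tags and collections."""
    return LibraryItem(
        title=master.title,
        tags=master.tags | other.tags,
        collections=master.collections | other.collections,
    )
```

Deleting `other` outright would instead drop any tag or collection membership that only it carried.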
  • Aha! I've only started to use tags extensively so this makes a lot of sense. Thank you!
  • You can also use Ctrl/Cmd click to de-select items you want to keep in the Duplicate Items "bin" and then delete the rest.
  • Is it possible to remove all duplicates automatically? I am dealing with a huge list of items and cannot simply merge them one by one; the only feasible approach for me is to automatically remove (possibly with auto-merging) all duplicates detected by Zotero. Thanks
  • No, not possible, sorry. Depending on how they were created, sorting by date and deleting might work, though.
  • edited June 10, 2015
    Like jacekg, I would also need some way to remove all duplicates automatically. I usually have to import the results from more than one database (PubMed, PsycInfo, Embase, ISI Web of Knowledge, etc.) and then merge them all. But sometimes there are more than 3 thousand duplicates, and I usually have to use Endnote to do so.

    Is it something that you are working on or will work on?

    Thanks!
  • A thought to consider: I find that the completeness and accuracy of the metadata can differ across those databases.

    Although considerable effort is required, merging the duplicates and selecting the best version of each will produce an excellent result. When you compare the records of the same item you might expect that each database should be essentially the same. That is _not_ the case.

    Author names will differ in completeness. Publication years often differ. Sometimes the issue number is included, sometimes not. Sometimes the DOI is included, sometimes not.

    One of the databases you mention deliberately changes the article title in minor ways. For example, if the published title includes a subtitle separated by a colon, the database may instead use a dash or a period. Word pairs that are not hyphenated become hyphenated. In a few cases, I have seen British English spellings altered to American spellings and similarly in the opposite direction. These alterations are minor Mountweazels to help identify copyright infringement and wholesale copying of records.
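
One way to compare titles despite variations like these is to normalize them first. This is a minimal sketch, assuming that case, punctuation, and hyphenation are the only differences that matter; spelling variants such as British versus American English are not handled. The function name is hypothetical and unrelated to Zotero's own matching.

```python
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Reduce a title to lowercase words separated by single spaces."""
    text = unicodedata.normalize("NFKD", title).casefold()
    # Drop combining marks so accented letters compare equal to plain ones.
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Treat any run of punctuation or whitespace (colon, dash, period,
    # hyphen) as a single space.
    text = re.sub(r"[\W_]+", " ", text)
    return text.strip()
```

Two records would then be compared on `normalize_title(a) == normalize_title(b)` rather than on the raw strings.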