Select merged item after merging

I am using the "Merge Items" tool to remove duplicates in my library. After merging, I usually still need to remove some coloured tags, delete the duplicate files in the merged item, and do other final checks on the metadata. So I would like the merged item to be selected after merging, so that I can finish the manual editing after the automatic process.

However, at the moment, a different item is selected after merging. I haven't understood the logic of which item is selected, but the selection jumps away from the merged item that has just been produced.
Is it possible to change this behaviour and select the merged item after merging?
  • We recently did some work to make sure that the correct item is selected after merging in Duplicate Items, but I hadn't considered the manual merge case. The behavior you suggest makes sense. I'll look into it.

    When you find yourself having to manually delete duplicate files after merging, are the files PDFs? I ask because you should not need to do that anymore! Attachments get merged automatically if the contents are the same. I'd like to get to the bottom of why it's not working in your case.
  • edited November 10, 2022
    We recently did some work to make sure that the correct item is selected after merging in Duplicate Items
    Could you clarify what should be happening after merging items from the Duplicate Items collection? I assume that the "correct item" is the merged item?
    After merging in the Duplicate Items collection, the merged item disappears from the Duplicate Items collection. So the selection jumps to another item to de-duplicate in the Duplicate Items collection. If I go to My Library, none of the items are selected.

    Usually, the most efficient way to look at duplicates in the Duplicate Items collection is to sort the library by Title. But if I want to find the merged item in My Library after merging, I need to sort by Date Modified. So to edit items before or after merging, I need to go back to My Library, sort the library by Date Modified to find the merged item, do the edits, and then go back to the Duplicate Items collection and sort again by Title to continue merging other items.
    During the multiple sorting steps, I usually lose track of the papers I wanted to de-duplicate. I have also tried to use the search instead of re-ordering the library. But simply switching to My Library or to another collection clears the search. So this does not work.

    In a 15k-item library, each step takes some time. So merging from the Duplicate Items collection is not working well for me. That is fine, as I assume it probably works best for some other people. So to solve the problem, I tag a group of duplicates I want to fix with a coloured tag, which moves them to the top of My Library when sorting by Date Modified, from where I can do all the merging steps easily.

    Would it make sense to allow a different sorting configuration in the Duplicate Items collection, and keep the original sorting in My Library? Every time I go to the Duplicate Items collection, it triggers a new search anyway, so the situation is different from other collections.

    Another limitation that makes it difficult to merge duplicates in the Duplicate Items collection is that you cannot edit or even copy any of the metadata of the items when you are in that collection.
    For example, I can de-duplicate a preprint and the final publication. I would still like to keep the link to the preprint in the merged item, because it could contain additional useful information on the results published. But the merging process does not allow me to transfer any of the metadata between the items to merge.
    I mention preprint, but it could be the link to the paper on the author's website, or an old book scanned by different repositories, ...
    Or it could be that during the merging process I realize that none of the entries I have is complete, so I need to transfer the URL or the editors from one item to the item with the best metadata.

    When you find yourself having to manually delete duplicate files after merging, are the files PDFs?
    Yes, most of the time, as most of my library is PDF files. But I also need to remove some bad attached URI links left over from the Mendeley import process, duplicate notes, ...
    I ask because you should not need to do that anymore! Attachments get merged automatically if the contents are the same.
    I have done more than 100 manual merges in the last few days, with the latest Zotero version. I do observe that some PDF files are automatically removed from the merged item, but only in fewer than about 20% of the cases, I would say. In some cases, the files are indeed slightly different, but in other cases, very similar files were still kept in the merged item.

    I have just tested this again on a fairly recent PNAS paper from 2015. I had added the files to my library the same day in 2016. I cannot see any difference between the two files. To get good metadata, I imported the paper into Zotero again from the website using the Zotero Connector. I then merged the 3 items, selecting the latest metadata. The three files were kept. The only difference I could see in the latest PDF file was that the publisher has added on the side of each page: "Downloaded from ... by ... on ... from IP address ...".

    I have also tested with a 1993 paper from the Journal of the American Ceramic Society. As far as I can see, the files are identical. It looks like a scan and OCR of the paper: the text can be selected and copied, but the highlight is offset from the actual text. The only difference is that the publisher has added download information text as in the previous example. After merging, the two PDF files are kept.

    More recent articles seem to work better, but still with exceptions.
    In the testing process, I wanted to revert the merging, but I could not find how to do it. Is it possible?

    In some cases, the files are actually quite different, so I don't know if any automatic process could (or even should) remove duplicates. Some typical cases leading to duplicates with different files:
    1) Some journals post the accepted version of the paper before publishing the final edited version.
    2) Preprint and final publication
    3) Some publishers like to add a first page to the actual paper (for example AIP or IOP), or at the end sometimes, containing additional information on the paper and some "Articles you may be interested in". The content of that page changes over time, so the content of the PDF files varies.
    4) Sometimes, I also have a copy of the paper from another website that also adds a first page, like ResearchGate.
    5) Some papers can be published on multiple websites, like old papers or books with different scan copies, or PhD theses with strictly the same content but additional marks on the files, government reports, conference proceedings, ...
    6) In some cases with identical files, one file has annotations, but not the other. So I need to check the annotations carefully and decide if I want to keep only the one with annotations, or if I need to manually merge the annotations when both files have them. Merging does not give any information about the annotations, and you cannot spot them unless you open the files in the Zotero PDF Reader and check carefully every time you merge items.


    I could probably continue the list further, but the point is that it is actually a fairly common situation that merging leaves multiple PDF files that still need to be cleaned manually.
  • I'm not going to read all of this — I'd really encourage you to try to make your posts shorter — but on this:
    Another limitation that makes it difficult to merge duplicates in the Duplicate Items collection is that you cannot edit or even copy any of the metadata of the items when you are in that collection.
    For example, I can de-duplicate a preprint and the final publication. I would still like to keep the link to the preprint in the merged item, because it could contain additional useful information on the results published. But the merging process does not allow me to transfer any of the metadata between the items to merge.
    You can choose individual fields from each item by using the buttons along the right side of the item pane in the merged view.

    You can also use arrow keys or modifier-key deselecting to select a single item and edit it.
  • edited November 10, 2022
    I had missed the fact that you recover the ability to edit and copy the metadata by selecting a single item with Alt+Click and with the arrow keys, thank you.

    Just a small remark then: double-clicking to open the PDF file does not work when a single item is selected in the Duplicate Items collection. Pressing Enter works fine.
  • [...] I would like to get the merged item selected after merging, so that I can finish the manual editing after the automatic process.
    I think it would be nice if the merged item could be selected and remain displayed in the "Duplicate Items" collection. If it vanishes immediately, there's no way to check the result. The merged item could then be removed when you move focus to another item.

    A workaround is to add the items you're going to merge in "Duplicate Items" to another temporary collection. You can then see the merged version there.

    BTW, due to the items vanishing immediately, I usually prefer merging manually in My Library. If you sort by BBT's citekey column, duplicate items will usually be next to each other. Select both items, right-click, and select "Merge Items...". A postfixed citekey also gives a hint that a duplicate item might exist.
  • A workaround is to add the items you're going to merge in "Duplicate Items" to another temporary collection.
    I use a similar workaround. But I prefer using coloured tags, as I find it easier than moving items between collections. I first sort the items in the Duplicate Items collection so that duplicates appear together (by Title or citekey). Then I tag them all (or a subset I want to fix) with the tag "Duplicates". Then I can merge duplicates from My Library, after sorting by Date Modified. Usually, the duplicate items still stay together after sorting by Date Modified, so that the merging process is still fairly easy.

    Then I just need to remove the "Duplicates" tag after checking the merged item. I can easily find the recently merged items to check them by removing the "Duplicates" tag from the items still in the Duplicate Items collection, leaving only the recently merged items with the "Duplicates" tag.

    Bonus: I also use a "Non-duplicates" tag to solve the problem of false positive matches. This saves me from going through them again in the future, and it will be easy to remove the tag once this functionality is added.

