(semi-)automate merging duplicates

My workflow frequently leaves me with rather large sets of duplicate entries to merge. Better protection against importing duplicates into the library in the first place would be good, but until that rather difficult problem is solved, I would appreciate a way to review what the Duplicate Items collection identifies (it is really quite good at finding duplicates), remove any pairs I disagree with, and then accept the rest for automatic merging. At present, I have to go from pair to pair and click "Merge 2 items" each time.
  • Just commenting here to say I was hoping for a similar solution. I have not been very diligent about merging duplicates, and I do very large batch imports of references. I have nearly 6000 duplicates in my library now. Merging them all by hand is taking quite some time. It would be useful if there were a way to merge them all, with a preference for the newest version. This would be done with the full knowledge that a duplicate imported after a field was modified would overwrite that change. However, that seems like a small price to pay.
  • edited February 28, 2014
    OK, I've been working on a solution to this problem. Basically, my library has so many duplicates in it that getting rid of them by hand is just a no-go. I have over 5,000 duplicates, and merging each one takes ~8 seconds, so it would take more than 11 hours of nonstop clicking. Plus, with the number of citations I'm pulling in on a monthly basis, I'm sure to grab a bunch of duplicates in the future, so it's not worth it to put in the pain of doing it manually once. Since it seems like I'm one of the only people currently concerned about this, I'm mostly just posting for posterity in case some interested reader comes along in the future. After deciding that writing a plugin for Zotero was way over my head, I decided to go with a really low-level solution. This is one step above the Homer Simpson home office solution (using a drinking bird toy to press "Y" repeatedly).

    I wrote an AutoHotkey script to activate Zotero and press "Enter" repeatedly. Basically, my plan is to open Zotero, navigate to the Duplicates folder, merge one duplicate by hand, and then start the script. This way, each time "Enter" is pressed, it merges the next set of duplicates, and the script then waits until Zotero is ready before pressing "Enter" again.

    Like I said, this is an embarrassingly low-level solution, but I'm no programmer. If I were, I'd be ashamed, but it's really not worth my time to figure out how to write a plugin for this.

    So: Dear future interested person,
    Assuming that they have AHK in the future... download AutoHotkey here: http://www.autohotkey.com/

    Install the program, then copy and paste the code below into a Notepad file. Save the file as "yourfilename.ahk". Open Zotero and merge one duplicate, then double-click the .ahk script to run it.
    Hope this helps someone.

    Sincerely,
    a person from the past

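    ; AutoHotkey v1 syntax: press Enter in Zotero over and over to merge duplicates.
    SetTitleMatchMode, 2 ; match "Zotero" anywhere in the window title
    MsgBox, 4,, The script is active. Would you like to cancel?
    IfMsgBox, Yes
        ExitApp
    ; Otherwise
    Loop
    {
        IfWinNotExist, Zotero
        {
            MsgBox, Zotero is not running. The script will now exit.
            ExitApp
        }
        WinActivate, Zotero
        WinWaitActive, Zotero,, 5 ; wait up to 5 seconds for Zotero to come to the front
        if ErrorLevel
            continue ; Zotero did not become active, so do not send Enter to another window
        Send {Enter} ; the space before the semicolon is required, or the comment gets typed
        Sleep, 1000 ; assumed delay so Zotero can finish the merge; adjust as needed
    }
    return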



    P.S. I can tell already from my limited debugging efforts that this script is really poorly written. The WinActivate command is pretty spotty at actually activating Zotero, so "Enter" can end up in whatever you're working on at the moment. But in general, it works. My plan is to basically set it up to run overnight so it won't interfere with my normal computer activities. Good luck!


    *EDIT: I forgot to mention that this tool obviously doesn't discriminate about which version to keep, or which fields to import, beyond Zotero's defaults. This is fine with me, since I have far too many duplicates to care. If you have substantial changes that you need to make sure are kept, then you'll need finer-grained control over the duplicate merging process, in which case scripting is probably not the best solution for you anyway.
  • I don't know if it would fit with your workflows, but I just thought I'd mention the extension at

    https://github.com/chrisjr/zotero-prevent-duplicates

    I use it and it saves me lots of trouble avoiding duplicates.
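
For anyone who wants to reason about the "prefer the newest version" policy suggested earlier in the thread before automating anything, here is a minimal Python sketch of that rule. It does not talk to Zotero at all: the records are plain dictionaries, and the field names (`title`, `pages`, `date_added`) and the sample values are made up for illustration. It keeps the newest copy's values, fills fields the newest copy left empty from older copies, and lists any disagreements so they can be reviewed by hand.

```python
from datetime import datetime


def pick_master(group):
    """Split one set of duplicates into (newest item, older items)."""
    ordered = sorted(
        group,
        key=lambda item: datetime.fromisoformat(item["date_added"]),
        reverse=True,
    )
    return ordered[0], ordered[1:]


def merge_group(group):
    """Merge one set of duplicates, preferring the newest item's values.

    Returns (merged, conflicts). `conflicts` maps a field name to the
    older values that differ from the value that was kept.
    """
    master, older = pick_master(group)
    merged = dict(master)
    conflicts = {}
    for item in older:
        for field, value in item.items():
            if field == "date_added" or not value:
                continue  # ignore the timestamp and empty fields
            if not merged.get(field):
                merged[field] = value  # newest copy had nothing here
            elif merged[field] != value:
                conflicts.setdefault(field, []).append(value)
    return merged, conflicts


if __name__ == "__main__":
    # Hypothetical records, not Zotero's actual data model.
    duplicates = [
        {"title": "On merging duplicates", "pages": "12-34",
         "date_added": "2013-05-01T10:00:00"},
        {"title": "On Merging Duplicates", "pages": "",
         "date_added": "2014-02-20T09:30:00"},
    ]
    merged, conflicts = merge_group(duplicates)
    print(merged)
    print(conflicts)
```

Reporting conflicts instead of silently dropping them is one way to soften the "small price to pay" mentioned above: anything the newest copy would overwrite is at least listed for a quick look.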