Items Mistakenly Identified as Duplicates

I recently imported about 10,000 entries from an old bibliographic database into Zotero. There were some duplicates in this old database, as well as some duplication with items already in my Zotero library. Unfortunately, the Duplicates tool in the Zotero sidebar is mistakenly identifying wildly different items as duplicates, so I cannot find the actual duplicates. In one case, clicking on a single item lists 793 "duplicates," almost all of which have unique titles and authors (a few books are by the same author, etc.). The only similarity among these items seems to be that they were imported at nearly (but not exactly) the same time. Thanks for any help anyone can provide!
  • Do they have the same DOI? That would be the main reason I'd expect to see this.
  • The old database was all entered by hand (just title, author, publisher, etc.), so I don't believe there were any Digital Object Identifiers. Is it possible they were accidentally added as part of the import, or that the import function incorrectly identified some data as DOIs? I believe the old library was in RIS format, and I hand-checked it to make sure that only the relevant data was included (again, title, author, publisher, etc.).
  • It's possible, I suppose, if something in the RIS file got messed up (e.g. DO - instead of DA -). Couldn't you just check in Zotero?
  • Looking in Zotero, it does not seem that any of the items have DOI information.
  • Could you export two of the items (that show as duplicates) as RIS, open the file with a text editor, and copy and paste them here?
  • TY - BOOK
    TI - Jainism in South India
    AU - Desai, P. B. gen ed. A. N. Upadhye (H. L. Jain)
    CY - Sholapur
    DA - 1957///
    PY - 1957
    SN - Jivaraja Jaina Granthamala No. 6
    ER -

    TY - BOOK
    TI - Objectivity Method and Point of View: Essays in the Philosophy of History
    AU - Dussen, W. J. van der
    AU - Rubinoff, Lionel
    CY - Leiden etc.
    PB - E. J. Brill
    SN - Michael Krausez Philosophy of History and Culture 6
    ER -
  • P.S. I'm not sure why that "///" appears after the date in the first record. It doesn't appear in Zotero.
  • Zotero should be doing this better, but your ISBN fields (SN in RIS above) are nonsensical. If you fix that, they should no longer appear as duplicates.
  • I'm guessing they're supposed to be Series and Series Number.
  • "Should be doing this better"--indeed.
    I mean, this clearly is a bug. Zotero shouldn't treat two books as the same because they have anything that's not a number in the ISBN field.
  • I see what you mean. I have tested changing a couple of these and it does seem to remove them from the Duplicate finder.

    Something must have happened during import, because looking at more records, the SN is sometimes the Series, but sometimes it is the Publisher or Edition or something else. The RIS file I imported was actually an intermediate step that itself was converted from an even older database (long story), so something weird must have happened in conversion.

    Is it possible to search just for the records that have this issue, without clicking back and forth to the Duplicates window and then searching for each one individually? I realize there is probably not a way to batch correct all of these, but I think it is too late to correct the original RIS and reimport, because I have already been adding tags and notes and other elements to these records.
  • It's a bit cumbersome, but a saved search with 11 search conditions:
    ISBN -- contains -- %
    (for the ISBN field to not be empty)
    and
    ISBN -- does not contain -- 0
    ISBN -- does not contain -- 1
    etc. should do the trick
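The logic of that saved search can be sketched outside Zotero as follows. This is only an illustration in Python on plain strings, not Zotero's search engine or API; the function name `matches_saved_search` and the sample values other than the two series strings quoted earlier in the thread are assumptions made for the example.

```python
DIGITS = "0123456789"


def matches_saved_search(isbn_field: str) -> bool:
    """Mimic the 11 conditions: field is non-empty and contains none of 0-9."""
    if not isbn_field:
        # Corresponds to the "ISBN contains %" condition (field must not be empty).
        return False
    # Corresponds to the ten "ISBN does not contain <digit>" conditions.
    return all(d not in isbn_field for d in DIGITS)


samples = [
    "Jivaraja Jaina Granthamala No. 6",                     # from the thread
    "Michael Krausez Philosophy of History and Culture 6",  # from the thread
    "E. J. Brill",    # hypothetical: a publisher name that landed in the ISBN field
    "",               # empty field
    "9780306406157",  # a well-formed ISBN-13, used here only as a placeholder
]

flagged = [s for s in samples if matches_saved_search(s)]
print(flagged)
```

Note that both series values posted above contain the digit 6, so a search built this way would not flag them; it only catches fields with no digits at all.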
  • You can sort by the ISBN column in the center pane (add the column via the icon at the top right of the pane, next to all the other column names). Other than that, you can probably set up an advanced search using a regexp that finds items not matching it. You'd have to think about how to craft a good regexp though, so it might not be worth the trouble.

    We'll fix the duplicate issue in the next Zotero version, but I'm not sure when that will come out.
  • If you just want to delete the ISBN field, then having a look at the javascript API might be worthwhile:
    https://www.zotero.org/support/dev/client_coding/javascript_api
  • Never mind on the regexp; we don't offer that for general fields (we probably should).
  • edited December 10, 2014
    edit: what aurimas says about regex.
  • I don't actually see the ability to add the ISBN column in the center pane, even under the "More Columns" submenu. Is this because I'm running the Mac version of the standalone? I am running the most recent update (just checked).

    I'd prefer not to delete the ISBN field. I'd like to copy the information into the correct field if possible.
  • There's no ISBN column; aurimas didn't check when he wrote that. Normally there'd be little reason to sort by ISBNs.
  • Hmmm, I guess we don't offer that as a column either. Probably because it offers little use beyond filtering out invalid values. I guess what adamsmith suggests above is your only option then. Unfortunately, that won't help a lot unless the field does not contain any numbers at all.
  • I will reverse the logic and search for any field that contains letters. Even if I just search for the five vowels, that should get almost everything.
  • It will take a while for me to fix all of these and see if it works. Right now I see about 690 ISBNs with letters in them, but there are several hundred more mistaken duplicates than that. Still, I sincerely appreciate your help with this issue. Thanks!
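One possible reason a letter-based search leaves several hundred items behind is that some bad values contain only digits (a bare series number or a year, say); that is an assumption, not something confirmed in this thread. Under that assumption, a stricter test is whether the value is a checksum-valid ISBN-10 or ISBN-13. The sketch below is plain Python on strings exported from the library, not part of Zotero; `is_plausible_isbn` is a hypothetical helper name, and it assumes one ISBN per field.

```python
import re


def is_plausible_isbn(value: str) -> bool:
    """Return True if value is a checksum-valid ISBN-10 or ISBN-13."""
    # Drop hyphens and whitespace; an ISBN-10 check digit may be "X" (meaning 10).
    compact = re.sub(r"[\s-]", "", value).upper()

    if re.fullmatch(r"[0-9]{9}[0-9X]", compact):
        # ISBN-10: weights 10 down to 1, total must be divisible by 11.
        total = sum(
            (10 - i) * (10 if ch == "X" else int(ch))
            for i, ch in enumerate(compact)
        )
        return total % 11 == 0

    if re.fullmatch(r"[0-9]{13}", compact):
        # ISBN-13: weights alternate 1, 3, ...; total must be divisible by 10.
        total = sum(
            int(ch) * (1 if i % 2 == 0 else 3)
            for i, ch in enumerate(compact)
        )
        return total % 10 == 0

    return False


candidates = [
    "Jivaraja Jaina Granthamala No. 6",  # from the thread
    "6",              # hypothetical: a bare series number
    "1957",           # hypothetical: a year that ended up in the field
    "0-306-40615-2",  # well-formed ISBN-10, placeholder only
    "9780306406157",  # well-formed ISBN-13, placeholder only
]

needs_review = [c for c in candidates if not is_plausible_isbn(c)]
print(needs_review)
```

Because whitespace is stripped before checking, a field holding two ISBNs separated by a space would be reported as invalid, so anything flagged this way still needs a manual look before it is moved or cleared.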
  • Using the javascript API to just delete all of those ISBNs or copy them to Series may be your best bet:

    https://www.zotero.org/support/dev/client_coding/javascript_api