Merge - auto function?

  • @ckemere wrote: The problem I find is that Mendeley/PubMed often classifies conference papers as journal articles for some reason, so I still have to do some manual clicking to normalize these entries.

    I urge you to be careful. PubMed lists literally hundreds of journals that are "Proceedings of ..." These are treated by the professional association as a serial not as a conference proceedings book. I could give specific examples if requested.

    Many of the PubMed proceedings are in serial publications that have an ISSN and not in proceedings books with an ISBN.

    Often these proceedings appear in supplement issues of unaffiliated (or loosely affiliated) journals.
  • The auto-lookup result for IEEE conferences appears to be "conference paper" (PubMed lists as "journal"). NIPS conferences, for some reason, end up as "book". In my case, I don't really think it matters, as long as the right DOI/PMID are associated and my library is consistent. In my field, I've never used the ISBN or ISSN to actually find a document/book.
  • Maybe there is a possibility to write a plugin for this?
    You could tell the plugin whether to prefer the older or the newer version (= master), which fields have to be 100% identical, and which fields should be taken from the master only; it would then merge all duplicates that are 100% identical in the fields you specified and skip the rest for manual merging. Clicking the merge button more than 10 times without looking at what you are merging makes zero sense. I may give it a go in November if there is demand for this plugin, or maybe someone comes up with something even better before then.
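The rule described above can be sketched in plain JavaScript. This is only an illustration of the decision logic, not the Zotero plugin API: the item shape (`dateModified` as an ISO string, flat fields) and the option names are assumptions.

```javascript
// Sketch of the proposed auto-merge rule (hypothetical, not the Zotero API).
// Fields in `mustMatch` have to be identical for an automatic merge;
// fields in `fromMasterOnly` are always taken from the chosen master.
// Any mismatch in a required field returns null, i.e. "leave for manual merging".
function autoMerge(a, b, { preferNewer = true, mustMatch = [], fromMasterOnly = [] } = {}) {
  // Pick the master by modification date (ISO strings compare lexically).
  const [master, other] =
    (a.dateModified > b.dateModified) === preferNewer ? [a, b] : [b, a];

  // Refuse to merge automatically when any required field differs.
  for (const field of mustMatch) {
    if (master[field] !== other[field]) return null;
  }

  // Merge: start from the other item, overwrite with the master's values,
  // and force master-only fields from the master.
  const merged = { ...other, ...master };
  for (const field of fromMasterOnly) merged[field] = master[field];
  return merged;
}
```

For example, two entries identical except for the "Accessed" field would merge automatically with `mustMatch: ["title"]` and `fromMasterOnly: ["accessed"]`, while a title mismatch would fall through to manual review.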
  • edited April 4, 2021
    I must say the current tool to duplicate items that auto-selects each bundle is very nicely done. But it would be really great to be able to automatically merge items whose only difference is the "Accessed" field. I'm not a programmer, but this sort of sounds like something that could be done with javascript...
  • Is there any workaround? For example, exporting the database elsewhere, merging duplicates automatically, and reimporting it to Zotero?

    I would rather lose a few records than merge everything manually. I'd appreciate having this option.
  • @marcelparciak wrote a JavaScript snippet at https://forums.zotero.org/discussion/40457/merge-all-duplicates/, which can automate merging to some degree.
  • I also have thousands of duplicates. Automatic merging is a MUST.
    I get that there might be some mistakes, but honestly, when you have to merge thousands of papers anyway, you don't review them one by one, you just click, click, click...
  • Still no auto merge? Still awful, yes!
  • I found this amazing automerger developed by Frangoud which worked for me.

    https://github.com/frangoud/ZoteroDuplicatesMerger
  • I have hundreds of duplicates from Zotero syncs years apart; this is more than "15 minutes' work" (I just spent about 15 minutes before coming here), and about 90% of them are identical except for a timestamp (one from 2004, the original EndNote import, and one from 2014).

    I do not care about the "edge cases", I care about my time; approaching 60, there is less of it. Surely computers are for automating the menial?

    Other dealbreakers elsewhere: Nextcloud's lack of (and refusal to add) a one-way sync, no doubt a baked-in design "feature" from not expecting people to have decades of work on a PC (surely everyone is on the cloud; sort of the opposite of the problem here), and Microsoft SharePoint's refusal to allow web-1.0-style site structures without paying $10K a year for some app: totally flat horizontal sharing, no curating allowed, too bad if you want to let people browse and discover material they don't know the name of to search for. And do not get me started on Adobe.
  • regarding https://github.com/frangoud/ZoteroDuplicatesMerger

    instructions refer to the file .xpi

    I can't find it listed and don't know what to download to install it; is GitHub always this obtuse?
  • release, on the right: https://github.com/frangoud/ZoteroDuplicatesMerger/releases/tag/v1.1.5

    Is there any tool that checks for duplicate attachments by checksum?

  • Is there any tool that checks for duplicate attachments by checksum?
    Zotero does that on merge -- when two attachments have an identical checksum, the newer(?) one gets deleted on merge.
  • edited December 5, 2022
    @adamsmith Zotero does not use any attachment information to identify potential duplicates. See: https://forums.zotero.org/discussion/98627/checking-for-duplicates
    And also:
    Zotero currently uses the title, DOI, and ISBN fields to determine duplicates. If these fields match (or are absent), Zotero also compares the years of publication (if they are within a year of each other) and author/creator lists (if at least one author last name plus first initial matches) to determine duplicates. The algorithm will be improved in the future to incorporate other fields.
    https://www.zotero.org/support/duplicate_detection#finding_duplicates
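    The documented heuristic quoted above can be sketched roughly as follows. This is only an approximation for illustration; the field names and the exact comparisons (e.g. case-insensitive titles) are assumptions, not Zotero's actual implementation.

```javascript
// Rough sketch of the documented duplicate heuristic (field names assumed).
function looksLikeDuplicate(a, b) {
  // A shared DOI or ISBN counts as a match outright.
  if ((a.DOI && a.DOI === b.DOI) || (a.ISBN && a.ISBN === b.ISBN)) return true;

  // Otherwise require matching titles...
  if (!a.title || !b.title || a.title.toLowerCase() !== b.title.toLowerCase()) return false;

  // ...publication years within one year of each other...
  if (Math.abs(a.year - b.year) > 1) return false;

  // ...and at least one creator sharing a last name and first initial.
  return a.creators.some(c1 =>
    b.creators.some(c2 =>
      c1.lastName === c2.lastName && c1.firstName[0] === c2.firstName[0]));
}
```

    This also explains the 'Advertising' false positives mentioned later in the thread: identical titles plus close years plus no DOI can easily satisfy the heuristic for unrelated items.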

    So the removal of identical files that you mention is relevant only if the metadata of the items is good enough to be identified as duplicates (or if you have another way to identify them as duplicates).
    At the moment, I do not know any tool checking for duplicate attachments by checksum.

    I think Zotero keeps the file attached to the master item selected during merging.
  • right, I read the question to be about merging, not detection.
  • I see, sorry, I probably misunderstood the question given the context.
  • Thank you, the reason I was asking about the merge is because I noticed in the past that some items had identical attachments after merging. That was a while ago, great that this has been improved.
  • edited December 6, 2022
    Okay, I've dealt with them. Some other suggestions:
    —that the duplicates folder have a subcollection for acknowledged false-positive candidates,
    —and another for recent possible duplicates; the historically dealt-with ones were a bit awkward.

    Also, some references to last-century newspapers on trove.nla.gov.au that are all titled 'Advertising' (I had about 100) came up as duplicate candidates, so an
    —ignore items where some field matches, e.g. 'trove.nla.gov.au'
    option might be useful.