How to Fix Inconsistent Author Names in Zotero?

foadsf · July 10, 2024

Hi everyone,

I’ve been using Zotero for years and have imported hundreds of items. Over time, I’ve noticed that many author names are inconsistent, especially with first and middle names being abbreviated differently.

Is there a canonical way to fix this? Ideally, I’d like to see a table of all authors and merge duplicates. Is this possible within Zotero, or is there a plugin or tool that can help with this?

Thanks in advance for your help!

adamsmith · July 11, 2024

Unfortunately not. I've always thought that'd be an area where Zotero could do a lot more, but I'm not aware of any work on this. I thought there was a newer add-on that at least offered some bulk edit functionality, but can't currently find it.

tim820 · July 11, 2024

If you at least want to find all the authors you have stored under different variations of their name, the code in this thread is an easy way to do so.
https://forums.zotero.org/discussion/comment/411459/#Comment_411459
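
For anyone who has exported their creators to a list and wants a rough idea of the approach, here is a minimal sketch. It is not the code from the linked thread; the grouping rule (same last name plus same first initial) and the sample data are my own assumptions for illustration.

```python
from collections import defaultdict


def variant_key(last: str, first: str) -> tuple[str, str]:
    """Key names together when the last name and first initial match."""
    return (last.strip().lower(), first.strip()[:1].lower())


def find_variants(creators):
    """Return {key: sorted distinct spellings} for keys with more than one spelling."""
    groups = defaultdict(set)
    for last, first in creators:
        groups[variant_key(last, first)].add((last.strip(), first.strip()))
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}


# Made-up sample data; a real list would come from your own library export.
creators = [
    ("Jones", "Mary L."),
    ("Jones", "M. L."),
    ("Jones", "Mary Lois"),
    ("Smith", "Adam"),
    ("Jones", "Mary L."),
]

for key, spellings in find_variants(creators).items():
    print(key, spellings)
```

Note that this heuristic will also lump together different people who share a last name and first initial, so the output is a list of candidates to review by hand, not a list of confirmed duplicates.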

The question of what to do with them is somewhat fraught, though. There are no rules (and no easy fixes). Is it reasonable to change how an author's name is recorded in your database (and therefore how it is cited) when the publication you are citing has it written slightly differently (e.g., full name, initials only, etc.)?

DWL-SDCA · July 11, 2024

I don't have information from the latest APA and Chicago style guides but:

Chicago 16, 14.72

“Authors’ names are normally given as they appear on the title pages of their books or above their articles. Certain adjustments, however, should be made to assist correct identification. First names may be given in full in place of initials. If an author uses his or her given name in one cited book and initials in another (e.g., “Mary L. Jones” versus “M. L. Jones” versus “Mary Jones” versus “Mary Lois Jones” versus “M. Jones”), the same form, preferably the fuller one, should be used in all references to that author. To assist alphabetization, middle initials or names should be given wherever known.” (emphasis added)

APA7 9.8
APA Style tries to alleviate any potential gender bias by only including last names and initials of authors. Even if a full name is included on the source, only list the last name and first and middle initials (if given).

Zotero recommends that all entries for the same author be at the same level of name-completeness so that disambiguation may be handled properly by the CSL styles.
edit: See https://www.zotero.org/support/kb/given_name_disambiguation
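
To make the two conventions concrete, here is a small sketch of both directions: reducing a given name to initials (the APA approach) and picking the fullest recorded variant (the Chicago preference). The function names and the "most letters wins" rule are my own assumptions, not anything Zotero or the style guides define.

```python
import re


def to_initials(given: str) -> str:
    """Reduce a given name to initials, e.g. 'Mary Lois' -> 'M. L.'."""
    parts = re.split(r"[\s.]+", given.strip())
    return " ".join(p[0].upper() + "." for p in parts if p)


def fullest_form(variants):
    """Pick the variant with the most letters, as a rough proxy for 'fuller'."""
    return max(variants, key=lambda g: sum(ch.isalpha() for ch in g))


variants = ["M. L.", "Mary L.", "Mary Lois", "Mary"]
print(fullest_form(variants))                # Mary Lois
print(to_initials(fullest_form(variants)))   # M. L.
```

The initials function is deliberately naive: hyphenated given names such as "Jean-Paul" come out as "J.", which is not how APA would render them. Since initials can always be derived from a full name but not the other way round, storing the fuller form loses less information.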

dstillman · July 11, 2024

In theory this is where ORCID comes in. There'd still need to be a UI to group/manage different name variations, but they could be linked together by an ORCID, and there could be an option (in the doc prefs or citation dialog, or maybe even in CSL) as to whether to use the entered name or the person's canonical name. But I have no idea what the metadata situation is there in terms of retrieving ORCIDs via our translators.
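
Purely as an illustration of that idea, a sketch of the data model might look like the following. Everything here is hypothetical: the `orcid` field, the `CANONICAL` lookup, the `use_canonical` switch, and the placeholder iD are invented for the example and do not correspond to anything that exists in Zotero.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Creator:
    last: str
    first: str
    orcid: Optional[str] = None  # hypothetical field, not an existing Zotero field


# Hypothetical registry mapping an ORCID to one canonical name (placeholder iD).
CANONICAL = {"0000-0000-0000-0000": ("Jones", "Mary Lois")}


def display_name(c: Creator, use_canonical: bool) -> tuple[str, str]:
    """Return the canonical name if requested and known, else the name as entered."""
    if use_canonical and c.orcid in CANONICAL:
        return CANONICAL[c.orcid]
    return (c.last, c.first)


entered = Creator("Jones", "M. L.", "0000-0000-0000-0000")
print(display_name(entered, use_canonical=False))  # name as entered
print(display_name(entered, use_canonical=True))   # canonical name
```

The entered name is never overwritten in this sketch; the canonical form is only substituted at display time, which keeps the "as published" form available for styles or users that want it.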

aborel · July 11, 2024

My perception is that retrieving ORCID is not happening at all at the moment, partly because the availability of that information is not very systematic, but also because there is nowhere to store it in Zotero.

So it is a chicken-or-egg situation :-) Where would it be best to start a design and/or implementation?

dstillman · July 11, 2024

Sorry, I was referring to the availability via the sites/databases/catalogs we support. We could support ORCID even if we couldn't automatically retrieve them from most sites, but it's a lot less compelling.

aborel · July 11, 2024

I'm pretty sure ORCID iDs for some authors can be found on the platforms of many scholarly journals, bibliographic databases such as the Web of Science and Scopus, as well as a growing number of repositories and preprint platforms. So the relevant translators would need to be updated accordingly, which sounds like a very useful thing to do!

Now the problem is that, as far as I know, there is a significant lack of manpower when it comes to reviewing issues and pull requests for translators on GitHub...

adamsmith · July 11, 2024

CrossRef in particular should have decent ORCID coverage, but that's obviously only for fairly recent stuff. Publishers _have_ ORCID info, but it'll be a pain to get: it's in none of the metadata formats we typically use, so it would require scraping (or getting it from CrossRef). I don't even think it's in PubMed, but I could be wrong there. (Also, ORCID has metadata quality issues. Duplicate ORCIDs for individual authors are fairly common.)
There are other identifiers to think about for older works (ISNI, WoS ID, possibly even VIAF records).
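
As a rough sketch of what reading ORCIDs from CrossRef metadata could look like: as far as I understand the CrossRef REST API, a work record carries an `author` list whose entries may include an `ORCID` key holding the iD as a URL. The record below is hand-written with placeholder values, not real API output, and the helper names are my own.

```python
import json

# Hand-written sample shaped like a CrossRef work record; all values are placeholders.
sample = json.loads("""
{
  "DOI": "10.0000/example",
  "author": [
    {"given": "First M.", "family": "Last",
     "ORCID": "http://orcid.org/0000-0000-0000-0000",
     "authenticated-orcid": false},
    {"given": "Other", "family": "Author"}
  ]
}
""")


def bare_orcid(value: str) -> str:
    """Strip the URL prefix, leaving only the hyphenated iD."""
    return value.rstrip("/").rsplit("/", 1)[-1]


def authors_with_orcid(work):
    """Yield (family, given, orcid_or_None) for each author in a work record."""
    for a in work.get("author", []):
        orcid = a.get("ORCID")
        yield (a.get("family"), a.get("given"), bare_orcid(orcid) if orcid else None)


for row in authors_with_orcid(sample):
    print(row)
```

Authors without an iD simply come back with `None`, which matches the patchy coverage described above.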

enozkan · July 11, 2024

adamsmith wrote: "I don't even think it's in PubMed, but I could be wrong there."

I was surprised to see that they are there. All my papers since 2020, and some going back to 2017, have them when I search the web page with orcid[auid]. (The missing ones might be those for which we did not include ORCIDs when we submitted?)

They are also accessible using eutils: you can search with [auid] as part of the URL, and efetch.fcgi also returns full records like the following whenever the ORCIDs are in the database:

<Author ValidYN="Y">
  <LastName>Last</LastName>
  <ForeName>First M</ForeName>
  <Initials>FM</Initials>
  <Identifier Source="ORCID">XXXX-XXXX-XXXX-XXXX</Identifier>
  <AffiliationInfo>
    <Affiliation>
      Department of Things, The University of All Things, New York, NY 10000, USA.
    </Affiliation>
  </AffiliationInfo>
</Author>
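
For illustration, pulling the ORCID out of an element like the one above takes only a few lines with the Python standard library. This is a minimal sketch that assumes the element structure shown in this post; the function name is my own, and I have not checked how consistently the identifier value is formatted across real records.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <Author> element shown above, with the same placeholder values.
AUTHOR_XML = """
<Author ValidYN="Y">
  <LastName>Last</LastName>
  <ForeName>First M</ForeName>
  <Initials>FM</Initials>
  <Identifier Source="ORCID">XXXX-XXXX-XXXX-XXXX</Identifier>
</Author>
"""


def author_orcid(author):
    """Return (last, fore, orcid_or_None) for one <Author> element."""
    last = author.findtext("LastName")
    fore = author.findtext("ForeName")
    orcid = None
    for ident in author.findall("Identifier"):
        # Only take identifiers explicitly marked as ORCID.
        if ident.get("Source") == "ORCID":
            orcid = (ident.text or "").strip() or None
    return (last, fore, orcid)


print(author_orcid(ET.fromstring(AUTHOR_XML)))
```

Authors without an `Identifier` element come back with `None` in the third position, so the same function works for records that predate ORCID submission.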

So, this looks eminently possible for those in any biomedical field, from chem/phys bio to clinical.

adamsmith · July 11, 2024

Ah, thanks for checking. We're already using eutils for all PubMed import, so adding the ORCIDs from there would be just as trivial as for CrossRef and DataCite.

MVittinghoff · July 12, 2024

A function to change multiple fields of the same type would be great not only for authorship corrections. Publisher, location and other details would also benefit.

sabine.dippel · October 7, 2024

I agree. I use this function very often for authors in another reference management programme (there are many variations of many names in all the databases). ORCID integration would be great, if possible.