Single Author with two different name spellings (short vs longhand)

tetsuc · August 27, 2008

I don't think this has been posted before, please redirect me if it has!

this is a question about "fixing" first names in the database

I have many articles that I have obtained from different sources and I am seeing a problem when I use the "insert citation" method in open office. The problem is not that the method doesn't work, but rather that if a single author's first name is different on two of my references (that are included in the same paper) then it is considered as two different authors and identifying first name information is added to the reference (I am using APA).

So when I insert citations for each of these authors
Roberts, M
Roberts, Mybuddy

it shows up as (M Roberts, YEAR) and (Mybuddy Roberts, YEAR) when I would like them just to be (Roberts, YEAR). It is not an issue with the style, the issue is with the different first names in the database. How can I clean my database to equalise these names??? Or is there another way around this issue?

Also (separate question), when I insert the bibliography (again, APA style) it includes the "DOI:" tags. I have never heard of this as a style requirement... is there some way to turn it off?

Thanks!!!

Tjowens · August 27, 2008

Second question first. DOIs are now apparently part of APA's approach, and generally a great idea to include with citations. I wish all styles would start to include unique identifiers like DOIs and ISBNs. For evidence see this from our forums, http://forums.zotero.org/discussion/130/?Focus=6175#Comment_6175 or if you like this http://ksulib.typepad.com/talking/2008/03/recent-changes.html from some helpful librarians.

As for the first issue: right now your best bet is to sort by author name, or use basic searches to find and manually edit the applicable author names. I don't really think there is much else that can be done. I suppose once duplicate detection is added it might be possible to use the same basic functions to pick out authors it thinks might be the same people, at least in the easiest examples of authors that only have first initials.
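To illustrate what such a check could look like, here is a minimal Python sketch, not anything Zotero provides. It assumes the author names have already been exported as (last, first) pairs; the function name candidate_duplicates and the sample list are made up for the example. It groups names that share a surname and a first initial.

```python
from collections import defaultdict


def candidate_duplicates(names):
    """Group (last, first) pairs that share a surname and a first initial."""
    groups = defaultdict(set)
    for last, first in names:
        last, first = last.strip(), first.strip()
        if not first:
            continue  # no given name to compare
        key = (last.casefold(), first[0].casefold())
        groups[key].add((last, first))
    # Keep only the keys that have more than one distinct spelling.
    return {key: sorted(variants)
            for key, variants in groups.items()
            if len(variants) > 1}


# Hypothetical sample data, modelled on the names in the question.
library = [
    ("Roberts", "M"),
    ("Roberts", "Mybuddy"),
    ("Roberts", "Mybuddy"),
    ("Bloggs", "Jo A."),
]

for key, variants in candidate_duplicates(library).items():
    print(key, variants)
```

The result is only a list of candidates to review by hand: two different people (say James Smith and John Smith) would land in the same group, so nothing should be merged automatically on this basis.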

tetsuc · August 27, 2008

Thank you for the quick response and thank you for the links, that was quite informative :-).

I was already starting to fix the names the way you have suggested but thought there might be a smarter solution. The real hassle comes in when the author is not the first author...

Cheers!

ps: Love this tool!

SRUnderwood · May 20, 2013

This is quite an old thread but I don't see it repeated and certainly no solution to the problem of duplicate author names. I too have problems with multiple versions of the same author, such as J.A. Bloggs, Jo A.Bloggs, Jo Bloggs, etc. It must waste space to record these as three separate people and I would expect a relational database to be able to list possible duplicate authors and allow consolidation or editing. The same applies to organisations (ESC or European Society of Cardiology) and even to journals (BMJ or British Medical Journal).

I am curious that this is not a more common suggestion. Is it not a need that anyone else has?

adamsmith · May 20, 2013

the recommendation is to save these all in the same - the most extensive - form (i.e. Jo A. Bloggs). Currently you need to fix that manually.
Batch editing to allow fixing this quickly is both frequently requested and planned.

arggem · May 20, 2013

the recommendation is to save these all in the same - the most extensive - form (i.e. Jo A. Bloggs)

I've been wanting to address this for years, just never gotten around to it. I've always _assumed_ that you should cite the work the way it was published.

I have a guy (actually more than one) that publishes in 2 languages and sometimes using a more formal name, other times informal.

Charles E.
Charles
Chuck
Carlos
Carlitos

All the same guy.

So, is the recommendation that regardless of the language or how it's published, I should always cite him as Charles E. ?

aurimas · May 20, 2013

this whole issue could just magically disappear if we started using ORCID. Obviously easier said than done :-)

adamsmith · May 20, 2013

@arggem - it depends on the citation style afaik, but yes, it's more common to have to cite authors as they are in the original. But Zotero won't recognize them as the same author which causes the bibliography to fall apart. ORCID is probably the most promising solution to deal with that.

DWL-SDCA · May 20, 2013

Over the past year I have talked with many journal editors and thesis supervisors about the name disambiguation issue. (We were getting feedback about attitudes to changing less complete author names and deduplicating all versions to the most complete name form, as stored in the SafetyLit database.) Even the few who were skeptics at first (when confronted with examples like those listed above and after it was demonstrated that article metadata often presents a different name form than the printed or pdf version) later agreed with standardizing the names. They were especially in favor when confronted with the problem of name disambiguation within text. If using the various published names is the standard then the disambiguation rules _must_ be relaxed, else the result is greater ambiguity.

fbennett · May 20, 2013

If ORCID is used to standardize names across a database, it will make life simpler.

If it is used as an additional parameter (so that, say, whether "Run, C.J." and "Run, C. Jane" are distinguished depends on whether they share the same ORCID), the behaviour might be challenging to implement and to explain.

DWL-SDCA · May 20, 2013

To my dismay there is still no standard format for entering author name variants in the ORCID record. I became very active in ORCID in the early days. I became far less interested when I was told that people of different cultures might become offended if standards for name entry were imposed. It was deemed inappropriate to have more than one field for names. Yet, at that time only plain ASCII characters were allowed.
A name could be entered as
CHARLES Adam Baker
Charles, Adam Baker
Charles, A.B.
Charles, AB
Charles AB
Adam Baker Charles
Adam Charles
etc.

Similarly, there was only one field for entering journal volume, issue, and pagination and no guidance as to the format for entry.

If I say more I will become emotional. I _really_ wanted ORCID to be a success.

aurimas · May 20, 2013

If it is used as an additional parameter (so that, say, whether "Run, C.J." and "Run, C. Jane" are distinguished depends on whether they share the same ORCID), the behaviour might be challenging to implement and to explain.

To solve the disambiguation issue, any identifier will do. ORCID just happens to be one that's already available. I don't expect the disambiguation to happen entirely based on ORCID. The implementation should stay as it is, except that it should first check to see if the ORCIDs are available for the two authors, and, if they are, whether the ORCIDs are the same. If one or both of the authors lack the ORCID, then everything remains as it is now. So, to me, the implementation on the processor level seems to be quite trivial.
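A minimal Python sketch of the check described above, purely as an illustration and not the citation processor's actual code. The Author class, its orcid field and the same_author function are hypothetical names, and the identifier values are placeholders rather than real ORCIDs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Author:
    family: str
    given: str
    orcid: Optional[str] = None  # hypothetical field, may be missing


def same_author(a: Author, b: Author) -> bool:
    """Use the identifiers when both authors have one, else compare names."""
    if a.orcid and b.orcid:
        return a.orcid == b.orcid
    # One or both identifiers missing: behave as if the field did not exist.
    return (a.family, a.given) == (b.family, b.given)


# Placeholder identifiers, not real ORCIDs.
print(same_author(Author("Run", "C.J.", "id-1"), Author("Run", "C. Jane", "id-1")))
print(same_author(Author("Run", "C.J."), Author("Run", "C. Jane", "id-1")))
```

With matching identifiers the two spellings are treated as one person; when either identifier is missing, the comparison falls back to the names, so the two spellings stay distinct exactly as they do today.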

As outlined above, I don't think Zotero should force all the names to be entered the same way in all records. It has to right now almost solely because of the disambiguation rules.

And I don't think it really matters for us (at this preliminary stage at least) how the names are entered in the ORCID database. If all we did was add an ORCID field and map it to CSL (and not do anything else with it) that on its own gives users a way to avoid unwanted disambiguation. They can enter the actual ORCID (that would be ideal) or they can add whatever other identifier they want to indicate that the authors are the same.

Looking forward, I expect that we will start importing ORCIDs automatically from various databases, which will alleviate the "burden" of manually entering ORCIDs. Of course we might also want to retroactively assign ORCIDs to the existing entries in the database, which I can see happening in a couple ways:

We re-fetch the metadata from the original source. Which (at least currently) is unlikely to contain ORCIDs.

We try to look up ORCIDs on the ORCID database. I don't think we can successfully match authors to their ORCIDs on their own due to what DWL-SDCA talks about and because there is often more than one researcher with the same name. I _do_ think that we can quite successfully map ORCIDs based on the entire publication. In which case, the spelling of the name as entered into the ORCID system should not matter very much.

I think this should be pretty high on the list for Zotero features. Being a leading open source reference manager and an advocate for standards and sanity, I think Zotero should be one of the first to adopt something as promising as ORCID.

fbennett · May 20, 2013

It won't be totally trivial, unfortunately.

If you suppress disambiguation on selected names that are distinct, that reduces the information available to the reader for identifying those particular author-date references in the bibliography, but not for others that may be mixed in with them. I suspect there will be unanticipated results that need to be worked through, if suppression is adopted.

dstillman · May 20, 2013

I don't know anything about ORCID, but at least for manual entry I've always imagined dealing with this by simply conceptualizing a unique author as an author with a particular set of items linked to them in the database, even if there are different name representations on those different items. This would be more work than having people just make up fake identifiers (if I'm understanding your suggestion, aurimas), but I don't particularly think people should need to do that. The client would need to make some sort of (hidden) identifier locally to allow for API-based syncing, which right now doesn't account for anything beyond names.

I could see doing this via a creators view, which we've long wanted anyway. You'd see creator names (including different variations) with items under them as child rows, and you'd be able to merge creators together to indicate they were the same person. Dragging an item out of a creator would create a separate creator object, even if the name was the same.
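As a rough illustration of that model (a sketch under assumptions, not Zotero's actual data model), a creator can be represented as a local ID plus the items linked to it, each with the name as printed on that item. The Creator class, merge, split_item and the item keys below are all hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count

_ids = count(1)  # stand-in for the hidden local identifier


@dataclass
class Creator:
    # item key -> name as it appears on that item
    items: dict = field(default_factory=dict)
    creator_id: int = field(default_factory=lambda: next(_ids))

    @property
    def names(self):
        """All name variants used on this creator's items."""
        return set(self.items.values())


def merge(target, other):
    """Declare two creators to be the same person."""
    target.items.update(other.items)
    return target


def split_item(creator, item_key):
    """Drag an item out: it gets its own creator, even if the name matches."""
    name = creator.items.pop(item_key)
    return Creator(items={item_key: name})


a = Creator(items={"item1": "Roberts, M"})
b = Creator(items={"item2": "Roberts, Mybuddy"})
merge(a, b)
print(sorted(a.names))
```

Keeping the printed name on each item means the variants survive a merge, so a style that wants the name as published can still get it.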

I could also envision an autocomplete drop-down that showed the linked items for a given creator, either for all name suggestions or when a suggestion had focus. (That wouldn't help if you were entering a new variation, but if we had a creators view we'd probably want a way to get to a given creator from the creator row in the item pane, so you could just go to the creators view and then merge it with other names from there.)

Of course, none of this would help for automatic importing, but that's where ORCID could perhaps come in.

aurimas · May 20, 2013

If you suppress disambiguation on selected names that are distinct, that reduces the information available to the reader for identifying those particular author-date references in the bibliography, but not for others that may be mixed in with them. I suspect there will be unanticipated results that need to be worked through, if suppression is adopted.

Sorry, I don't exactly follow where the problem lies here. Could you maybe give an example?

I don't know anything about ORCID

http://orcid.org/
It's essentially a globally unique identifier for a researcher. Just like DOI is for a paper.

This would be more work than having people just make up fake identifiers (if I'm understanding your suggestion, aurimas),

No, I wasn't really suggesting that this is what people should do, but if the field was editable, in theory, they could be lazy and do just that.

For manual entry, I think what you describe would be a better system than just manually entering the ORCID (i.e. when you type in the last name, you are given options to identify the author as another author in the database). Internally it could be a Zotero-specific ID, but I think that would be duplicating the purpose of ORCID. But obviously an ORCID would not always be available. We could use a pseudo-ORCID as a placeholder.

Anyway, I think the goal is to automate this as much as possible, so we should almost definitely utilize the existing ORCID database that identifies researchers in various papers as the same person.

fbennett · May 20, 2013

Sorry, I don't exactly follow where the problem lies here. Could you maybe give an example?

It really depends on what is expected from disambiguation (which has always been a little, ah, ambiguous). If the aim is for the same person always to show in the same form, the issue I had in mind would arise like this:

Bibliography (1st and 3rd entries are the same person)
Smith, J. (1999a) "Title A"
Smith, James (1999) "Title C"
Smith, John (1999b) "Title B"

In-text
(J. Smith 1999a)
(James Smith 1999)
(J. Smith 1999b)

If variance is allowed when there is a clash on the name of a separate person (the "Title C" author), year-suffixes would be less frequent, and would be clustered properly when they do occur:

Bibliography (1st and 3rd entries are the same person)
Smith, J. (1999) "Title A"
Smith, James (1999) "Title C"
Smith, John (1999) "Title B"

In-text
(J. Smith 1999)
(James Smith 1999)
(John Smith 1999)

But of course, that's what we do already. If it's fine for the same author to be represented by different forms of the name in the second scenario, that does make one wonder whether forcing the initialed form only when the "Title C" reference is dropped in the example above is worth the effort.

Things will be much simpler, and cleaner in the output, if the role of ORCIDs is limited to unifying the form of individual author names before they arrive at the processor. If editors in the field have strong preferences that require ORCID logic in the processor, we can go there. But I would be reluctant to do it just for the sake of completeness.

(edited slightly for clarity)

aurimas · May 21, 2013

I don't think unifying author names is such a great idea. Often, depending on the source of the data, Zotero can only import initials. If you unify author names, you may be able to expand the initials of some of the authors but not others. This could result in a rather awkward list of authors - some with last name + initials, others with full names. It's not a huge deal, but for citation styles that use full names, this could also look odd in the bibliography. Though looking at your example and expanding it a bit, this may be necessary.

Bibliography (1st and 3rd entries are the same person; 2nd and 4th entries are also the same person)
Smith, J. (1999) "Title A"
Smith, J. (1999) "Title B"
Smith, James (1999) "Title C"
Smith, John (1999) "Title D"

In-text
(J. Smith 1999a)
(J. Smith 1999a) ?? This would probably call for a tertiary disambiguation?
(James Smith 1999b)
(John Smith 1999b)

If names are expanded then that would not be a problem.

fbennett · May 21, 2013

Authors sometimes prefer to see their name used in a particular form. The name of H.L.A. Hart would probably not be recognized readily by most readers if written out in full. There will be conventions for the representation of names in publishing, and I'm just suggesting that those should be our first point of reference (with respect to printed output).

In your example, I'm not sure what tertiary disambiguation means, but the suffixes wouldn't be set a-a b-b. A year-suffix disambiguates the citation alone; if it's not needed there, it won't be applied. So the James and John entries don't need a suffix at all. Conversely, if the same suffix is used on both of the "J. Smith" cites, they are still ambiguous, which wouldn't be right.

When used, a year-suffix needs to be applied to the author-date citation and to the matching bibliography entry. The sequence is normally bibliography sort order -- so "a" on the "Title A" entry, and "b" on the "Title B" entry. (If the entries are unaffected by the bib sort, the bib should leave them in document order.)
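The rule can be sketched in a few lines of Python. This is an illustration of the behaviour described here, not the processor's implementation; assign_year_suffixes is a made-up name, the input is assumed to be already in bibliography sort order, and it only handles up to 26 colliding entries.

```python
from collections import Counter
from string import ascii_lowercase


def assign_year_suffixes(entries):
    """entries: (rendered_name, year, title) tuples in bibliography order."""
    totals = Counter((name, year) for name, year, _ in entries)
    used = Counter()
    result = []
    for name, year, title in entries:
        key = (name, year)
        if totals[key] > 1:
            # Ambiguous author-date cite: next letter, in bibliography order.
            suffix = ascii_lowercase[used[key]]
            used[key] += 1
        else:
            suffix = ""  # the cite is already unique, no suffix needed
        result.append((name, f"{year}{suffix}", title))
    return result


bibliography = [
    ("J. Smith", 1999, "Title A"),
    ("J. Smith", 1999, "Title B"),
    ("James Smith", 1999, "Title C"),
    ("John Smith", 1999, "Title D"),
]
for name, year, title in assign_year_suffixes(bibliography):
    print(f"({name} {year})  {title}")
```

The same suffixed year would be used both in the citation and in the matching bibliography entry.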

arggem · May 21, 2013

I wrote the APA style expert. Here is the response:

Always use the name as it appears in the published work.

If the surname remains the same and only the initials change, ignore the difference in initials and arrange the various works by date. The different forenames don’t really matter, as the reference uses only the initials (Chuck, Charles, Carlito, etc. all begin with C).

Smith, C. E. (2009)

Smith, C. (2010)

Smith, C. E. I. (2011)

If the author is switching surnames, you have a different problem.

Does that help with anything? Or is that what we know already? And the problem remains for Z to somehow know when they refer to one and the same author?

adamsmith · May 21, 2013

yeah, as I said early on:

@arggem - it depends on the citation style afaik, but yes, it's more common to have to cite authors as they are in the original.

but that currently just won't work properly with Zotero and has the types of issues that Frank and DWL raise above. In other words, it's a terrible idea, but one that citation styles commonly employ.

pigsonthewing · April 16, 2014

I've started a new discussion at [1], as a feature request.

I think the first stage should be to get an ORCID parameter for each author in each citation. Then the community can build tools that will capture ORCIDs and emit them; and users can manually (or semi-automatically) add them to citations in their libraries.

The act of disambiguating authors whose ORCIDs are not (yet) known can follow later.

[1] https://forums.zotero.org/discussion/36088/orcid/

DWL-SDCA · April 16, 2014

I've one more thing to add to this discussion.

As suggested above there are often many ways to represent an author's name. Although some styles require the author name to be presented as it was in the original, it isn't always clear what name form is the original, especially for online journals. I've seen different name forms in the abstract and metadata, the full pdf version, and the full html version. This is especially true for authors who have several given names (to include how many of the names/initials).

Fixing this will require cooperation from authors, editors, publishers, name authorities (ISNI, ORCID, VIAF, etc.), and most importantly, the committees that make the rules that comprise the various citation standards. I should add eccentric professors and department heads to the list.

ajmorales17 · January 9, 2018

Has there been any progress on this issue in the latest Zotero version? Is there any (semi-)automatic way to unify the spelling of the same author's name? Thanks

adamsmith · January 9, 2018

nothing new, sorry.

DWL-SDCA · January 9, 2018

I am frustrated (and somewhat depressed) to report that the name authority situation not only has not improved but is deteriorating. This is completely out of the control of Zotero.

ORCID could have been a wonderful tool for name disambiguation but it will not be unless the organization makes major changes to its philosophy, guidelines, and database structure. When registering for an ORCID nothing other than a name (in any form) is required along with an email address. The email address isn't an available part of the ORCID record but is only for verification. When searching for someone with a common name, a query can return many identical names with no way to guess which is which. ORCID doesn't require any information on institutional affiliation, location, birth year, etc. ORCID allows a submitter to record this information and _allows_ an author to list their publications but, without information other than a name, it is all but impossible to determine one person's similar name from another. The last time I closely examined this I found that the vast majority of ORCID records contain only a simple name. Further, because it is so easy to obtain an ID many people seem to obtain a new one whenever they submit a manuscript to a publisher that requires one. Indeed, it is quicker and easier to obtain a new ORCID ID than it is to find one's existing ID if not remembered. I know of authors who have three ORCID IDs each with only a name attached. I have evidence of authors who have several IDs. [See also the problems I mentioned in an earlier post to this thread.]
Edit--
My organization was an ORCID launch partner and I had the highest hopes for this project. I am willing, even eager to work again with ORCID to resolve these problems.

bwiernik · January 9, 2018

Even worse regarding duplicate ORCID IDs, I know several researchers who are paranoid about organizations compiling lists of their research products, so they intentionally make a new ID with each submission in an attempt to prevent data gathering.

naught101 · April 5, 2018

I don't think Zotero should be relying on ORCID regardless of how well it works. Zotero needs its own concept of "author" as a first class object that citations point to, and then that author object could be linked to one (or more) ORCIDs, as well as any other standard-ish author ID services (ResearcherID, Google scholar author profiles, ResearchGate URLs, etc).

SOAS C13 · September 26, 2018

Sooner or later, a proper method of authority control should be added as an optional field to any reference management software. The International Standard Name Identifier (ISNI; http://www.isni.org/) has now been adopted by major libraries such as the British Library, Bibliothèque nationale de France and many more. ORCID is focused on the identification of researchers and won't be a suitable tool for authority control. I would strongly suggest that ISNI is introduced as an optional name-attribute for Authors/Editors/Contributors/Translators in the next version of Zotero.

david_lindemann · October 4, 2018

Thanks for this discussion (ORCID vs. ISNI). We have a strong need for being able to associate ORCID and/or ISNI to Zotero authors/editors/contributors/translators. Is the Zotero dev team planning to implement this?

webbz01 · January 14, 2019

Having manually changed all my instances of
Smith, J. A.
and
Smith, John
to
Smith, John A.
...
the in-text citations in my pre-existing Word document now all read (John A. Smith & Other, 2018) but should of course read (Smith & Other, 2018)... as there's no other John Smith cited in the document, or in my database.

Any tips for fixing this?

adamsmith · January 14, 2019

Start by looking at the bibliography, that might provide a clue for the one author version that's still not quite right (e.g. John A instead of John A.)
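If the library is large, a small script can help spot that kind of leftover. The following Python sketch is only an illustration and assumes the names have been exported as (last, first) pairs; near_variants and normalize are made-up helper names. It reports names that become identical once periods, spacing and letter case are ignored.

```python
import re
from collections import defaultdict


def normalize(given):
    """Collapse periods and runs of whitespace, ignore case."""
    return re.sub(r"[.\s]+", " ", given).strip().casefold()


def near_variants(names):
    """Return groups of (last, first) spellings that differ only trivially."""
    groups = defaultdict(set)
    for last, first in names:
        groups[(last.casefold(), normalize(first))].add((last, first))
    return [sorted(variants) for variants in groups.values() if len(variants) > 1]


# Hypothetical export with one leftover variant missing its period.
names = [("Smith", "John A."), ("Smith", "John A"), ("Smith", "John A.")]
print(near_variants(names))
```

Each group printed is a set of spellings that look the same to a reader but are distinct strings, which is enough for them to be treated as different authors.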