Single Author with two different name spellings (short vs longhand)
I don't think this has been posted before; please redirect me if it has!
This is a question about "fixing" first names in the database.
I have many articles that I obtained from different sources, and I am seeing a problem when I use the "Insert Citation" function in OpenOffice. The function itself works. The problem is that if a single author's first name is spelled differently in two of my references (both cited in the same paper), that author is treated as two different people, and first-name information is added to the citations to tell them apart (I am using APA).
So when I insert citations for each of these author entries:
Roberts, M
Roberts, Mybuddy
it shows up as (M Roberts, YEAR) and (Mybuddy Roberts, YEAR), when I would like both to be just (Roberts, YEAR). It is not an issue with the style; the issue is the different first names in the database. How can I clean up my database to make these names consistent? Or is there another way around this issue?
Also (separate question): when I insert the bibliography (again, APA style), it includes "DOI:" tags. I have never heard of this as a style requirement. Is there some way to turn it off?
Thanks!!!
As for the first issue: right now your best bet is to sort by author name, or use basic searches, to find and manually edit the affected author names. I don't think much else can be done at the moment. Once duplicate detection is added, it might be possible to use the same basic functions to pick out authors that are likely to be the same person, at least in the easiest cases, such as authors listed with only a first initial.
I had already started fixing the names the way you suggested but thought there might be a smarter solution. The real hassle comes when the author is not the first author...
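The manual search described above can be partly scripted. This is a minimal sketch, assuming the creator names have already been exported from the library as (last, first) pairs; the export step and the `candidate_duplicates` helper are hypothetical, not a Zotero feature. It flags last names whose first names share an initial but are spelled differently, which covers the "first initial only" case.

```python
from collections import defaultdict


def candidate_duplicates(names):
    """Group (last, first) pairs that share a last name and first initial.

    Only groups with more than one distinct spelling are returned, since
    those are the ones worth checking by hand.
    """
    groups = defaultdict(set)
    for last, first in names:
        last, first = last.strip(), first.strip()
        key = (last.lower(), first[:1].upper())
        groups[key].add(f"{last}, {first}")
    return {key: sorted(spellings) for key, spellings in groups.items() if len(spellings) > 1}


# Hypothetical export of creator names from the library.
library = [
    ("Roberts", "M"),
    ("Roberts", "Mybuddy"),
    ("Roberts", "Mybuddy"),
    ("Smith", "James"),
    ("Jones", "A"),
]

for key, spellings in candidate_duplicates(library).items():
    print(key, spellings)
# prints: ('roberts', 'M') ['Roberts, M', 'Roberts, Mybuddy']
```

This only narrows the search: it would also put "Smith, James" next to "Smith, John", so deciding whether a group is really one person is still a manual step.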
Cheers!
ps: Love this tool!
I am surprised this is not a more common request. Does no one else need this?
Batch editing, which would allow fixing this quickly, is both frequently requested and planned.
I have a guy (actually more than one) who publishes in two languages, sometimes under a more formal name and other times under an informal one.
Charles E.
Charles
Chuck
Carlos
Carlitos
All the same guy.
So is the recommendation that, regardless of the language or how it's published, I should always cite him as Charles E.?
If it is used as an additional parameter (so that, say, whether "Run, C.J." and "Run, C. Jane" are distinguished depends on whether they share the same ORCID), the behaviour might be challenging to implement and to explain.
A name could be entered as
CHARLES Adam Baker
Charles, Adam Baker
Charles, A.B.
Charles, AB
Charles AB
Adam Baker Charles
Adam Charles
etc.
Similarly, there was only one field for entering journal volume, issue, and pagination, and no guidance on the format to use.
If I say more I will become emotional. I _really_ wanted ORCID to be a success.
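To show how far string matching alone can go with the name forms listed above, here is a minimal sketch that reduces each form to a lowercase family name plus given-name initials. The heuristics are assumptions made for illustration (they are not Zotero's or CSL's rules), and `name_key` is a hypothetical helper.

```python
import re


def name_key(raw):
    """Reduce a free-form author string to (family, initials).

    Assumed heuristics, not a standard:
    - "Family, Given": the text before the comma is the family name.
    - A leading ALL-CAPS token longer than 3 letters is the family name.
    - A trailing ALL-CAPS token of 1-3 letters is a block of initials.
    - Otherwise the last token is the family name (given-first order).
    """
    raw = raw.strip()
    if "," in raw:
        family, given = (part.strip() for part in raw.split(",", 1))
    else:
        tokens = raw.split()
        if len(tokens[0]) > 3 and tokens[0].isupper():
            family, given = tokens[0], " ".join(tokens[1:])
        elif len(tokens) > 1 and tokens[-1].isupper() and len(tokens[-1]) <= 3:
            family, given = " ".join(tokens[:-1]), tokens[-1]
        else:
            family, given = tokens[-1], " ".join(tokens[:-1])
    # Split "A.B.", "A B" or "Adam Baker" into words, dropping empty pieces.
    words = [w for w in re.split(r"[\s.]+", given) if w]
    if all(w.isupper() for w in words):
        initials = "".join(words)  # already initials, e.g. "AB"
    else:
        initials = "".join(w[0].upper() for w in words)
    return family.lower(), initials


variants = [
    "CHARLES Adam Baker",
    "Charles, Adam Baker",
    "Charles, A.B.",
    "Charles, AB",
    "Charles AB",
    "Adam Baker Charles",
    "Adam Charles",
]
for v in variants:
    print(v, "->", name_key(v))
```

Every form except "Adam Charles" ends up as ('charles', 'AB'); "Adam Charles" becomes ('charles', 'A') and still needs a human decision. The heuristics also break on input they were not written for: "LI Wei" (short family name first, no comma) comes out with "wei" as the family name, which is one reason a stable identifier is more robust than string matching.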
As outlined above, I don't think Zotero should force all names to be entered the same way in every record. Right now it has to, almost solely because of the disambiguation rules.
And I don't think it really matters for us (at this preliminary stage at least) how the names are entered in the ORCID database. If all we did was add an ORCID field and map it to CSL (and not do anything else with it) that on its own gives users a way to avoid unwanted disambiguation. They can enter the actual ORCID (that would be ideal) or they can add whatever other identifier they want to indicate that the authors are the same.
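The idea of adding an identifier field and using it to avoid unwanted disambiguation can be sketched as a pre-processing step that unifies name forms before they reach the citation processor. This is a minimal sketch under assumptions: the `Creator` type and its `orcid` field are hypothetical (Zotero has no such field), and the identifier can be a real ORCID or any user-chosen placeholder, as suggested above.

```python
from collections import Counter
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Creator:
    family: str
    given: str
    orcid: Optional[str] = None  # hypothetical field; any stable ID works


def unify_by_identifier(creators):
    """Give every creator that shares an identifier the same name form.

    The most frequent (family, given) spelling per identifier wins; ties go
    to the spelling seen first. Creators without an identifier are left as is.
    """
    forms = {}
    for c in creators:
        if c.orcid:
            forms.setdefault(c.orcid, Counter())[(c.family, c.given)] += 1
    canonical = {oid: counts.most_common(1)[0][0] for oid, counts in forms.items()}
    return [
        replace(c, family=canonical[c.orcid][0], given=canonical[c.orcid][1]) if c.orcid else c
        for c in creators
    ]


items = [
    Creator("Roberts", "M", "pseudo-roberts"),       # placeholder ID, not a real ORCID
    Creator("Roberts", "Mybuddy", "pseudo-roberts"),
    Creator("Roberts", "Mybuddy", "pseudo-roberts"),
    Creator("Smith", "James"),
]
for c in unify_by_identifier(items):
    print(c.family, c.given)
```

Because the processor then sees one spelling per person, it has nothing to disambiguate; picking the most common spelling is just one possible rule, and a user-chosen preferred form would work equally well.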
Looking forward, I expect that we will start importing ORCIDs automatically from various databases, which will alleviate the "burden" of manually entering ORCIDs. Of course we might also want to retroactively assign ORCIDs to the existing entries in the database, which I can see happening in a couple ways:
- We re-fetch the metadata from the original source, which (at least currently) is unlikely to contain ORCIDs.
- We try to look up ORCIDs in the ORCID database. I don't think we can reliably match authors to their ORCIDs by name alone, because of the problems DWL-SDCA describes and because there is often more than one researcher with the same name. I _do_ think we can map ORCIDs quite successfully based on the entire publication, in which case the spelling of the name as entered into the ORCID system should not matter very much.
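Matching on the whole publication, as in the second option, might look roughly like the sketch below. It assumes the relevant ORCID records have already been fetched into a plain dict of ORCID to family name and work DOIs; that structure, the placeholder IDs, and the `assign_orcids` helper are all hypothetical, and no real ORCID API is called.

```python
def assign_orcids(item, orcid_records):
    """Attach ORCIDs to an item's authors by matching the whole publication.

    `orcid_records` maps an ORCID to {"family": str, "dois": set of str}. It
    stands in for data that would come from an ORCID lookup; no real API is
    called here. An author gets an ORCID only when exactly one record lists
    this item's DOI and has the same family name, so differences in how the
    given name is spelled do not matter.
    """
    doi = item["DOI"].lower()
    claimants = {
        oid: rec for oid, rec in orcid_records.items()
        if doi in {d.lower() for d in rec["dois"]}
    }
    result = []
    for author in item["authors"]:
        matches = [
            oid for oid, rec in claimants.items()
            if rec["family"].lower() == author["family"].lower()
        ]
        result.append({**author, "orcid": matches[0] if len(matches) == 1 else None})
    return result


# Two researchers named Smith; only the DOI tells them apart.
records = {
    "pseudo-A": {"family": "Smith", "dois": {"10.1000/example.1", "10.1000/example.2"}},
    "pseudo-B": {"family": "Smith", "dois": {"10.1000/example.3"}},
    "pseudo-C": {"family": "Other", "dois": {"10.1000/example.1"}},
}
item = {
    "DOI": "10.1000/EXAMPLE.1",
    "authors": [{"family": "Smith", "given": "J. A."}, {"family": "Other", "given": "P."}],
}
for author in assign_orcids(item, records):
    print(author)
```

Requiring exactly one match is a deliberately cautious choice: when two same-surname co-authors both claim the work, the sketch assigns nothing rather than guessing.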
I think this should be pretty high on the list of Zotero features. As a leading open-source reference manager and an advocate for standards and sanity, Zotero should be one of the first to adopt something as promising as ORCID.
If you suppress disambiguation on selected names that are distinct, that reduces the information available to the reader for identifying those particular author-date references in the bibliography, but not for others that may be mixed in with them. I suspect there will be unanticipated results that need to be worked through if suppression is adopted.
I could see doing this via a creators view, which we've long wanted anyway. You'd see creator names (including different variations) with items under them as child rows, and you'd be able to merge creators together to indicate they were the same person. Dragging an item out of a creator would create a separate creator object, even if the name was the same.
I could also envision an autocomplete drop-down that showed the linked items for a given creator, either for all name suggestions or when a suggestion had focus. (That wouldn't help if you were entering a new variation, but if we had a creators view we'd probably want a way to get to a given creator from the creator row in the item pane, so you could just go to the creators view and then merge it with other names from there.)
Of course, none of this would help for automatic importing, but that's where ORCID could perhaps come in.
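The creators view described above (merge creators to say they are the same person, drag an item out to separate it) behaves like a union-find structure over creator objects. This is a toy model of that proposal, not Zotero's data model; `CreatorStore` and all of its methods are hypothetical, and for simplicity each item gets its own creator object when added.

```python
import itertools


class CreatorStore:
    """Toy model of the proposed creators view (hypothetical design).

    Items point at creator ids rather than name strings. Merging two
    creators links their ids (union-find), so one person can carry several
    name variants, while two identical names stay separate until merged.
    """

    def __init__(self):
        self._ids = itertools.count(1)
        self.parent = {}        # creator id -> parent id
        self.names = {}         # creator id -> name as entered
        self.item_creator = {}  # item key -> creator id

    def add(self, item, name):
        cid = next(self._ids)
        self.parent[cid] = cid
        self.names[cid] = name
        self.item_creator[item] = cid
        return cid

    def find(self, cid):
        while self.parent[cid] != cid:
            self.parent[cid] = self.parent[self.parent[cid]]  # path halving
            cid = self.parent[cid]
        return cid

    def merge(self, a, b):
        self.parent[self.find(b)] = self.find(a)

    def split(self, item):
        """Drag an item out: it gets a fresh creator, even with the same name."""
        return self.add(item, self.names[self.item_creator[item]])

    def same_person(self, item1, item2):
        return self.find(self.item_creator[item1]) == self.find(self.item_creator[item2])

    def variants(self, item):
        """All name forms recorded for the person linked to this item."""
        root = self.find(self.item_creator[item])
        return sorted({n for cid, n in self.names.items() if self.find(cid) == root})


store = CreatorStore()
a = store.add("item1", "Roberts, M")
b = store.add("item2", "Roberts, Mybuddy")
store.add("item3", "Roberts, M")
store.merge(a, b)
print(store.same_person("item1", "item2"))  # True: merged by the user
print(store.same_person("item1", "item3"))  # False: same name, never merged
print(store.variants("item1"))              # ['Roberts, M', 'Roberts, Mybuddy']
store.split("item2")
print(store.same_person("item1", "item2"))  # False after dragging item2 out
```

The point of the design is that identity comes from explicit merges rather than from matching strings, which is also what would let an imported ORCID simply become one more way of linking creator objects.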
It's essentially a globally unique identifier for a researcher, just as a DOI is for a paper. No, I wasn't really suggesting that this is what people should do, but if the field were editable, in theory, they could be lazy and do just that.
For manual entry, I think what you describe would be a better system than just manually entering the ORCID (i.e. when you type in the last name, you are given options to identify the author as another author in the database). Internally it could be a Zotero-specific ID, but I think that would be duplicating the purpose of ORCID. But obviously an ORCID would not always be available. We could use a pseudo-ORCID as a placeholder.
Anyway, I think the goal is to automate this as much as possible, so we should almost definitely utilize the existing ORCID database that identifies researchers in various papers as the same person.
Bibliography (1st and 3rd entries are the same person)
Smith, J. (1999a) "Title A"
Smith, James (1999) "Title C"
Smith, John (1999b) "Title B"
In-text
(J. Smith 1999a)
(James Smith 1999)
(J. Smith 1999b)
If variance is allowed when there is a clash on the name of a separate person (the "Title C" author), year-suffixes would be less frequent, and would be clustered properly when they do occur:
Bibliography (1st and 3rd entries are the same person)
Smith, J. (1999) "Title A"
Smith, James (1999) "Title C"
Smith, John (1999) "Title B"
In-text
(J. Smith 1999)
(James Smith 1999)
(John Smith 1999)
But of course, that's what we do already. If it's fine for the same author to be represented by different forms of the name in the second scenario, that does make one wonder whether forcing the initialed form only when the "Title C" reference is dropped in the example above is worth the effort.
Things will be much simpler, and cleaner in the output, if the role of ORCIDs is limited to unifying the form of individual author names before they arrive at the processor. If editors in the field have strong preferences that require ORCID logic in the processor, we can go there. But I would be reluctant to do it just for the sake of completeness.
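The given-name step in the examples above can be sketched as follows. This is a simplified model, assuming each citation is a (family, given, year) tuple and that a fuller name form is added only to colliding citations, and only when the full given names would actually tell them apart. Real CSL processors are governed by options such as `disambiguate-add-givenname` and `givenname-disambiguation-rule`, which this sketch does not model.

```python
from collections import defaultdict


def render(cite, level):
    """Render (family, given, year) at a name-expansion level.

    0 = family only, 1 = initial + family, 2 = full given name + family.
    """
    family, given, year = cite
    if level == 0 or not given:
        return f"{family} {year}"
    name = given if level == 2 else given[0] + "."
    return f"{name} {family} {year}"


def disambiguate(cites):
    """Expand given names only for citations that collide, and only when
    the fullest available form would actually tell them apart."""
    levels = [0] * len(cites)
    for _ in range(2):  # at most two steps: initials, then full given names
        groups = defaultdict(list)
        for i, cite in enumerate(cites):
            groups[render(cite, levels[i])].append(i)
        for members in groups.values():
            fullest = {render(cites[i], 2) for i in members}
            if len(members) > 1 and len(fullest) > 1:
                for i in members:
                    levels[i] = min(levels[i] + 1, 2)
    return [render(cite, level) for cite, level in zip(cites, levels)]


second_scenario = [("Smith", "J.", 1999), ("Smith", "James", 1999), ("Smith", "John", 1999)]
print(disambiguate(second_scenario))
# ['J. Smith 1999', 'James Smith 1999', 'John Smith 1999']
```

This reproduces the expanded name forms of the second scenario without year-suffixes. Entries whose names are written identically (two "Smith, J." references) still collide after this step, and that residue is what a year-suffix is for.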
(edited slightly for clarity)
Bibliography (1st and 3rd entries are the same person; 2nd and 4th entries are also the same person)
Smith, J. (1999) "Title A"
Smith, J. (1999) "Title B"
Smith, James (1999) "Title C"
Smith, John (1999) "Title D"
In-text
(J. Smith 1999a)
(J. Smith 1999a) ?? This would probably call for a tertiary disambiguation?
(James Smith 1999b)
(John Smith 1999b)
If names are expanded then that would not be a problem.
In your example, I'm not sure what tertiary disambiguation means, but the suffixes wouldn't be set a-a b-b. A year-suffix disambiguates the citation alone; if it's not needed there, it won't be applied. So the James and John entries don't need a suffix at all. Conversely, if the same suffix is used on both of the "J. Smith" cites, they are still ambiguous, which wouldn't be right.
When used, a year-suffix needs to be applied to the author-date citation and to the matching bibliography entry. The sequence normally follows bibliography sort order, so "a" goes on the "Title A" entry and "b" on the "Title B" entry. (If the entries are unaffected by the bib sort, the bib should leave them in document order.)
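The suffix rule just described can be sketched in a few lines. This assumes the citations have already been through name disambiguation and are given as (rendered citation, title) pairs in document order, and that the bibliography sorts by title; `assign_year_suffixes` is a hypothetical helper, and the same letter would also be added to the year in the matching bibliography entry.

```python
from collections import defaultdict
from string import ascii_lowercase


def assign_year_suffixes(entries):
    """Add year-suffixes only to citations that still collide.

    `entries` is a list of (rendered_cite, title) pairs in document order.
    Suffixes follow bibliography order, assumed here to be by title; the
    stable sort keeps document order for ties. Handles at most 26 colliding
    entries per group; a full implementation needs longer suffixes after "z".
    """
    groups = defaultdict(list)
    for i, (cite, _title) in enumerate(entries):
        groups[cite].append(i)
    result = [cite for cite, _title in entries]
    for cite, indices in groups.items():
        if len(indices) > 1:
            ordered = sorted(indices, key=lambda i: entries[i][1])
            for letter, i in zip(ascii_lowercase, ordered):
                result[i] = cite + letter
    return result


bibliography = [
    ("J. Smith 1999", "Title A"),
    ("J. Smith 1999", "Title B"),
    ("James Smith 1999", "Title C"),
    ("John Smith 1999", "Title D"),
]
print(assign_year_suffixes(bibliography))
# ['J. Smith 1999a', 'J. Smith 1999b', 'James Smith 1999', 'John Smith 1999']
```

The James and John citations get no suffix because they are already unique, and the two "J. Smith" citations get different letters, since giving them the same suffix would leave them ambiguous.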
I think the first stage should be to get an ORCID parameter for each author in each citation. Then the community can build tools that will capture ORCIDs and emit them; and users can manually (or semi-automatically) add them to citations in their libraries.
The act of disambiguating authors whose ORCIDs are not (yet) known can follow later.
As suggested above, there are often many ways to represent an author's name. Although some styles require the author name to be presented as it was in the original, it isn't always clear which name form is the original, especially for online journals. I've seen different name forms in the abstract and metadata, the full PDF version, and the full HTML version. This is especially true for authors who have several given names (in how many of the names or initials are included).
Fixing this will require cooperation from authors, editors, publishers, name authorities (ISNI, ORCID, VIAF, etc.), and most importantly, the committees that make the rules that comprise the various citation standards. I should add eccentric professors and department heads to the list.
ORCID could have been a wonderful tool for name disambiguation, but it will not be unless the organization makes major changes to its philosophy, guidelines, and database structure. When registering for an ORCID, nothing is required other than a name (in any form) and an email address. The email address isn't an available part of the ORCID record; it is only for verification. When searching for someone with a common name, a query can return many identical names with no way to guess which is which. ORCID doesn't require any information on institutional affiliation, location, birth year, etc. ORCID allows a submitter to record this information and _allows_ an author to list their publications, but without information other than a name it is all but impossible to tell one person from another with a similar name. The last time I closely examined this, I found that the vast majority of ORCID records contain only a simple name. Further, because it is so easy to obtain an ID, many people seem to obtain a new one whenever they submit a manuscript to a publisher that requires one. Indeed, it is quicker and easier to obtain a new ORCID ID than to find one's existing ID if it is not remembered. I know of authors who have three ORCID IDs, each with only a name attached. I have evidence of authors who have several IDs. [See also the problems I mentioned in an earlier post to this thread.]
Edit:
My organization was an ORCID launch partner and I had the highest hopes for this project. I am willing, even eager to work again with ORCID to resolve these problems.
Smith, J. A.
and
Smith, John
to
Smith, John A.
...
the in-text citations in my pre-existing Word document now all read (John A. Smith & Other, 2018) but should of course read (Smith & Other, 2018)... as there's no other John Smith cited in the document, or in my database.
Any tips for fixing this?