ORCID

An ORCID identifier [1] is like an ISBN or DOI for a researcher or other author. Each identifier is a URI whose right-hand part is a unique 16-digit string, separated into four groups of four by hyphens; for example, mine is http://orcid.org/0000-0001-5882-6823. The ORCID website [2] has technical details, such as the range and check digit system.
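
ORCID documents its check character as ISO 7064 MOD 11-2, computed over the first 15 digits, with "X" standing for a value of 10. A minimal Python sketch under that assumption (the function names are illustrative and are not part of any ORCID or Zotero API):

```python
import re


def orcid_check_char(base_digits: str) -> str:
    """Return the ISO 7064 MOD 11-2 check character for the 15 base digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(identifier: str) -> bool:
    """Accept a bare ID or a full URI; verify shape and check character."""
    bare = identifier.strip().rstrip("/").rsplit("/", 1)[-1]
    bare = bare.replace("-", "").upper()
    if not re.fullmatch(r"[0-9]{15}[0-9X]", bare):
        return False
    return orcid_check_char(bare[:15]) == bare[15]


# The identifier quoted in the paragraph above.
print(is_valid_orcid("http://orcid.org/0000-0001-5882-6823"))
```

A check like this only catches mistyped identifiers; it says nothing about whether the ID belongs to the intended author.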

ORCID is an open project, run by a not-for-profit foundation.

Zotero should include an ORCID field for each author in a citation. Once that's available, scrapers can capture ORCIDs when pulling in citation metadata from websites or other sources, and include them in their various output formats. A user could also manually (or in a semi-automated process) add an ORCID to a citation, where they are confident of the author's identity. And, of course, a scraper should be provided that can fetch citation metadata from an author's ORCID profile page (I've asked the ORCID team to add semantic markup such as COinS, a microformat, or microdata to their pages).

Within the Zotero library, ORCID identifiers should link to the subject's profile on the ORCID website (such as mine, above).

Anyone may register, free, for an ORCID identifier, at the ORCID website [2]. It takes less than a minute to do so, and I would encourage you all to get one. Indeed, the Zotero forums should include an ORCID parameter in user profiles.

There was some discussion of ORCID here previously [3] but it went off at a tangent; I'll comment there shortly, linking to this discussion.

[1] https://en.wikipedia.org/wiki/ORCID

[2] http://orcid.org/

[3] https://forums.zotero.org/discussion/3913/single-author-with-two-different-name-spellings-short-vs-longhand
  • Like I was saying on Twitter, this needs to mostly work automatically (very few users are going to fish around for ORCIDs). Zotero needs to be able to fetch this info directly from ORCID or scrape it along with other metadata from the page (I don't think I've seen a publisher that provides this metadata yet, so it would pretty much have to be option 1).

    ORCID does have an API, but the API is not sufficient for Zotero's use case. What Zotero needs to be able to do is query ORCID with a unique identifier for a journal article (DOI) and be able to get an ordered list of authors with their associated ORCIDs. I don't see a way to do this currently.

    Adding an ORCID (or similar identifier) field would, at best, come with 4.2, but we need to see if this will be at all usable. This includes deciding how disambiguation is supposed to be handled, as pointed out in the thread that "went off at a tangent".

    Would love to see this happen though.
  • [Also continued from Twitter; I'm @pigsonthewing there]

    The ORCID system/ API is still being built.

    Publishers are starting to include ORCIDs, both on-page [1],[2],[3] and in metadata [4]. We're also including them in Wikipedia articles [5] and Wikidata entries [6], and working on including them in citations on Wikipedia.

    [1] http://www.hindawi.com/80647648/

    [2] http://www.nature.com/ni/journal/v14/n7/full/ni.2633.html (click on name of "Chen Dong" to see pop-up profile)

    [3] http://www.plantcell.org/content/early/2014/03/17/tpc.113.121830 (on PDF)

    [4] http://www.ncbi.nlm.nih.gov/pubmed/24592396?report=xml&format=text

    [5] https://en.wikipedia.org/wiki/Category:Wikipedia_articles_with_ORCID_identifiers

    [6] https://www.wikidata.org/wiki/Special:WhatLinksHere/Property:P496
  • Also, I do not see why you would expect, for a given DOI, to get the author'(s') names and identifiers /from/ the ORCID API; surely you should get these from the publisher's page/ database.

    Once you have an author's ORCID, then the ORCID API will tell you who they are (their website, other identifiers, etc.) and what else they have published, as well as other names/ name variants they have used.
  • Also, I do not see why you would expect, for a given DOI, to get the author'(s') names and identifiers /from/ the ORCID API
    Mostly because ORCID is in the business of identifying which authors have what publications out, and in the sciences, figuring out whether you are a first or third author on a particular publication matters. Since ORCID has taken it upon themselves to assign existing papers to authors, I would also expect them to identify which of the ambiguous authors the person is.

    One note on the API (this is the example from the documentation)
    http://pub.orcid.org/v1.1/search/orcid-bio/?q=digital-object-ids:"10.1087/20120404" returns authors for http://www.ingentaconnect.com/content/alpsp/lp/2012/00000025/00000004/art00004?token=004b14c36a64405847447b496e2f2a3125763b6b634c7633757e6f3f2f2730673f582f6bf2f but it returns "Paglione, Laura" twice with two different ORCIDs. I wonder if this is a common issue and how we're supposed to deal with it. I'll contact ORCID, but just wanted to note it here.
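
For reference, the documentation example quoted above can be assembled programmatically. This is only a sketch: it builds the v1.1 search URL as quoted in this post without sending it (that endpoint may no longer be served), and the `find_ambiguous_names` helper and its sample rows are hypothetical, using made-up placeholder identifiers rather than real ORCID iDs.

```python
from collections import defaultdict
from urllib.parse import quote

SEARCH_BASE = "http://pub.orcid.org/v1.1/search/orcid-bio/"  # URL as quoted in the post


def doi_search_url(doi: str) -> str:
    """Build the DOI search URL; the query is percent-encoded, nothing is fetched."""
    query = f'digital-object-ids:"{doi}"'
    return f"{SEARCH_BASE}?q={quote(query, safe='')}"


def find_ambiguous_names(rows):
    """Map each name that appears with more than one identifier to those identifiers.

    `rows` is a hypothetical list of (name, identifier) pairs parsed from a response.
    """
    ids_by_name = defaultdict(set)
    for name, identifier in rows:
        ids_by_name[name].add(identifier)
    return {name: sorted(ids) for name, ids in ids_by_name.items() if len(ids) > 1}


print(doi_search_url("10.1087/20120404"))

# Placeholder rows: "ID-A", "ID-B", "ID-C" and "Other, Author" are NOT real data.
sample = [("Paglione, Laura", "ID-A"),
          ("Paglione, Laura", "ID-B"),
          ("Other, Author", "ID-C")]
print(find_ambiguous_names(sample))
```

Flagging such names at import time would at least let a user decide which identifier to keep, rather than silently storing one of them.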
  • edited April 16, 2014
    ORCID is a great idea but it isn't yet ready for use with Zotero -- much less setting up a 2-way system through the ORCID API. Maybe in a year we should revisit this.

    ORCID currently has no standard for entry of author name variants. It has no standard for entering publication metadata. Those of us who have been active from the beginning (my SafetyLit bibliographic database was a "launch partner") have been offering to volunteer to help with name and article metadata structures; however, my offer to help as an expert in cataloging and bibliographic database programming was not accepted. I know of several others who offered to help, even to spend their own money to travel to meetings. These offers were rejected or not even acknowledged.

    When an author tries to use the semi-automatic systems to import their works into the ORCID system, the result is entries with author names and publication names in multiple forms; multiply duplicated entries, and different formats for abbreviations, page ranges, publication dates, etc.

    Although the publication records can be deleted, there isn't a system that allows the records to be edited. One can hand-enter publications but the data entry form is nearly impossible to use.

    The search system is crazy bad. Entering my own name as firstname middle name last name shows many possibilities who have a first name the same as my last name and a last name the same as my first name. A similar problem occurs when I enter my name in the form lastname, firstname.

    The system allows users to import works from CrossRef but the way the ORCID system queries the CR system finds hundreds of partial matches (including books with titles that include any of the author's names) but omits some publications that are in the CR system.

    This has become a rant. I apologize. One day, ORCID or some similar project will make it easier to do our bibliographic work.
  • ORCID is a great idea but it isn't yet ready for use with Zotero
    That's a bold statement; it doesn't seem to be supported by anything in your post.

    I have already explained how, today, Zotero could capture, store and emit ORCID identifiers along with other bibliographic metadata. Adding the parameter now will facilitate various use cases, and tools to deliver them, and will provide future-proofing for the later addition of more advanced functionality.
    my offer to help as an expert in cataloging and bibliographic database programming was not accepted
    I sat as a volunteer on ORCID's 'works metadata working group'; I found it collegial and effective, and I understand that ORCID are busy working to implement its recommendations (which document is cited in my ORCID profile), including those which address many of the issues you raise. It is also possible to raise issues via ORCID's feedback system [1]; I have done so; and have received feedback indicating that work to implement some of my suggestions is in hand.

    [1] http://support.orcid.org/forums/175591-general
  • That's a bold statement; it doesn't seem to be supported by anything in your post.

    I have already explained how, today, Zotero could capture, store and emit ORCID identifiers along with other bibliographic metadata.
    I'm not sure how else to explain this. If you can explain to me exactly how I could assign the correct ORCID (assuming that ORCIDs for all authors on that list are available) to the second author "Y Zhang" in this paper, I will gladly retract my statement.
  • I just want to be clear that this is, in any case, still a while out; there is no need to argue as if the final word on implementation will be spoken tomorrow. I doubt 4.2 is going to come out in the next 6 months, considering that 4.1 will require an extensive beta.
    With API syncing in place, future changes to the data model will be easier, so even if they don't make it into 4.2 they can be added later.

    The biggest concern from my side would be that in order to make this worth the dev time as well as GUI space and complication, there needs to be a tangible benefit to a broad range of users. I have my doubts about how important ORCIDs are - at least in their current state - for Zotero's core functionality (i.e. reference management), but given yesterday's announcement that Zotero will strive towards integrating better with paper repositories there's probably more of an immediate need. I'd expect that to come up in the Penn State collaboration.
  • I'm not sure how else to explain this. If you can explain to me exactly how I could assign the correct ORCID (assuming that ORCIDs for all authors on that list are available) to the second author "Y Zhang" in this paper, I will gladly retract my statement.
    Your meaning is perfectly clear; and you're correct that ORCID can't give you a UID for that ambiguous author. The problem with your scenario is that ORCID doesn't pretend to be able to do so. [This does not validate the claim that "ORCID isn't yet ready for use with Zotero"; nor does it invalidate my statement that "today, Zotero could capture, store and emit ORCID identifiers along with other bibliographic metadata".]

    If your assumption that each of the paper's authors has an ORCID identifier is valid, then the publisher should include those identifiers in the paper (and ideally in the online metadata), and that is where you should obtain it. ORCID will then tell you who that author is, disambiguate them from others with the same name, and list their other publications.

    The statement which I called bold, and quoted, was not by you, BTW.
  • The statement which I called bold, and quoted, was not by you, BTW.
    oops. I think I made the same statement on Twitter though.
    Your meaning is perfectly clear; and you're correct that ORCID can't give you a UID for that ambiguous author. The problem with your scenario is that ORCID doesn't pretend to be able to do so.
    OK, so then it seems we're in agreement that ORCID API cannot be used reliably for this purpose.
    If your assumption that each of the paper's authors has an ORCID identifier is valid, then the publisher should include those identifiers in the paper (and ideally in the online metadata), and that is where you should obtain it.
    Would love to see this happen (even if not all authors are assigned an ORCID). Seems like we can start thinking about doing this for PubMed. For Zotero purposes, this should really come along with the metadata. Extracting it from a popup (as per the Nature example) is possible, but way too much hassle at this point for marginal reward. I suppose we'll just have to wait and see how well publishers adopt this.
    ORCID will then tell you who that author is, disambiguate them from others with the same name, and list their other publications.
    I don't see this being integrated into Zotero any time soon. ORCID itself is sufficient to disambiguate authors; Zotero, as a citation manager, doesn't care much for the author's bio; and, I think, listing other publications would probably be left to ORCID website (via a link of some sort).

    So basically, at this point, this is pending wider use of ORCID in metadata and, of course, Zotero 4.2.

    P.S. with a proper ORCID API that would do what I was talking about above, retroactively assigning ORCIDs to existing items in users' libraries (once ORCIDs are supported) would make this a whole lot easier. Given that most publishers will adopt ORCIDs in metadata, our only other option would be essentially to re-fetch metadata from publisher web pages (this is planned in general, but may take even longer than 4.2 to accomplish).
  • edited April 17, 2014
    It seems we're in agreement that ORCID API cannot be used reliably for this purpose
    It's not part of the design. That doesn't mean you can't raise a ticket on the ORCID feedback forum.
    Extracting it from a popup (as per Nature example) is possible, but way too much hassle at this point for marginal reward.
    It's not the ideal way of presenting it, but that may change, and a volunteer may wish to create a translator that reads it into Zotero, provided Zotero has a parameter in which to store it.
    listing other publications would probably be left to ORCID website (via a link of some sort)
    Yes, that's what I suggest. I envisage a scenario where, say, a student using Zotero will read a paper, then drag the metadata into their Zotero library. A week later, when they come to cite the work, they will open Zotero and see a link to the author's ORCID profile. Clicking on that, they will discover on the author's ORCID profile other papers by the same author, which then assist their studies further, and whose metadata they can drag into Zotero one by one, or as a set. Of course, when they cite one of these works in their own, they will want to include the author's ORCID identifier in that citation.
  • I think we can all agree that in principle, ORCIDs are a great idea. Authors' websites often have only selected publications, and sometimes no DOIs. Sometimes the author has no online publications list, and it can take forever to be moderately certain of the fact. And academics change their names, or have common names like Zhang.

    The problem is getting ORCID data.

    We are collecting the metadata of everything with a DOI on Wikidata (the database sidekick of Wikipedia). The database already has some ORCIDs, and as these become common it will incrementally incorporate them. Getting ORCIDs from Wikidata to a user's Zotero collection would be totally automated, and would not rely on the publisher offering ORCID metadata on their article pages (although it would be nice if they all did; perhaps Wikidata metadata would help them insert it).

    Details:
    https://forums.zotero.org/discussion/36151/wikified-copyleft-bibliographic-database/

    Zotero could start offering an ORCID field now, as long as it is not intrusive for papers that lack ORCIDs. I don't think that there is much doubt that ORCIDs will become widespread, so it's a question of implementing now or later. Is there any advantage to waiting?
  • edited July 27, 2014
    We'll have to wait at least until Dan is ready to make database upgrades, which is probably going to happen early next year (though I'm just guessing). This is great though. Do you have any way to assign an ORCID to a specific author in the list if they are ambiguous?

    Edit: We could probably start rolling out a non-syncing alpha earlier though
  • Manual editing of Wikidata for that article? I don't know of any way of automatically disambiguating authors. I guess you could do a semi-automated search for pages containing both the article metadata (especially DOI) and an ORCID, or a most-probable-match based on subject or coauthorships on other papers that include ORCIDs. I think you'd need to manually verify, though. Is this what you were asking, or have I got the wrong end of the wrong stick?

    I suspect that (at least until the publishers start using ORCIDs systematically) the easiest way is for academics and their co-workers to add (and correct) their own ORCID data, ideally through a user-friendly interface that they already use, like Zotero.
  • I mean if you have "Smith J., Doe K., Smith J., ...." the current ORCID system does not have a way of indicating which "Smith J." is which person (by ORCID), it just knows that both Smiths are on the paper (best case, though it's likely that it only knows an ORCID for one of them). In either case, it's not clear who the first author is. Does this database intend to provide a way to distinguish the two authors?
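
The point about two "Smith J." entries can be made concrete. A sketch, assuming a hypothetical per-item creator list in which each entry optionally carries an ORCID (the `Creator` class and the placeholder identifier are illustrative, not Zotero's data model): matching by name alone is ambiguous, whereas an identifier stored by author position is not.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Creator:
    last_name: str
    first_name: str
    orcid: Optional[str] = None  # unknown until some source supplies it


def positions_for_name(creators: List[Creator], last: str, first: str) -> List[int]:
    """Return every position in the ordered author list that matches the name."""
    return [i for i, c in enumerate(creators)
            if (c.last_name, c.first_name) == (last, first)]


def assign_by_position(creators: List[Creator], position: int, orcid: str) -> None:
    """Attach an identifier to one specific slot in the author order."""
    creators[position].orcid = orcid


authors = [Creator("Smith", "J."), Creator("Doe", "K."), Creator("Smith", "J.")]

# A name-keyed source cannot say which slot it means: two candidates come back.
print(positions_for_name(authors, "Smith", "J."))

# A source that reports author order can; "PLACEHOLDER-ID" is not a real ORCID iD.
assign_by_position(authors, 2, "PLACEHOLDER-ID")
print([c.orcid for c in authors])
```

This is why the question matters: a source that returns only a set of (name, ORCID) pairs for a paper cannot fill in the `orcid` field safely when names repeat.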
  • edited July 27, 2014
    Following on from aurimas, my understanding is that ORCID gives authors a way to list their publications through a standard channel accessed via their ORCID. This will be useful when fine-grained linking starts finding its way into citations (via RDFa or microformats or whatever), but it doesn't solve our problems with disambiguation of author names for citation purposes.

    With ambiguous Asian names (Zheng, Tanaka, etc.), some scientific publishers have recently adopted the sensible practice of appending the full author name in the original script, in parentheses after the romanized form. Unfortunately, since ORCID records do not provide language variants of an author's name (and seldom provide it in the original script), we can't draw on them for this use case, even if the author's ID is known.

    ORCID is a great thing for altmetrics work, and it helps authors to hold out their full publication history to a broader community under their own ID. The flip-side of that appeal is that the content in ORCID records tends to be "helpfully" translated into English, regardless of the language of the underlying resource.

    I could be wrong, but I don't particularly see ORCID as a game-changer for reference managers and citation formatting tools. It is a very meaningful piece of metadata, though, and the idea of tying Zotero in as a means of crowd-sourcing improvements to related records is interesting.

    (In a broader rant, the multilingual metadata provided by cite aggregation services like CiNII [Japan] and CNKI [China] is often of ghastly poor quality, and it would be great to see a community effort to clean up those messes.)

    ORCID is obviously very important and meaningful metadata, and when the dev cycle touches the schema again, I'm sure we'll see provision made for it in Zotero records.
  • Just following up in this thread - would be interested in seeing ORCID support in Zotero (even if it's manual!).
  • I'm not quite sure how ORCID integration would look in a way that actually adds value to Zotero. It's never used in citations, it's very rarely available in metadata, its coverage is more than spotty -- I like ORCID, but I just don't see a compelling case for Zotero to invest development and screen real estate on it, certainly at the current time.
  • edited March 6, 2019
    The primary use seems to be as a tool to identify and possibly edit author names that have been added to a Zotero library in multiple ways. See my recent comment in the other current ORCID thread for why this use is not likely to be successful.

    https://forums.zotero.org/discussion/3913/single-author-with-two-different-name-spellings-short-vs-longhand#latest
  • Sure - that's all fair. However, if there are two authors with the same name in the same library, there isn't really a way to distinguish them. I suppose one could put something in brackets after the first name.

    Also see new thread on affiliations here: https://forums.zotero.org/discussion/76274/authors-and-affiliations

  • edited March 6, 2019
    To belatedly answer Aurimas' question of July 27, 2014 (sorry, Aurimas), the Wikidata database does indeed distinguish two "Smith, J."s on the same paper.

    Each author is a database item, which contains all available public information about J. Smith, including things like their ORCID, where they work, where they got their PhD, their birth and death dates, and so forth. If you wanted to write a query asking for the names of all the people whose spouses were born in the 1700s in the same city as one of the grad students of one of the authors of the paper, you could. More usefully, you can search all papers by the same author since 2010, or all papers which share two authors with the current paper, etc.

    For example, see
    https://www.wikidata.org/wiki/Q56855591
    That unique identifier, Q56855591, represents Donna Strickland, and links her to insanely detailed information on what her names are, the URL of her official website, who her doctoral advisor was, her fields of work, her past and present professional positions, and what IDs, including ORCIDs, are associated with her.

    The Wikidata bibliographic database is now utterly huge, and has a good API. I would estimate that Zotero software has been used to build most of it. If Zotero included a download/upload feature, so that Zotero could interface with Wikidata and Zotero users could upload disambiguated and corrected bibliographic information, I think it would be useful to all parties. I hate proofreading bibliographies and would love to crowdshare the process, and it would provide what is probably the only way to disambiguate two "Smith, J."s.

    Academics are also involved in proofreading their own data. There's also an #icanhazwikidata campaign; the idea is that academics tweet it to request the addition of their paper etc. to the database.
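
As an illustration of the kind of query described above, the sketch below builds a SPARQL query for the Wikidata Query Service that lists a paper's authors in order, together with any ORCID. It assumes the standard Wikidata properties P356 (DOI), P50 (author), P1545 (series ordinal) and P496 (ORCID iD, the property linked earlier in this thread). The function name is illustrative, and the query is only constructed here, not sent.

```python
def authors_with_orcid_query(doi: str) -> str:
    """Return SPARQL listing a work's authors in order, with ORCID where recorded."""
    # Assumption: DOIs are stored in upper case on Wikidata. Escape defensively.
    literal = doi.upper().replace("\\", "\\\\").replace('"', '\\"')
    return f"""
SELECT ?author ?authorLabel ?ordinal ?orcid WHERE {{
  ?work wdt:P356 "{literal}" .                   # the paper, found by DOI
  ?work p:P50 ?statement .                       # one statement per author item
  ?statement ps:P50 ?author .
  OPTIONAL {{ ?statement pq:P1545 ?ordinal . }}  # position in the author list
  OPTIONAL {{ ?author wdt:P496 ?orcid . }}       # ORCID iD, if recorded
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
ORDER BY xsd:integer(?ordinal)
""".strip()


# DOI borrowed from the API example earlier in this thread.
print(authors_with_orcid_query("10.1087/20120404"))
```

Because each `?author` is a separate item, two authors with the same printed name come back as two distinct rows. Authors recorded only as plain name strings, without their own item, would not appear in this result.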
  • Further to HLHJ's post; see:

    https://www.wikidata.org/wiki/Wikidata:Zotero

    (disclosure: I'm that page's main author)
  • A light-touch approach could be to offer a field for ORCID inclusion in Zotero's profile, which could help with resolving authors. But ORCID isn't really a stand-alone solution for the history of articles at this time, since an ORCID exists only for those who created their own. This would only solve present and future identification; it does not help with authors who passed away without creating an ORCID or who didn't delegate management of their record to an organization.

    See the ORCID support topic: Is it possible to register an ORCID iD for a deceased person? - "No. Our policy is that an ORCID iD can only be created by the individual themselves, not by any other person. This is because a core principle of ORCID is individual control." Source: https://support.orcid.org/hc/en-us/articles/360024829193-How-are-ORCID-records-for-deceased-people-handled- retrieved: 2019-11-19

    As someone who is working on resolving some paper authors to ORCID and Wikidata, I see definite gaps: publishers have not (yet?) applied an ORCID, an ORCID could not be included with a publication, or people may never link their ORCID. Still, integrating Zotero with Wikidata and ORCID makes sense to facilitate entity resolution across authors and further linked open data tools. Combining an ORCID with VIAF (https://viaf.org/), a Wikidata Q, and other authority identifiers applied by libraries, archives, etc. could be a way to approach reliability of resolved entities, since there is momentum around linked open data.

    WikiCite (http://wikicite.org/) is working on some of this, focused on Wikipedia citations, but others are curating specific topics for research. WikiCite is referenced in the link pigsonthewing provided above, and an ORCID is a key identifier that helps with the resolution to a Q in Wikidata and can enhance the resolved linked open data.

    Tooling for paper and author disambiguation in WikiCite, as used in Wikidata, has improved for such resolution of author entities.

    While coverage is not perfect, Zotero can make it easier to curate this and import resolved content; the effort to curate this is in progress by various parties, with support from Crossref and more.

    Tools for disambiguation:
    https://tools.wmflabs.org/author-disambiguator/work_item.php?id=Q56705592&doit=Get+author+links+for+work

    Finding gaps in resolved authors to papers: https://tools.wmflabs.org/scholia/topic/Q1093434/missing

    Integrating Zotero profiles with ORCIDs and an associated Wikidata Q could enable distributing the work of verifying accurate resolution. It would allow nearly effortless extended generated profiles based on Wikidata curated information in linked open data which is open to edit and monitor.

    Enables:
    Generated presentations of Authors: https://tools.wmflabs.org/reasonator/?&q=57978392
    Generated Topic Summaries: https://tools.wmflabs.org/scholia/topic/Q1093434

    Other tools - imports/sync to ORCID and DOI:
    https://tools.wmflabs.org/sourcemd/ (generates Quickstatements like the current Zotero work)
    Author presentation: https://tools.wmflabs.org/scholia/author/Q57978392
    Work presentation: https://tools.wmflabs.org/scholia/work/Q56705592

    Related curation and entity resolution/modelling work by Library professions integrating linked open data with Wikidata: https://wiki.lyrasis.org/display/LD4P2/LD4-Wikidata+Affinity+Group

    To sum this up, ORCID is nice, but I think there should be a broader discussion on how to integrate with the linked open data ecosystem, for a more holistic view of how to build on existing work. ORCID is only a partial solution for resolving authors, and at least two IDs are needed to tackle living and historical authors. Encouraging ORCID adoption by living authors, though, is a positive move to help facilitate curation of the papers long term.
  • If we ever get authority IDs, and if it's decided that there can be only one, I like the idea of Wikidata QIDs as an alternative to ORCIDs. There are a whole host of citable items (music recordings, blogs, legal opinions, newspaper articles, etc) for which the authors are unlikely to ever have an ORCID, but for which it is entirely possible a Wikidata QID will exist.

    Wikidata QIDs are often seen as a linking hub that binds together ORCID, VIAF, and other canonical identifiers like Twitter handles (heh).

    On the library interest front, see also:

    Allison-Cassin, S., & Scott, D. (2018). Wikidata: A platform for your library’s linked open data. The Code4Lib Journal, (40). Retrieved from https://journal.code4lib.org/articles/13424 [1]

    ARL Task Force on Wikimedia and Linked Open Data. (2019). ARL White Paper on Wikidata. https://www.arl.org/resources/arl-whitepaper-on-wikidata/

    1. Confession: I was a co-author of this article.
  • I've been trying to clean up a large database recently, have some thoughts on this topic, and would like to propose a way forward. A few realities must be recognized: 1. There is no way to identify a person uniquely. 2. Some citers like to maintain the original name from a publication (even if it is incomplete or wrong).

    Within the Zotero database, creators are in a table that keeps a unique ID and a first and last name (just two text fields). I propose that a table, "creatorIDs", be added to the database, with three columns: the internal ID from the creator table, a text string that is a unique ID of some kind, and a schema or "translator" for that ID. The combination of the first two would be the unique key for the table. This structure would allow any identifiers to be added to a creator, with multiple supposedly unique identifiers per person. The meaning of the schema would depend on software support, and the Zotero software could choose to enforce a uniqueness constraint on the creator ID and the schema (so allowing only one ID per schema). The identifiers could be ORCIDs, Wikidata QIDs, email addresses, ResearcherIDs, Social Security numbers ;), or whatever. One option should be the Zotero user ID from the API. They should be correctly associated with the person whose name is in the creator table (but that has to be done manually).

    Within Zotero (desktop or online), a new tree root could be created for creators, which would list all the people in the database. A leaf of that tree would be "Duplicate names," which would propose names for merging. Options could control the merge criteria, but the default could be matching last name and at least one matching unique ID. In the "creator" display, the user could also select two or more names and merge them, much like the current merging of records. If a user chooses not to merge creators, that is their choice - the names would continue to be used as entered/scraped/imported. If they merge them, the user will choose which names to use (probably the longest), and the other unique IDs associated with the creator id will be merged without duplicates. When clicking on a creator name, the user would see an editable form like the item editor, allowing the unique IDs to be managed. There might be a dropdown of supported ID schema.

    These IDs could be exported into CSL/CSL-JSON as properties on the name object as a URI (based on the assigned schema/translator). The renderer could then choose to render that URI in whatever format it wanted (e.g., as a link, in parens after each name, etc.). Exporting to something like BibTeX would be more of a problem, but there the IDs can simply be dropped and the status quo maintained.
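
The proposal above might look roughly like this in SQLite. This is a sketch under stated assumptions, not Zotero's actual schema: the `creators` table is simplified, and `creatorIDs`, its column names, the `merge_creators` helper, and the placeholder identifiers are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE creators (
    creatorID INTEGER PRIMARY KEY,
    firstName TEXT,
    lastName  TEXT
);
-- Hypothetical table: 0..N external identifiers per creator.
CREATE TABLE creatorIDs (
    creatorID  INTEGER NOT NULL REFERENCES creators(creatorID),
    identifier TEXT    NOT NULL,
    scheme     TEXT    NOT NULL,   -- e.g. 'orcid', 'wikidata', 'viaf'
    PRIMARY KEY (creatorID, identifier)
);
""")


def merge_creators(db, keep_id, drop_id):
    """Fold drop_id into keep_id, carrying identifiers over without duplicates."""
    with db:  # one transaction: commit on success, roll back on error
        db.execute(
            "INSERT OR IGNORE INTO creatorIDs (creatorID, identifier, scheme) "
            "SELECT ?, identifier, scheme FROM creatorIDs WHERE creatorID = ?",
            (keep_id, drop_id),
        )
        db.execute("DELETE FROM creatorIDs WHERE creatorID = ?", (drop_id,))
        db.execute("DELETE FROM creators WHERE creatorID = ?", (drop_id,))


# Placeholder identifiers only; these are not real ORCID iDs or QIDs.
conn.executemany("INSERT INTO creators VALUES (?, ?, ?)",
                 [(1, "J.", "Smith"), (2, "Jane", "Smith")])
conn.executemany("INSERT INTO creatorIDs VALUES (?, ?, ?)",
                 [(1, "ORCID-PLACEHOLDER", "orcid"),
                  (2, "ORCID-PLACEHOLDER", "orcid"),
                  (2, "Q-PLACEHOLDER", "wikidata")])

merge_creators(conn, keep_id=2, drop_id=1)
rows = conn.execute(
    "SELECT creatorID, identifier, scheme FROM creatorIDs ORDER BY identifier"
).fetchall()
print(rows)
```

The optional one-ID-per-schema rule could be added as a unique index on `(creatorID, scheme)`. A real merge would also have to repoint whatever table links items to creators, which this sketch leaves out.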

  • Thanks -- I indeed don't think it's a super complicated issue to solve conceptually: libraries have been doing something akin to what you suggest with VIAF files for a long time. The question is whether the added complexity in both database & UI/UX is worth it for Zotero. The number of people for whom this is true
    2. Some citers like to maintain the original name from a publication (even if it is incomplete or wrong).
    is, I'm pretty convinced, quite small, so the payoff isn't huge here, and the investment substantial.
    Otoh, better author management in Zotero, e.g. the simple ability to merge/unify names across the database quickly, would solve a lot of issues people have on a daily basis with a lot less overhead.
  • RE better author management

    I like the way this is done in Citavi. They provide an interface for certain field types where you are presented with all entries in your db for that field type and tools for merging and editing the entries. This exists for names (personal and institutional), journal and newspaper titles, series titles, publisher names, keywords, and some Citavi specific fields.

    The editing tools in that interface allow access to all the different aspects of the field type, like all the different name parts, abbreviations, ISSNs etc., and an extra note field.
  • They provide an interface for certain field types where you are presented with all entries in your db for that field type and tools for merging and editing the entries. This exists for names (personal and institutional)
    Exactly, I was thinking along those lines.
  • The Citavi interface sounds interesting but is somewhat orthogonal to the ORCID id. The main point I was trying to convey is that trying to choose a unique ID for authors is futile - there needs to be a database with 0..N IDs for each person, with a handler attached to each type of ID.
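
One way to picture "a handler attached to each type of ID" is a small registry keyed by scheme. The sketch below is an illustration under assumptions: the scheme names, the `Handler` type, and `to_uris` are hypothetical, and the URI prefixes are simply the public ORCID, Wikidata, and VIAF URL patterns.

```python
import re
from typing import Callable, Dict, List, NamedTuple, Tuple


class Handler(NamedTuple):
    pattern: "re.Pattern[str]"    # what a well-formed identifier looks like
    to_uri: Callable[[str], str]  # how to render it as a link


HANDLERS: Dict[str, Handler] = {
    "orcid": Handler(re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]"),
                     lambda v: f"https://orcid.org/{v}"),
    "wikidata": Handler(re.compile(r"Q[1-9][0-9]*"),
                        lambda v: f"https://www.wikidata.org/wiki/{v}"),
    "viaf": Handler(re.compile(r"[1-9][0-9]*"),
                    lambda v: f"https://viaf.org/viaf/{v}"),
}


def to_uris(identifiers: List[Tuple[str, str]]) -> List[str]:
    """Render every (scheme, value) pair that has a handler and a valid shape."""
    uris = []
    for scheme, value in identifiers:
        handler = HANDLERS.get(scheme)
        if handler and handler.pattern.fullmatch(value):
            uris.append(handler.to_uri(value))
    return uris


# Well-formed examples taken from this thread (they belong to different
# people); the third entry has no registered handler and is skipped.
sample_ids = [("orcid", "0000-0001-5882-6823"),
              ("wikidata", "Q56855591"),
              ("email", "someone@example.org")]
print(to_uris(sample_ids))
```

Unknown schemes are skipped rather than rejected, so identifiers can be stored before the software knows how to render them.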