A broader approach to IDs (OCLC, PMID, PMCID, arxivID)

ow · February 7, 2016

Per https://github.com/zotero/translators/issues/1016, I'd like to consider new fields for major ID categories. I use many books without ISBNs, so having the OCLC number in particular would be extremely helpful. (When I write for Wikipedia, I also use a citation export that has an OCLC field, which would be filled in if we populated it.) I was advised to also bring PMID, PMCID, and arxivID into the mix. What are the downsides to adding such support, aside from bloat? Looking for general feedback on the idea.

adamsmith · February 7, 2016

So this starts with the question of how structured we want to make this:
Option a) is to allow for a fixed set of IDs; option b) allows for custom entry of ID types. The pros and cons are fairly obvious -- option a) provides better, more transferable and citable data; option b) is more flexible. They're not necessarily exclusive approaches.
If we do option a), we want a good list of IDs.

Next question is GUI.
With option a), we'd definitely want a pull-down menu with addable new lines, very much like how authors work now. With option b) I'm less sure how that'd look.

The final question I can think of is citations. This seems very hard to do with option b) if we want those to be citable. With option a), we could do something akin to locators now, so each of them is stored as an identifier with a testable type (<if identifier="oclc">, etc.). We're not going to push out a CSL version with that type of new syntax super soon.
The alternative would be to just add variables like we have for ISBN, ISSN, and PM(C)ID now. That's much easier and we could put it into a point release, but it's only feasible with a fairly narrow set of IDs.
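
To make option a) concrete, here is a minimal sketch of typed identifiers stored as repeatable entries on an item, much like creators. Everything in it is hypothetical: the IdType list, the Item and Identifier classes, and the method names are illustrations only, not Zotero's or CSL's actual data model.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class IdType(Enum):
    # Hypothetical fixed list for option a); the real list is still undecided.
    DOI = "doi"
    ISBN = "isbn"
    OCLC = "oclc"
    PMID = "pmid"
    PMCID = "pmcid"
    ARXIV = "arxiv"


@dataclass(frozen=True)
class Identifier:
    type: IdType
    value: str


@dataclass
class Item:
    title: str
    # Repeatable typed entries, analogous to how creators are stored.
    identifiers: List[Identifier] = field(default_factory=list)

    def has_identifier(self, id_type: IdType) -> bool:
        """Roughly what a style-side test like <if identifier="oclc"> would check."""
        return any(i.type is id_type for i in self.identifiers)

    def first(self, id_type: IdType) -> Optional[str]:
        """Return the first value of the given type, or None if absent."""
        for i in self.identifiers:
            if i.type is id_type:
                return i.value
        return None
```

With this shape, a book without an ISBN would simply carry an Identifier(IdType.OCLC, ...) entry, and a style could test for it the same way it tests locator types.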

zuphilip · February 7, 2016

I would like to give some thoughts, which might also go beyond the scope of the original post. Still, I hope it helps to discuss these issues in some context as I see it:

1) An identifier can be used for citations. A lot of citation styles use the DOI and some styles also use the ISBN. I think we have already seen styles that ask for the PMID, and the arXiv ID looks useful for referencing unpublished works on arXiv. Moreover, there exist more handle systems with some sort of permanent URL, like urn, hdl, ark, ...

2) An identifier can be used for enriching the data by providing a URL to more or other information. It might be some local information (e.g. the call number for a book in a nearby library) or some additional data like annotations. Moreover, such links could be seen as some sort of data provenance, i.e. they show which sources the data was actually extracted from.

3) An identifier in Zotero could be used for additional functionality. It is already planned/discussed to improve the data in Zotero by checking it against the data linked via the DOI. The same mechanism could be possible with any other identifier (assuming that it can be expressed as a link and there is some metadata available behind that URI). Moreover, we could also use these identifiers for finding duplicates.

4) When a data scientist is working with data from Zotero and some other sources, any identifier can be helpful for finding matches between the different databases.

==> For 1), approach a) could be a good match. Moreover, we might just restrict the list of identifiers to only the ones that are needed for at least one citation style. We could also use the itemType report and use report number and type (e.g. for arXiv documents).

==> For 2), 3), 4), approach b) could be good. I would then suggest saving a URI rather than just an identifier as a string. This could be seen as similar to owl:sameAs in the linked data standard.
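
A rough sketch of what "save a URI rather than a string" could look like, and how shared URIs could then flag duplicate candidates as in 3). The RESOLVERS table and both function names are assumptions made for illustration; the URL patterns are the commonly used public resolvers, not anything Zotero defines.

```python
from typing import Dict, Optional, Set

# Assumed resolver templates; a real list would need to be agreed on.
RESOLVERS: Dict[str, str] = {
    "doi": "https://doi.org/{}",
    "hdl": "https://hdl.handle.net/{}",
    "arxiv": "https://arxiv.org/abs/{}",
    "oclc": "https://www.worldcat.org/oclc/{}",
    "pmid": "https://pubmed.ncbi.nlm.nih.gov/{}/",
}


def to_uri(id_type: str, value: str) -> Optional[str]:
    """Turn an (identifier type, value) pair into a URI, or None if the type is unknown."""
    template = RESOLVERS.get(id_type.lower())
    if template is None:
        return None
    return template.format(value.strip())


def shared_uris(a: Dict[str, str], b: Dict[str, str]) -> Set[str]:
    """URIs that two items have in common; a non-empty result hints at a duplicate."""
    uris_a = {to_uri(t, v) for t, v in a.items()} - {None}
    uris_b = {to_uri(t, v) for t, v in b.items()} - {None}
    return uris_a & uris_b
```

Comparing URIs instead of raw strings means two records match even when one stores "DOI" and the other "doi" as the type label.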

adamsmith · February 7, 2016

yes, I think storing more identifiers is a good idea from the Zotero side. Dan has put in some work along the lines of 4, I'd be curious how he thinks about that part. Lots of variables to juggle.

Could be that it makes sense to start by implementing a) because b) seems fairly complex and might not happen for a while. Not sure, though. We also don't want to close the door.

ow · February 21, 2016

Wikipedia's citation templates list potential identifiers here: https://en.wikipedia.org/wiki/Template:Cite_journal#Identifiers

fbennett · February 21, 2016

Europe has introduced a system of identifiers for legislation and court judgements, and the UN has its own peculiar system of document identifiers. These are both niche uses, but important ones for authors who need them. It would be great if they could be accommodated when the identifiers scheme is extended.

ow · June 13, 2016

Would it be possible to start by adding OCLC and arxivID as major identifiers? The former would be of immediate use to all editors who use pre-ISBN texts.

adamsmith · June 13, 2016

we're already adding OCLC ID and arxivID in the extra field in a standardized format. mvolz could write very simple code in citoid to extract them. Format is analogous to PMID/PMCID
OCLC: https://github.com/zotero/translators/blob/master/Open%20WorldCat.js#L377

arxivID: https://github.com/zotero/translators/blob/master/arXiv.org.js#L244
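
As a sketch of the "very simple code" mentioned above: assuming the Extra field holds one "Label: value" line per identifier (the labels and exact spelling below are assumptions; check the linked translators for the actual format), extraction could look roughly like this. The function name extract_ids is made up for the example.

```python
import re
from typing import Dict

# Assumed labels, one identifier per line, e.g. "OCLC: 123456" or "arXiv: 1602.01234".
LINE_RE = re.compile(
    r"^[ \t]*(PMID|PMCID|OCLC|arXiv)[ \t]*:[ \t]*(\S+)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def extract_ids(extra: str) -> Dict[str, str]:
    """Collect identifiers from an Extra field; other lines are ignored."""
    ids: Dict[str, str] = {}
    for label, value in LINE_RE.findall(extra):
        # Keep the first occurrence if a label appears more than once.
        ids.setdefault(label.lower(), value)
    return ids
```

Lines that do not match one of the assumed labels, such as free-text notes, are left alone, so the Extra field can keep serving its other purposes.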