A broader approach to IDs (OCLC, PMID, PMCID, arxivID)
Per https://github.com/zotero/translators/issues/1016, I'd like to consider new fields for major ID categories. I use many books without ISBNs, so having the OCLC number in particular would be extremely helpful. (When I write for Wikipedia, I also use a citation export that has an OCLC field, which would be filled in if we populated it.) I was advised to also bring PMID, PMCID, and arxivID into the mix. What are the downsides to adding such support, aside from bloat? Looking for general feedback on the idea.
Option a) is to allow for a fixed set of IDs; option b) is to allow custom entry of ID types. The pros and cons are fairly obvious: a) provides better, more transferable and citable data, while b) is more flexible. They're not necessarily exclusive approaches.
If we do option a), we want a good list of IDs.
Next question is GUI.
With option a), we'd definitely want a pull-down menu with addable new lines, very much like authors work now. With option b) I'm less sure how that'd look.
Final question I can think of is citations. This seems very hard to do with option b) if we want those citable. With option a), we could do something akin to locators now, so each of them is stored as an identifier with a testable type (<if identifier="oclc"> etc.). We're not going to push out a CSL version with that type of new syntax super soon.
Alternative would be to just add variables like we have for ISBN, ISSN, and PM(C)ID now. That's much easier and we could put it into a point release, but it's only feasible with a fairly narrow set of IDs.
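To make option a) more concrete, here is a minimal sketch of identifiers stored as typed rows, much like creators, with a type test loosely modelled on the <if identifier="oclc"> idea. Everything in it is hypothetical: the Item class, the KNOWN_ID_TYPES list, and the method names are illustrations only, not Zotero's or CSL's actual data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical fixed list for option a); the thread has not settled on one.
KNOWN_ID_TYPES = ("oclc", "pmid", "pmcid", "arxiv")


@dataclass
class Item:
    title: str
    # One (type, value) row per identifier, similar to how creators are rows.
    identifiers: List[Tuple[str, str]] = field(default_factory=list)

    def add_identifier(self, id_type: str, value: str) -> None:
        """Add an identifier, rejecting types outside the fixed list."""
        id_type = id_type.lower()
        if id_type not in KNOWN_ID_TYPES:
            raise ValueError(f"unsupported identifier type: {id_type}")
        self.identifiers.append((id_type, value.strip()))

    def has_identifier(self, id_type: str) -> bool:
        """Rough analogue of a style-level test on the identifier type."""
        return any(t == id_type.lower() for t, _ in self.identifiers)

    def get_identifier(self, id_type: str) -> Optional[str]:
        """Return the first value stored for the given type, or None."""
        for t, value in self.identifiers:
            if t == id_type.lower():
                return value
        return None


# Placeholder values, not real identifiers.
book = Item("A book without an ISBN")
book.add_identifier("OCLC", " 123456 ")
print(book.has_identifier("oclc"), book.get_identifier("oclc"))
```

The simpler alternative of adding one variable per ID would correspond to plain attributes on the item instead of the typed list, which is easier to ship but needs a schema change for every new ID type.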
1) An identifier can be used for citations. Many citation styles use the DOI, and some also use the ISBN. I think we have already seen styles that ask for the PMID, and the arXiv ID looks useful for referencing unpublished works on arXiv. Moreover, there are other identifier systems with some sort of permanent URL, like urn, hdl, ark, ...
2) An identifier can be used to enrich the data by providing a URL to more or other information. This might be some local information (e.g. the call number for a book in a nearby library) or some additional data like annotations. Moreover, such links could be seen as a kind of data provenance, i.e. a record of which sources the data was actually extracted from.
3) An identifier in Zotero could be used for additional functionality. It is already planned/discussed to improve the data in Zotero by checking it against the data linked via the DOI. The same mechanism would be possible with any other identifier (assuming it can be expressed as a link and some metadata is available behind that URI). Moreover, we could also use these identifiers for finding duplicates.
4) When a data scientist works with data from Zotero and some other sources, any identifier can help in finding matches between the different databases.
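As a rough illustration of points 3) and 4), duplicate detection and cross-database matching can both be reduced to grouping records by a shared (type, value) pair. This is only a sketch: the function name, the record keys, and the identifier values below are made up for the example, and real matching would also need per-type normalization (DOIs, for instance, are case-insensitive).

```python
from collections import defaultdict


def find_shared_identifier_groups(records):
    """Group record keys that share at least one (type, value) identifier.

    `records` maps a record key to a list of (id_type, value) pairs.
    Returns a dict from (id_type, value) to the sorted keys that carry it,
    keeping only identifiers seen on more than one record.
    """
    by_identifier = defaultdict(set)
    for key, identifiers in records.items():
        for id_type, value in identifiers:
            by_identifier[(id_type.lower(), value.strip())].add(key)
    return {
        ident: sorted(keys)
        for ident, keys in by_identifier.items()
        if len(keys) > 1
    }


# Placeholder records from two hypothetical sources.
records = {
    "zotero:A": [("doi", "10.1000/example"), ("pmid", "111")],
    "other:B": [("PMID", "111")],
    "zotero:C": [("oclc", "222")],
}
print(find_shared_identifier_groups(records))
```

With the placeholder data, only the two records sharing the PMID are grouped; the record that carries just an OCLC number stays unmatched.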
==> For 1) the approach a) could match. Moreover, we might just restrict the list of identifiers to those needed by at least one citation style. We could also use the item type "report" together with its report number and report type (e.g. for arXiv documents).
==> For 2), 3), 4) the approach b) could be good. I would then suggest saving a URI rather than just an identifier as a string. This could be seen as similar to owl:sameAs in the linked data standards.
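One way to picture the "URI instead of a bare string" suggestion is a small lookup that expands a typed identifier into a resolvable link. This is a sketch under assumptions: the function and table names are hypothetical, and the URL patterns are the commonly used public resolver forms as far as I know them, so they should be checked before anything relies on them.

```python
from typing import Optional

# Assumed resolver patterns; verify each one before relying on it.
RESOLVER_TEMPLATES = {
    "doi": "https://doi.org/{}",
    "pmid": "https://pubmed.ncbi.nlm.nih.gov/{}/",
    "pmcid": "https://www.ncbi.nlm.nih.gov/pmc/articles/{}/",
    "arxiv": "https://arxiv.org/abs/{}",
    "oclc": "https://www.worldcat.org/oclc/{}",
}


def identifier_to_uri(id_type: str, value: str) -> Optional[str]:
    """Expand a typed identifier to a URI, or return None for unknown types."""
    template = RESOLVER_TEMPLATES.get(id_type.lower())
    if template is None:
        return None
    return template.format(value.strip())


# Placeholder values, not real identifiers.
print(identifier_to_uri("arxiv", "1234.5678"))
print(identifier_to_uri("hdl", "12345/678"))  # not in the table, so None
```

Storing the expanded URI would let custom types under option b) work without a lookup table at all, at the cost of the value being harder to test against in a citation style.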
Could be that it makes sense to start by implementing a) because b) seems fairly complex and might not happen for a while. Not sure, though. We also don't want to close the door.
OCLC: https://github.com/zotero/translators/blob/master/Open%20WorldCat.js#L377
arxivID: https://github.com/zotero/translators/blob/master/arXiv.org.js#L244