RIS import, archival location, RIS AV field

jrochkind · January 30, 2018

The RIS format has a field "AV" for "Location in Archives".

However, Zotero's RIS import ignores this field, and instead puts "AN" ("accession number") into the "Loc. in archives" field.

Should the RIS importer be using the "AV" field?

adamsmith · January 30, 2018

Where are you getting that field from? We extracted these: https://github.com/aurimasv/translators/wiki/RIS-Tag-Map-(narrow) from the official RIS specifications, which don't mention anything about AV

jrochkind · January 30, 2018

It is very confusing to figure out where the 'official RIS specification' is, it seems to be an abandoned standard. What reference are you using for 'official RIS specification'?

The "AV" field is listed on the wikipedia page:
https://en.wikipedia.org/wiki/RIS_(file_format)

But indeed you're right, it's not listed on this historical PDF from Refman, oops, I thought it was: https://web.archive.org/web/20120526103719/http://refman.com/support/risformat_intro.asp

Very confusing. Is that historical PDF what you use for 'official' RIS spec?

With RIS being the most common interchange format for citation managers, I really wish it were actually a maintained/developed standard!

jrochkind · January 30, 2018

Interestingly, the wiki page you link to at https://github.com/aurimasv/translators/wiki/RIS-Tag-Map-(narrow) says you map "AN" to "Accession Number" for all reference types.

But in fact it's going to "Loc. in archives".

Is there a better place to look to see what zotero is actually doing currently?

(Yes, I am a software developer. I guess another question is, if I want to write software that can export to zotero well -- is there a better format to use than RIS? The benefit of RIS is that it can be used by a variety of products, but do you have advice here?).

adamsmith · January 30, 2018

There is indeed no useful bibliographic exchange format. It's a fairly ridiculous situation. You'll get the best import into Zotero using Zotero RDF, but a) that isn't well documented and b) it'll probably be replaced with a JSON-LD/schema.org based schema in the not-too-distant future, so I wouldn't invest heavily in implementing it. Endnote XML is marginally better documented and, by virtue of being XML, more robust, so that might be worth it. BibLaTeX is very precise and exceedingly well documented, but I don't think many tools other than Zotero do very well importing it (and I don't know _how_ well Zotero does -- most people use this the other way from Zotero to BibLaTeX).

Yes, we follow the 2012 Refman published standards, which was the last time the RIS standard was published. I like Wikipedia, but don't trust them on this at all.
You can check what Zotero actually does by following the code: https://github.com/zotero/translators/blob/master/RIS.js
Given that we try to account for dozens of variants of RIS existing in the wild, this is effectively not documentable other than through the source code. I believe the AN matching is based on what Endnote (X6) actually does as opposed to the standard. It's Clarivate's (formerly Thomson Reuters') standard, so we went with what they do rather than what they say.

jrochkind · January 30, 2018

Cool, thanks, what a mess.

In my own testing of current Endnote Web, it seems to use the VL "Volume" field for "Location in Archives" rather than "AN". Perhaps it changed since you last checked.

As usual, it's a standard that is not so standard, so it goes!

adamsmith · January 30, 2018

it seems to use the VL "Volume" field for "Location in Archives" rather than "AN"

ugh, but that's terrible, we can't possibly do that. I'll check what Desktop Endnote does when I get a chance.

jrochkind · January 30, 2018

oh yeah, I am not advocating you do that.

In my export, I'm just putting the location in archives information in VL, AN, _and_ AV.

Depending on the software involved and the citation type, it sometimes shows up in multiple places, but at least it shows up in the "location in archives"-labelled field in every software/type combo I found that had such a field.
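
A minimal sketch of the export approach described above, writing the same location-in-archives value under AV, AN, and VL. The record keys (`type`, `authors`, `title`, `year`, `archive_location`) and the function names are hypothetical, chosen only for illustration. The line layout (two-character tag, two spaces, hyphen, space, value, with an `ER` line closing the record) follows the usual RIS convention.

```python
def ris_line(tag: str, value: str) -> str:
    """Format one RIS line: two-character tag, two spaces, hyphen, space, value."""
    return f"{tag}  - {value}"


def export_ris(record: dict, archive_tags=("AV", "AN", "VL")) -> str:
    """Serialize one record, repeating the archive location under each tag.

    The record keys used here are hypothetical, not part of any RIS spec.
    """
    lines = [ris_line("TY", record.get("type", "GEN"))]
    for author in record.get("authors", []):
        lines.append(ris_line("AU", author))
    if "title" in record:
        lines.append(ris_line("TI", record["title"]))
    if "year" in record:
        lines.append(ris_line("PY", str(record["year"])))
    location = record.get("archive_location")
    if location:
        # Duplicate the value so whichever tag the importer reads is populated.
        for tag in archive_tags:
            lines.append(ris_line(tag, location))
    lines.append(ris_line("ER", ""))  # end-of-record marker
    return "\n".join(lines) + "\n"
```

The trade-off is the one noted above: some importers will show the value in more than one field (for example as a volume), in exchange for it reliably reaching the archive-location field.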

noksagt · January 30, 2018

I'd put in a plug for MODS XML as a reasonable candidate for interchange format as well (the LoC is a trusted maintainer & it is well documented, Zotero's implementation is reasonably good).

The latest RIS specification on endnote.com is from May 2009, ironically slightly more recent than the pdf in that 2012 archive (pdf dated Sept 2008). There are likely few (if any) differences.

I doubt that "Location in Archives" would "round trip" or import back into Endnote Web in that field if you put it in a "VL" (since many item types have a volume field). This seems to be a case that we should "do what they say, not what they do".

emilianoeheyns · February 2, 2018

I have no specifics at hand, but way back when I started putting together a test framework for BBT, none of the formats (including MODS XML) would provide me with a way to fill all the Zotero fields. I created my own translator which imports/exports mostly plain JSON dumps of what the translators are offered internally. I'd be happy to ditch that if MODS XML could do the job. Note that my own translator (BetterBibTeX JSON) probably does not work when BBT is not installed.

zuphilip · February 4, 2018

@jrochkind To give you some more options from library formats: Zotero can also read MAB2, MARC and MARCXML and I hacked together a BIBFRAME importer. These are not extensively tested but we have used them e.g. for importing marc21, unimarc and belmarc data. Besides that CSL JSON could also be an option which is supported by DataCite, Crossref and mEDRA as well.

jrochkind · February 21, 2018

Thanks @adamsmith and @zuphilip!

I've gone forward with RIS for now, but CSL-json seems like an interesting future option, which is nicer in some ways than RIS. The only docs I can find for it are a json-schema, is there anything I'm missing?

I have another related question to effective Zotero workflow: I have a link on my page which delivers RIS. If I click it with the (in my case) Chrome Zotero plugin installed Zotero nicely intercepts it and imports it directly to Zotero, great!

But if I (or a user) instead click the Zotero icon in the Chrome toolbar (which has a mouseover title "Save to Zotero (Embedded Metadata)"), then it of course does not use the RIS, and gets only stunted/inaccurate metadata.

I think it says "(Embedded Metadata)" instead of "(Web Page with Snapshot)" only because of some markup on my page (COinS probably), so I could probably get it to do the latter -- but still not actually the solution I want.

Is there any generic way I can make the "Save to Zotero" toolbar button use the RIS I am already producing?

I was hoping that using standard HTML to say "alternate representation" might work:

`link rel="alternate" type="application/x-research-info-systems" title="RIS export" href="$url_that_returns_ris" /` (angle brackets removed so they aren't swallowed in comment)

But zotero does not seem to pick this up. (Would that be a good feature enhancement?)

Is there any other way to get Zotero to realize it can/should use my RIS representation as from the toolbar "Save To Zotero" button?

Alternately/additionally, can you point me to documentation what formats Zotero can recognize as "Embedded Metadata"? Do any of them allow as much granularity/specificity as RIS? (COinS doesn't really cut it, it is not very expressive and can not accurately capture our correct citations).

Thanks for any advice! Maybe I should make a new forum topic...

noksagt · February 21, 2018

Is there any generic way I can make the "Save to Zotero" toolbar button use the RIS I am already producing?

If you're making dynamic web pages using some server-side programming language, UnAPI is great for this. You can optionally add more formats as they become available. Other options are enumerated on "exposing metadata".
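
For a sense of what the server side of unAPI involves, here is a rough sketch of the request handling, independent of any web framework. The three request shapes (no parameters, `id` only, `id` plus `format`) and the status codes used (300 for the per-record format list, 404 for an unknown id, 406 for an unsupported format) are assumptions based on the unAPI version 1 specification and should be checked against the archived spec. `FORMATS`, `formats_xml`, `handle_unapi`, and the `lookup` callback are hypothetical names.

```python
from xml.sax.saxutils import quoteattr

# Formats this hypothetical server can produce: unAPI format name -> MIME type.
FORMATS = {"ris": "application/x-research-info-systems"}


def formats_xml(identifier=None):
    """Build the XML list of available formats, optionally for one record id."""
    id_attr = f" id={quoteattr(identifier)}" if identifier else ""
    items = "".join(
        f"<format name={quoteattr(name)} type={quoteattr(mime)}/>"
        for name, mime in FORMATS.items()
    )
    return f'<?xml version="1.0" encoding="UTF-8"?><formats{id_attr}>{items}</formats>'


def handle_unapi(query, lookup):
    """Return (status, content_type, body) for a parsed query-string dict.

    `lookup(identifier, fmt)` is a caller-supplied function returning the
    serialized record, or None when the identifier is unknown.
    """
    identifier = query.get("id")
    fmt = query.get("format")
    if identifier is None or fmt is None:
        # No format requested: answer with the list of formats.
        status = 200 if identifier is None else 300
        return status, "application/xml", formats_xml(identifier)
    if fmt not in FORMATS:
        return 406, "text/plain", "unsupported format"
    body = lookup(identifier, fmt)
    if body is None:
        return 404, "text/plain", "unknown id"
    return 200, FORMATS[fmt], body
```

The page itself would additionally need the unAPI markup that points at this endpoint and labels each record; that part is not shown here.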

jrochkind · February 21, 2018

Okay, I will explore UnAPI. But I think it's both a somewhat abandoned standard (the web page http://unapi.info documenting it is available only in the Internet Archive Wayback Machine), and also somewhat complex to implement.

Would there be any chance of getting Zotero to recognize `link rel="alternate" type="[mime-type zotero recognizes for citations]" href="[the url]"`, which seems to do much the same thing for this use case, but much simpler, and using just HTML standard?

Thanks for the link to 'exposing metadata' doc, I will spend some time with it.

jrochkind · February 21, 2018

There has also been some discussion/critique of unapi's misuse of the HTML `abbr` tag, although I can't find a link now. But I believe use of the `abbr` tag like that will cause some assistive technology to read out the title attribute to the user, or at least offer to do so, since it's marked up as an "abbreviation" after all, which is not really what it is.

It would be awesome to have an alternative. I think plain old link rel=alternate could be it?

I edited my above posts to keep them from swallowing my attempt at embedding html link tag source code.

noksagt · February 21, 2018

Some discussion on GCS/PCS also included a way of advising screen readers not to read a particular abbr tag.

I thought there's been discussion on link rel here and/or the Zotero dev group. I don't know the status. One seeming limitation is that link is reserved to describing the current document. It may be suitable for individual records, but seems like it may not work for a page that listed multiple resources.

UnAPI addresses that. It is a "crusty/unmaintained/unpopular" standard, but it is relatively easy to implement & it works. I haven't seen anything "better" actually stick.

jrochkind · February 21, 2018

Depending on the format it's quite suitable for a page listing multiple records. An RIS file is quite capable of having multiple citations within it, no?

I think link rel has definitely "stuck". It would make it _so_ much easier to support Zotero, and is an actual HTML standard.

My javascript is rudimentary, but if I somehow figured out how to build the zotero connector and write tests for the code, would a PR adding link rel=alternate support be looked upon favorably? I think it would end up being quite similar logic to unapi, just using link rel="alternate" for alternate representation discovery.

dstillman · February 21, 2018

Yeah, I don't think there's any reason for us not to support <link>, though I'm not sure whether rel="alternate" or rel="meta" would be more appropriate. The latter seems to be the standard approach with RDF, at least, and seems to correspond to <meta> tags that we already support. I don't see any reason 'meta' couldn't be used with other content types, but I'm not sure what's more common in actual usage (to the extent that such usage exists).

It does seem that the main advantage of unAPI was that you could associate the metadata with a specific element on the page, but given the way Zotero presents multiple results, that's irrelevant for Zotero anyway.

A few previous mentions of <link> (dating back to 2010, which raises the question of why we didn't add support for this then):

https://forums.zotero.org/discussion/comment/60354#Comment_60354
https://forums.zotero.org/discussion/comment/189104/#Comment_189104
https://forums.zotero.org/discussion/comment/228372/#Comment_228372

(We should, however, still support JSON-LD.)

adamsmith · February 21, 2018

My recollection is that I have always wanted to support link, but never got to it

dstillman · February 21, 2018

It does raise some complicated questions about translator priority, given that currently the priorities of import translators don't apply to web translation. E.g., MODS is 50, which is fine, but RIS is 100, which is the same as most web translators.

dstillman · February 21, 2018

It seems like we want to prioritize any priority=100 web translator, but it gets messy after that. RIS is 100 and Embedded Metadata is 320. Do we want a linked RIS to take precedence over embedded metadata?
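
To make the ordering question concrete, here is a small sketch of one possible tie-breaking policy. It is not how Zotero resolves this, and the thread does not settle it. It assumes that a lower priority number is tried first, and it breaks ties by where the candidate came from, so that a site-specific web translator beats a linked file at the same priority. The `Candidate` class, the `source` categories, and `SOURCE_RANK` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    label: str
    priority: int  # assumption: lower number is tried first
    source: str    # hypothetical category: "web", "linked", or "embedded"


# Hypothetical tie-break: at equal priority, prefer web translators over
# linked files, and linked files over embedded metadata.
SOURCE_RANK = {"web": 0, "linked": 1, "embedded": 2}


def order_candidates(candidates):
    """Sort by priority number first, then by source rank."""
    return sorted(candidates, key=lambda c: (c.priority, SOURCE_RANK[c.source]))


if __name__ == "__main__":
    found = [
        Candidate("Embedded Metadata", 320, "embedded"),
        Candidate("Linked RIS", 100, "linked"),
        Candidate("Site-specific web translator", 100, "web"),
    ]
    for candidate in order_candidates(found):
        print(candidate.priority, candidate.label)
```

Under this policy and the priority numbers quoted above, the web translator comes first, the linked RIS second, and Embedded Metadata last, which amounts to answering "yes" to the question of whether linked RIS should outrank embedded metadata.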

jrochkind · February 21, 2018

Not suggesting removing unapi, you can leave both! Because, yes, unapi is more powerful in that way.

You could also support both rel=alternate and rel=meta. It's just scanning for discovery of alternate formats (to then do with them similar to what it would with unapi), it could look at both rels, without problems, I think?
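
As a rough illustration of how small the discovery step is, here is a sketch using only the Python standard library. It is not Zotero connector code. It scans a page for `link` tags whose `rel` contains `alternate` or `meta` and whose `type` is one of the citation MIME types mentioned in this thread. The names `AlternateLinkFinder`, `find_citation_links`, `CITATION_TYPES`, and `ACCEPTED_RELS` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types mentioned in this thread; a real implementation would know more.
CITATION_TYPES = {
    "application/x-research-info-systems",
    "application/vnd.citationstyles.csl+json",
}
ACCEPTED_RELS = {"alternate", "meta"}


class AlternateLinkFinder(HTMLParser):
    """Collect (mime_type, absolute_url) pairs from matching link tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rels = set(attr.get("rel", "").lower().split())  # rel is space-separated
        mime = attr.get("type", "").strip().lower()
        href = attr.get("href", "")
        if href and rels & ACCEPTED_RELS and mime in CITATION_TYPES:
            self.found.append((mime, urljoin(self.base_url, href)))


def find_citation_links(html, base_url):
    finder = AlternateLinkFinder(base_url)
    finder.feed(html)
    return finder.found
```

Accepting both `rel` values costs one extra entry in a set here; whether both should be recommended to site authors is a separate question.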

To me it seems to be obviously an 'alternate' representation (in, say, RIS or csl-data-json formats), and I think it's at least somewhat likely to be on pages even if it wasn't put there specifically targeting Zotero.

There are some use cases the rel thing won't work for, in which case you still have all the other options -- for ones it will work for, it's both bog-standard HTML and extremely easy to implement (even easier than unapi for sure).

Mentioning csl-data-json makes me think -- is there even any way to get csl-data-json to be recognized by unapi currently? I don't know if it even has a mime-type?

csl-data-json seems the best candidate to become (inshallah) an actual generic cross-software standard lacking the weirdness (and no standards maintainer) of RIS.

dstillman · February 21, 2018

Looking for both rel="alternate" and rel="meta" for all types seems a little clumsy — why should both exist if they do the same thing? — so I think we'd want to at least make an effort to determine whether one was more common (if for no other reason than to make a better recommendation to websites wanting to implement this).

Mentioning csl-data-json makes me think -- is there even any way to get csl-data-json to be recognized by unapi currently? I don't know if it even has a mime-type?

application/vnd.citationstyles.csl+json

csl-data-json seems the best candidate to become (inshallah) an actual generic cross-software standard lacking the weirdness (and no standards maintainer) of RIS.

I believe CSL-JSON is limited to what's commonly used in citations, so it's inherently lossier than, say, RDF.

jrochkind · February 21, 2018

csl-json is still better than RIS though! generic RDF is certainly incredibly expressive, but good luck getting multiple software packages to interoperate and all get all the same data out. :)

At any rate, getting unapi to recognize csl-json with the `application/vnd.citationstyles.csl+json` type seems like another good feature request, if it doesn't already?
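
For reference, a minimal sketch of what serving CSL-JSON under that MIME type could look like on the exporting side. The input record keys and the function names are hypothetical. The output keys (`id`, `type`, `title`, `author`, `issued` with `date-parts`, `archive`, `archive_location`) are taken from the CSL-JSON schema as I understand it and are worth verifying against the schema itself.

```python
import json

CSL_JSON_MIME = "application/vnd.citationstyles.csl+json"


def to_csl_item(record):
    """Map a hypothetical internal record dict to one CSL-JSON item."""
    item = {
        "id": record["id"],
        "type": record.get("type", "manuscript"),
        "title": record["title"],
    }
    if record.get("authors"):
        # Authors are given here as (family, given) tuples.
        item["author"] = [
            {"family": family, "given": given} for family, given in record["authors"]
        ]
    if record.get("year"):
        item["issued"] = {"date-parts": [[record["year"]]]}
    if record.get("archive"):
        item["archive"] = record["archive"]
    if record.get("archive_location"):
        item["archive_location"] = record["archive_location"]
    return item


def csl_json_response(records):
    """Return (content_type, body); the body is a JSON array of CSL items."""
    body = json.dumps([to_csl_item(r) for r in records], ensure_ascii=False)
    return CSL_JSON_MIME, body
```

Unlike the RIS case earlier in the thread, the location in archives has a single dedicated key here, so there is no need to duplicate it across fields.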

noksagt · February 21, 2018

Multiple software packages support MODS, MARC, and UniMARC (all are also more expressive than RIS).

But: I agree that adding both CSL-JSON and Endnote XML seems like "low hanging fruit".

jrochkind · February 22, 2018

Actually note also that html5 allows the `link` tag to be in body, as many times as you want, as part of the html5 microdata api, although it needs an `itemprop` attribute instead of a `rel` attribute. It still gets a `type`.

So there might be a way to work this out for many times in a page, with each time associated with a particular DOM region, like unapi. Although it sounds like Zotero doesn't really do anything that would benefit from that anyway, at least for formats which allow more than one citation in them (which might be all of them?), makes more sense just to have a single head link tag pointing to the alternate representation with all of them in it. Probably not worth it to pursue, just mentioning for posterity.

As far as rel=alternate vs meta, I'm curious about why some communities are using `meta` for this. `link rel=alternate` is already the de-facto standard for RSS auto-discovery for instance (on a page listing multiple entries, rel=alternate to the RSS representation of those multiple entries). If Zotero implemented this and didn't recognize both and chose `meta` as the only one, I'd probably just duplicate on my side, including both `meta` and `alternate`, cause alternate seems right to me, heh. Makes more sense to me for Zotero to just recognize both conventions, if both are in use 'in the wild'; it doesn't seem to add any implementation complexity.

But I could certainly deal with it either way, if Zotero recognized `link` tags I'd be happy regardless of alternate vs meta.

dstillman · March 7, 2018

So apparently we even have a ticket for this from 2011…

https://github.com/zotero/translators/issues/77