Creating a personal Web page for recognition by generic translators

funnell · November 11, 2016

I manually maintain a Web page listing my publications. I would like to change it so that the publications (including those without DOIs) are recognized by Zotero (and maybe by other harvesters). I looked at the non-DOI generic translators (Embedded Metadata, COinS and unAPI), and it seems that they would all involve duplicate data (the metadata and the displayed data), which seems inelegant and error-prone. I had supposed that there would be some standard whereby I could wrap, say, an author's name in an element with a particular class, and the class name would be recognized by Zotero's translator.
Apparently there is no such thing? (Maybe the new My Publications feature in Zotero 5 is or will be relevant, but it doesn't sound like quite what I'm looking for.)
- Robert

dstillman · November 11, 2016

You're describing RDFa, which Zotero doesn't currently support. (There's also the less-successful Microdata.) We might support RDFa at some point in the Embedded Metadata translator, but JSON-LD is more of a priority. The problem with wrapping-based approaches is that, while they feel nice and elegant, the visible data in a given bibliography often wouldn't actually contain all the data available, so the round-trip would be lossy. Losing data is definitely worse than having some (automatically generated) duplicate data.

(We'll be releasing a JavaScript library and Node module for My Publications that you could use to generate a publications list on your site, and it will eventually include JSON-LD.)
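For illustration, here is a minimal Python sketch of the kind of JSON-LD record a page could embed, generated from the same data that produces the visible list so that the duplicate copy is never maintained by hand. It assumes the schema.org vocabulary (`ScholarlyArticle`, `Person`, `Periodical`) and a hypothetical record layout with `title`, `authors` (given/family pairs), `date` and optional `journal` keys. Which vocabularies Zotero would actually read is what the JSON-LD ticket discusses, and Zotero did not import JSON-LD at the time of this thread.

```python
import json

def jsonld_script(pub: dict) -> str:
    """Render one hypothetical publication record as a schema.org JSON-LD <script> block."""
    data = {
        "@context": "https://schema.org",
        "@type": "ScholarlyArticle",
        "headline": pub["title"],
        # Every author is kept, with given and family names as separate fields.
        "author": [
            {"@type": "Person", "givenName": given, "familyName": family}
            for given, family in pub["authors"]
        ],
        "datePublished": pub["date"],
    }
    if "journal" in pub:
        data["isPartOf"] = {"@type": "Periodical", "name": pub["journal"]}
    body = json.dumps(data, indent=2, ensure_ascii=False)
    # Stop a literal "</script>" inside a value from closing the tag early
    # ("\/" is a valid JSON escape for "/").
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'

# Hypothetical example record.
pub = {
    "title": "A Hypothetical Paper",
    "authors": [("Ada", "Smith"), ("Bo", "Jones")],
    "date": "2015",
    "journal": "Journal of Examples",
}
print(jsonld_script(pub))
```

Because the block is hidden from readers, its fields need not match the displayed citation format; the page template would emit one such block per publication alongside the visible entry.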

dstillman · November 11, 2016

And note that, in the meantime, HTML bibliographies generated by Zotero already contain COinS tags with (most of) the original data.
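As a rough sketch of what such a tag contains, the Python below builds a COinS span for one journal article: an OpenURL 1.0 ContextObject in key/encoded-value (KEV) form, placed in the `title` attribute of a `<span class="Z3988">`. It assumes a hypothetical record dict with `title`, `authors` (given/family pairs), `date` and optional `journal` keys, and it covers only a few common `rft.*` keys; it is not Zotero's own exporter.

```python
from html import escape
from urllib.parse import urlencode

def coins_span(pub: dict) -> str:
    """Return a COinS <span> describing one journal article.

    `pub` is a hypothetical record: {"title", "authors": [(given, family), ...],
    "date", optional "journal"}.
    """
    first_given, first_family = pub["authors"][0]
    pairs = [
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
        ("rft.atitle", pub["title"]),
        ("rft.date", pub["date"]),
        ("rft.aulast", first_family),
        ("rft.aufirst", first_given),
    ]
    if "journal" in pub:
        pairs.append(("rft.jtitle", pub["journal"]))
    # Remaining authors go in repeated rft.au keys as full names.
    pairs += [("rft.au", f"{given} {family}") for given, family in pub["authors"][1:]]
    kev = urlencode(pairs)  # percent-encodes each value and joins pairs with "&"
    # The KEV string sits inside an HTML attribute, so "&" must be written as "&amp;".
    return f'<span class="Z3988" title="{escape(kev)}"></span>'

# Hypothetical example record.
pub = {
    "title": "A Hypothetical Paper",
    "authors": [("Ada", "Smith"), ("Bo", "Jones")],
    "date": "2015",
    "journal": "Journal of Examples",
}
print(coins_span(pub))
```

Generated this way from a single record, the span is duplicate data in the page but not a second copy that anyone edits by hand, which addresses most of the error-proneness concern.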

funnell · November 11, 2016

Thanks for the quick reply.
Am I correct that supporting RDFa would really mean supporting RDFa+DC and/or RDFa+foo and/or RDFa+bar? And the same for JSON-LD?
The fact that one might want some data not to be visible could be worked around by using the visibility property in CSS?
I saw the note about Zotero-generated HTML bibliographies containing COinS, but at this point I think I'd be happier with the HTML being the input rather than the output. But maybe that doesn't really make sense.
I currently need to maintain two lists, one for my Web page and one for the Canadian Common CV (CCV) used by granting agencies. Apparently the CCV can export some kind of XML. Maybe I need to look at transforming that into HTML.

dstillman · November 11, 2016

> Am I correct that supporting RDFa would really mean supporting RDFa+DC and/or RDFa+foo and/or RDFa+bar? And the same for JSON-LD?

The JSON-LD ticket linked above gets into the specifics. Zotero already imports and exports data in a range of different ontologies — it's what you get when you use the RDF export options in Zotero.

> The fact that one might want some data not to be visible could be worked around by using the visibility property in CSS?

But even the format might be different — to use your example from above, it's quite likely that the name wouldn't be formatted in the way you want (e.g., it might be "Last, First" or use "et al."), so you'd need to duplicate that entire field anyway, and the same for various other fields. I'm not sure how hiding duplicate data in HTML with CSS is better than just including a complete hidden record with all necessary data.

funnell · November 11, 2016

I'll need to read more about JSON-LD.
You're right, hiding duplicate data in HTML would not be good, although I'm not clear yet about why the format wouldn't be what I want (or at least why I wouldn't be able to live with whatever format restrictions there might be).
Thanks again for your time.

dstillman · November 11, 2016

> I'm not clear yet about why the format wouldn't be what I want (or at least why I wouldn't be able to live with whatever format restrictions there might be)

It'd be fine if you were just displaying a full metadata table for each item, but if you were displaying a bibliography and the citation format was "Smith et al.", you'd be missing the first name and all the other authors. Same for various other fields.
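The lossiness can be made concrete with a small Python sketch of an author-date label formatter. The rule it uses (one name; two names joined with "and"; otherwise the first name plus "et al.") is a simplified, hypothetical style, not any particular CSL style.

```python
def short_citation(authors: list[tuple[str, str]], year: str) -> str:
    """Format a simplified author-date label such as 'Smith et al. 2015'."""
    families = [family for _given, family in authors]
    if len(families) == 1:
        names = families[0]
    elif len(families) == 2:
        names = f"{families[0]} and {families[1]}"
    else:
        # Three or more authors: everyone after the first is dropped from the text.
        names = f"{families[0]} et al."
    return f"{names} {year}"

authors = [("Ada", "Smith"), ("Bo", "Jones"), ("Cy", "Lee")]
label = short_citation(authors, "2015")
print(label)  # Smith et al. 2015
```

Nothing in the visible label records "Ada", "Jones" or "Lee", so a translator reading only the displayed text cannot recover them; the complete record has to be present somewhere else on the page.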

funnell · November 11, 2016

OK. My goal is just to display a (harvestable) full list of my publications, so I'd want to display everything, and the display format could be anything that I like. (One reason I think I want the HTML file to be the source is that I could completely control appearance, sectioning, order, etc., rather than relying on or fighting with the format preferences of some tool for exporting HTML.)
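If the hand-written HTML is meant to be the source, the fields have to be marked up explicitly so that a program can find them. The Python sketch below uses only the standard library's `html.parser` to read such a page. The class names (`pub`, `pub-author`, `pub-title`, `pub-journal`, `pub-year`) are hypothetical and are not recognized by Zotero's translators; the sketch also assumes each field element holds plain text only, with no nested tags.

```python
from html.parser import HTMLParser

# Hypothetical class names chosen for this sketch; Zotero does not look for them.
FIELDS = ("pub-author", "pub-title", "pub-journal", "pub-year")

class PubListParser(HTMLParser):
    """Build one record per element with class "pub" from child elements marked with FIELDS.

    Assumes each field element contains plain text only (no nested tags).
    """

    def __init__(self):
        super().__init__()
        self.records = []
        self._field = None  # field class currently open, if any
        self._text = []     # text collected for the open field

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "pub" in classes:
            self.records.append({"pub-author": []})
        for name in FIELDS:
            if name in classes and self.records:
                self._field, self._text = name, []

    def handle_data(self, data):
        if self._field:
            self._text.append(data)

    def handle_endtag(self, tag):
        if not self._field:
            return
        value = "".join(self._text).strip()
        record = self.records[-1]
        if self._field == "pub-author":
            record["pub-author"].append(value)  # authors repeat; keep them in order
        else:
            record[self._field] = value
        self._field = None

page = """
<ul>
  <li class="pub">
    <span class="pub-author">Ada Smith</span>, <span class="pub-author">Bo Jones</span>.
    <span class="pub-title">A Hypothetical Paper</span>.
    <i class="pub-journal">Journal of Examples</i>, <span class="pub-year">2015</span>.
  </li>
</ul>
"""
parser = PubListParser()
parser.feed(page)
print(parser.records)
```

This is the wrapping approach discussed above. It can work here only because every field is displayed in full, and even then each author comes back as a single string ("Ada Smith"), so separating given and family names would need further markup.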