Zotero RDF documentation

Arithmeticus · May 31, 2013

I'm failing to find a place on Zotero's site that documents for ordinary users (i.e., not geeks who live in sandboxes) the export format Zotero RDF. Am I mistaken? If not, I propose to start one that addresses a few immediate questions I have:

How frequently does the underlying data model change?
How reliable is the attribute rdf:about as a URI for a bibliographic entry? That is, will a given item in a collection always export to that number? If not, what conditions would create changes in the numbering scheme?
What other considerations need to be kept in mind before deploying an exported Zotero RDF file as linked data?

If I do start such a page, would developers be ready to populate the content with their insight? Thanks!

Rintze · May 31, 2013

for ordinary users (i.e., not geeks who live in sandboxes)

Just noting that "us vs. them" language is rarely constructive.

Can you first tell us a bit more about your linked data needs?

adamsmith · May 31, 2013

As Rintze says, more detail on exactly what you want to use this for would be helpful.
To answer the first two questions:
1. Very rarely. Hasn't changed in years, but will likely have to change for Zotero 4.2 as we add fields and item types.
2. rdf:about changes depending on the information included in Zotero. It uses existing URIs (URL and ISBN) when they're present, and only falls back to the item number when neither is available. I don't think the item number will change on a single computer, but I don't see how it can be the same across synced machines. Not sure though.
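
To make the fallback order in point 2 concrete, here is a minimal sketch in Python. It is not Zotero's actual code. The function name choose_about and the exact string shapes ("urn:isbn:" prefix, "#item_" prefix) are assumptions for illustration; only the order (URL, then ISBN, then item number) comes from the description above.

```python
from typing import Optional


def choose_about(url: Optional[str], isbn: Optional[str], item_id: int) -> str:
    """Pick an rdf:about value using the fallback order described above.

    Hypothetical helper: the string formats are assumptions, not taken
    from the Zotero RDF translator.
    """
    if url:
        # An existing URL is preferred when the item has one.
        return url
    if isbn:
        # Next choice: an identifier derived from the ISBN.
        return "urn:isbn:" + isbn
    # Last resort: the local item number, which may differ between machines.
    return f"#item_{item_id}"


if __name__ == "__main__":
    print(choose_about("http://example.org/article", None, 12))
    print(choose_about(None, "0-000-00000-0", 12))
    print(choose_about(None, None, 12))
```

The practical consequence is that adding a URL or ISBN to an item that previously had neither would change its rdf:about on the next export.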

Arithmeticus · May 31, 2013

I apologize for my snide comment.

In the summer edition of the Guide to Evagrius Ponticus, which I'll release next month, I plan to expose my bibliographic data as (among other formats) Zotero XML-RDF. Here is the test version of what will go live on Monday. Bear in mind that I've chosen to push my Zotero group's data to my server on a quarterly basis, and that this is a long-term publishing endeavor that I hope outlives me.

I am trying to understand the potential of exposing the RDF file for linked open data. I realize that proper deployment as linked data requires some further work and thought (esp. regarding content negotiation), but I also need to understand better the data model behind Zotero RDF (thank you for building this in!). Surely others will have such questions, so I thought it best to suggest answers go to a documentation page rather than (merely) a forum thread.
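
One way to check identifier stability empirically would be to compare the rdf:about values of two successive exports. The sketch below uses only the Python standard library. The two sample documents are hand-written stand-ins, not real Zotero output, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ABOUT = f"{{{RDF_NS}}}about"


def about_values(rdf_xml: str) -> set:
    """Collect every rdf:about attribute value found in an RDF/XML string."""
    root = ET.fromstring(rdf_xml)
    return {el.attrib[ABOUT] for el in root.iter() if ABOUT in el.attrib}


def compare_exports(old_xml: str, new_xml: str) -> dict:
    """Report which identifiers survived, vanished, or appeared between exports."""
    old, new = about_values(old_xml), about_values(new_xml)
    return {"kept": old & new, "dropped": old - new, "added": new - old}


# Hand-written sample data (an assumption, for illustration only).
HEADER = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/" '
    'xmlns:bib="http://purl.org/net/biblio#">'
)
SPRING = HEADER + """
  <bib:Book rdf:about="urn:isbn:0-000-00000-0"><dc:title>A</dc:title></bib:Book>
  <bib:Book rdf:about="#item_12"><dc:title>B</dc:title></bib:Book>
</rdf:RDF>"""
SUMMER = HEADER + """
  <bib:Book rdf:about="urn:isbn:0-000-00000-0"><dc:title>A</dc:title></bib:Book>
  <bib:Book rdf:about="#item_57"><dc:title>B</dc:title></bib:Book>
</rdf:RDF>"""

if __name__ == "__main__":
    print(compare_exports(SPRING, SUMMER))
```

Any identifier that lands in "dropped" between quarterly exports is one that outside consumers could no longer resolve.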

Simon · May 31, 2013

For most purposes, you should use Bibliontology RDF instead of Zotero RDF. The Bibliontology RDF translator imports and exports all fields available in Zotero (although not attachments or collections at present), and the data model is appreciably better than Zotero RDF. There is documentation at http://bibliontology.com/, and the mappings between Zotero fields and Bibliontology RDF are described in a set of at least vaguely human-readable tables at the start of the translator (https://github.com/zotero/translators/blob/master/Bibliontology RDF.js, probably more legible in a text editor that uses 4 spaces for tabs).

Arithmeticus · May 31, 2013

Thanks, Simon, for the good suggestion. I had earlier dismissed the Bibliontology export as unsuitable for my purposes because it failed to export notes (which I prefer to the abstract field, so that my fellow editors, who may not know tags, can provide rich formatting). But I may reconsider. What other Zotero data are lost in exporting to the Bibliontology format? I know I can find this out myself by studying the mappings and subtracting, but perhaps someone else already has a brief list at hand and can save me the trouble.

I would guess that the longer-form item number (e.g., http://zotero.org/groups/22235/items/VN5ZV463) used in the bibliontology RDF export is more stable than the id numbers assigned to Zotero RDF items, no? Can such longer-form item names be used generally as URL-form URIs? That is, does Zotero commit to assigning such names permanently and uniquely? Can other people use these as names of resources in their own RDF triplestores anywhere on the web (I apologize for my own unexplained jargon, tossed out in the rush of the day)? If they can, are these to be considered names for the physical object being described by the Zotero record or as names for the Zotero record itself? (I'm really hoping for the former.)

It would be nice to make some of the material we're discussing here a prominent part of the standard documentation, say as a main article called "RDF, Semantic Web, and Linked Open Data" under Getting the Most out of Zotero. (I checked on these questions before posting, but found only an interesting thread from 6+ years ago. Such conversations deserve promotion.)

aurimas · May 31, 2013

We started a small discussion in this GitHub thread about unifying the RDF export formats and it's something I would still like to do. I think the two different RDF export formats are confusing and unnecessary. Even worse is that neither of them is a complete export of Zotero data.

For the most part, I think the unified export should use Bibliontology RDF. Obviously some of the data is not covered by that format and where possible I would like to use existing vocabulary.

As adamsmith mentions above, we will need to make revisions for Zotero 4.2, so perhaps that's also the time to merge the two.

What other Zotero data are lost in exporting to the Bibliontology format?

AFAIK, it doesn't export collection structure, file attachments, or notes (though the option to export notes is displayed upon export, which I think we need to remove).

If they can, are these to be considered names for the physical object being described by the Zotero record or as names for the Zotero record itself? (I'm really hoping for the former.)

It would be the latter. You can have duplicate Zotero items describing a single physical object. Those Zotero items would have different item keys. Also, those URLs are per user. So "same" item in different libraries would have different URLs.

That is, does Zotero commit to assigning such names permanently and uniquely?

The URL persists until the item is deleted. Not sure if URL persists if the item is merged with another item. Also not sure if the item key can be re-assigned if it has previously been deleted.
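
As a rough illustration of why these URLs name the record rather than the work, the sketch below splits an item URI into its library and item key. The groups URI is the one quoted earlier in the thread. The "users" path form, the second URI, and the names ItemRef and parse_item_uri are assumptions made for the example.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class ItemRef(NamedTuple):
    library_type: str  # "groups" or "users" (the latter is an assumption)
    library_id: str
    item_key: str


def parse_item_uri(uri: str) -> ItemRef:
    """Split a URI of the form .../<library_type>/<library_id>/items/<item_key>."""
    parts = urlparse(uri).path.strip("/").split("/")
    if len(parts) != 4 or parts[0] not in ("groups", "users") or parts[2] != "items":
        raise ValueError(f"unrecognized item URI: {uri}")
    return ItemRef(parts[0], parts[1], parts[3])


if __name__ == "__main__":
    in_group = parse_item_uri("http://zotero.org/groups/22235/items/VN5ZV463")
    # Hypothetical copy of the "same" work in a personal library.
    in_user = parse_item_uri("http://zotero.org/users/1/items/ABCD2345")
    print(in_group, in_user, in_group == in_user)
```

Because both the library part and the item key differ between copies, two URIs like these cannot be assumed to denote the same physical object without extra linking information.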

Simon · May 31, 2013

We started a small discussion in this GitHub thread about unifying the RDF export formats and it's something I would still like to do. I think the two different RDF export formats are confusing and unnecessary. Even worse is that neither of them is a complete export of Zotero data.

I'd like to get rid of Zotero RDF export entirely at some point, since it doesn't model the data particularly well. With that said, it does actually export everything visible in the UI.

Bibliontology RDF exports all fields, but it doesn't export attachments, notes, or collections. Eventually, we should fix that, at which point Zotero RDF will be unnecessary.

It would be the latter. You can have duplicate Zotero items describing a single physical object. Those Zotero items would have different item keys. Also, those URLs are per user. So "same" item in different libraries would have different URLs.

There is actually a mapping between items in different libraries that are copies of each other stored in the Zotero database (as an RDF triple, actually), but this is currently not exported in any way.

We are working on global items, which will allow us to assign URIs to the physical object, although it may be a while before that's available through export.

The URL persists until the item is deleted. Not sure if URL persists if the item is merged with another item. Also not sure if the item key can be re-assigned if it has previously been deleted.

If two items are merged, one of them keeps the same item key and the other is mapped to that item key via a record in the Zotero database. An item key could be re-assigned, but it's extremely unlikely; the space of item IDs is very large.
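
For anyone consuming these keys, the merge behaviour could be modelled roughly as follows. This is a sketch under stated assumptions: the dictionary merged_into stands in for whatever record the Zotero database keeps, whose schema is not described here, and every key except VN5ZV463 is made up.

```python
def resolve_key(key: str, merged_into: dict) -> str:
    """Follow merge records until reaching a key that was never merged away.

    merged_into maps a retired item key to the key that replaced it
    (a hypothetical stand-in for the database record mentioned above).
    """
    seen = set()
    while key in merged_into:
        if key in seen:
            # Defensive check: malformed data should not loop forever.
            raise ValueError(f"cycle in merge records at {key}")
        seen.add(key)
        key = merged_into[key]
    return key


if __name__ == "__main__":
    merges = {"AAAA2222": "VN5ZV463", "BBBB3333": "AAAA2222"}
    print(resolve_key("BBBB3333", merges))  # follows two merge records
    print(resolve_key("VN5ZV463", merges))  # surviving key is returned as-is
```

A published URI built on a retired key would need this kind of redirect table to keep resolving after a merge.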

It would be nice to make some of the material we're discussing here a prominent part of the standard documentation, say as a main article called "RDF, Semantic Web, and Linked Open Data" under Getting the Most out of Zotero.

The lack of documentation is my fault, but the primary reason is lack of interest. Attempts to formalize bibliographic metadata in RDF haven't really caught on. I can speculate on three reasons for this. First, RDF is complicated; even ignoring the complexity of parsing RDF/XML into triples, there are many ways to formalize the same metadata in RDF. Even if two implementations use the same ontologies, that's no guarantee they'll interoperate. Writing code that can deal with the immense amount of flexibility that RDF provides is non-trivial. Second, structured metadata is inherently more complicated than flat metadata. People who care about structured metadata (e.g. librarians and publishers) already have their own standards (MODS and PRISM), and so RDF has little value to them. Other people don't understand why structured metadata is useful at all (Bruce D'Arcus had to convince Ian Hickson that it wasn't a good idea to make BibTeX part of HTML5). Finally, few people store bibliographic information as structured metadata, so any meaningful structure needs to be generated by record linkage, which is itself non-trivial to do right.

There's some hope that the Schema Bib Extend working group will be able to produce a way of representing bibliographic metadata that makes everyone happy, but that remains to be seen.

Arithmeticus · June 5, 2013

Simon's and Aurimas's comments are instructive, and they point to other background knowledge that would be a big help to scholars who are increasingly interested in, or trying to deploy, structured data. The work underway to assign URIs to the physical object is really important IMO. So this is merely a request for some kind of well-placed documentation page on the topic. Although I volunteered to start one, I think either of you is better qualified than I am to begin and structure it. I'm more than happy to coedit the page from the perspective of a user.

aurimas · July 10, 2013

I created a public (closed membership) group library for Zotero testing purposes. There's only one collection in it right now (All Item Types), which contains all currently supported item types with all fields filled out with data that corresponds to the Zotero label. I think this will be useful for testing Zotero import/export or other operations.

I also started a git repository that contains the data above exported into all of the currently supported formats (except for Unqualified Dublin Core RDF, which appears to be currently broken). I think this will provide a better platform for us to discuss changes to export formats.