Zotero RDF documentation
I'm failing to find a place on Zotero's site that documents for ordinary users (i.e., not geeks who live in sandboxes) the export format Zotero RDF. Am I mistaken? If not, I propose to start one that addresses a few immediate questions I have:
- How frequently does the underlying data model change?
- How reliable is the attribute rdf:about as a URI for a bibliographic entry? That is, will a given item in a collection always export to that number? If not, what conditions would create changes in the numbering scheme?
- What other considerations need to be kept in mind before deploying an exported Zotero RDF file as linked data?
If I do start such a page, would developers be ready to populate the content with their insight? Thanks!
Can you first tell us a bit more about your linked data needs?
To answer the first two questions:
1. Very rarely. Hasn't changed in years, but will likely have to change for Zotero 4.2 as we add fields and item types.
2. rdf:about changes depending on the information included in Zotero: it uses existing URIs (URL and ISBN) when they're present, and only then falls back to the item number. I don't think the item number will change on a single computer, but I don't see how it can be the same across synced machines. Not sure though.
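To see how much of an export depends on the less stable item numbers, one could audit the rdf:about values before publishing the file. The following is only a rough sketch: the sample data is made up, and the `urn:isbn:` and `#item_` forms are my assumption about what the export writes, so check them against a real file.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical sample; a real Zotero RDF export will contain more fields.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:bib="http://purl.org/net/biblio#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <bib:Book rdf:about="urn:isbn:0-00-000000-0"><dc:title>A</dc:title></bib:Book>
  <bib:Article rdf:about="http://example.org/article"><dc:title>B</dc:title></bib:Article>
  <bib:Book rdf:about="#item_42"><dc:title>C</dc:title></bib:Book>
</rdf:RDF>"""


def classify_about(value):
    """Label an rdf:about value by the kind of identifier it appears to be."""
    if value.startswith("urn:isbn:"):  # assumed ISBN form
        return "isbn"
    if value.startswith(("http://", "https://")):
        return "url"
    return "local"  # assumed to be a machine-local item number


def audit(rdf_xml):
    """Group every rdf:about value in the document by identifier kind."""
    result = {"isbn": [], "url": [], "local": []}
    for element in ET.fromstring(rdf_xml).iter():
        about = element.get(f"{{{RDF_NS}}}about")
        if about is not None:
            result[classify_about(about)].append(about)
    return result


report = audit(SAMPLE)
print(report["local"])
```

Anything that lands in the "local" group is an identifier that, per the answer above, may not match across synced machines.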
I apologize for my snide comment.
In the summer edition of the Guide to Evagrius Ponticus, which I'll release next month, I plan to expose my bibliographic data as (among other formats) Zotero XML-RDF. Here is the test version of what will go live on Monday. Bear in mind that I've chosen to push my Zotero group's data to my server on a quarterly basis, and that this is a long-term publishing endeavor that I hope outlives me.
I am trying to understand the potential of exposing the RDF file for linked open data. I realize that proper deployment as linked data requires some further work and thought (esp. regarding content negotiation), but I also need to understand better the data model behind Zotero RDF (thank you for building this in!). Surely others will have such questions, so I thought it best to suggest answers go to a documentation page rather than (merely) a forum thread.
I would guess that the longer-form item number (e.g., http://zotero.org/groups/22235/items/VN5ZV463) used in the bibliontology RDF export is more stable than the id numbers assigned to Zotero RDF items, no? Can such longer-form item names be used generally as URL-form URIs? That is, does Zotero commit to assigning such names permanently and uniquely? Can other people use these as names of resources in their own RDF triplestores anywhere on the web (I apologize for my own unexplained jargon, tossed out in the rush of the day)? If they can, are these to be considered names for the physical object being described by the Zotero record or as names for the Zotero record itself? (I'm really hoping for the former.)
It would be nice to make some of the material we're discussing here a prominent part of the standard documentation, say as a main article called "RDF, Semantic Web, and Linked Open Data" under Getting the Most out of Zotero. (I checked on these questions before posting, but found only an interesting thread from 6+ years ago. Such conversations deserve promotion.)
For the most part, I think the unified export should use Bibliontology RDF. Obviously some of the data is not covered by that format and where possible I would like to use existing vocabulary.
As adamsmith mentions above, we will need to make revisions for Zotero 4.2, so perhaps that's also the time to merge the two.
AFAIK, Bibliontology RDF doesn't export collection structure or file attachments/notes (though the option to export Notes is displayed upon export, which I think we need to remove).
It would be the latter: those URLs name the Zotero record, not the physical object. You can have duplicate Zotero items describing a single physical object. Those Zotero items would have different item keys. Also, those URLs are per user, so the "same" item in different libraries would have different URLs.
The URL persists until the item is deleted. Not sure if the URL persists if the item is merged with another item. Also not sure if the item key can be re-assigned if it has previously been deleted.
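To make the per-library point concrete, here is a small sketch that splits an item URL into its library and item key. The group URL is the one quoted earlier in this thread; the `users` URL, including its ID and key, is a made-up placeholder, and the `/users/<id>/items/<key>` shape is my assumption by analogy with the group form.

```python
from urllib.parse import urlparse


def parse_item_uri(uri):
    """Split an item URL into (library_kind, library_id, item_key)."""
    parts = urlparse(uri).path.strip("/").split("/")
    if len(parts) != 4 or parts[0] not in ("users", "groups") or parts[2] != "items":
        raise ValueError(f"not a recognised item URL: {uri}")
    return parts[0], parts[1], parts[3]


def same_record(uri_a, uri_b):
    """True only if both URLs point at the same record in the same library."""
    return parse_item_uri(uri_a) == parse_item_uri(uri_b)


group_copy = "http://zotero.org/groups/22235/items/VN5ZV463"
user_copy = "http://zotero.org/users/0000/items/AAAA1111"  # hypothetical

print(parse_item_uri(group_copy))
print(same_record(group_copy, user_copy))
```

Two copies of one book held in different libraries compare as different records here, which is why these URLs identify Zotero records and not the work itself.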
Bibliontology RDF exports all fields, but it doesn't export attachments, notes, or collections. Eventually, we should fix that, at which point Zotero RDF will be unnecessary. There is actually a mapping between items in different libraries that are copies of each other stored in the Zotero database (as an RDF triple, actually), but this is currently not exported in any way.
We are working on global items, which will allow us to assign URIs to the physical object, although it may be a while before that's available through export.
If two items are merged, one of them keeps the same item key and the other is mapped to that item key via a record in the Zotero database. An item key could be re-assigned, but it's extremely unlikely; the space of item IDs is very large.
The lack of documentation is my fault, but the primary reason is lack of interest. Attempts to formalize bibliographic metadata in RDF haven't really caught on. I can speculate on three reasons for this.
First, RDF is complicated; even ignoring the complexity of parsing RDF/XML into triples, there are many ways to formalize the same metadata in RDF. Even if two implementations use the same ontologies, that's no guarantee they'll interoperate. Writing code that can deal with the immense amount of flexibility that RDF provides is non-trivial.
Second, structured metadata is inherently more complicated than flat metadata. People who care about structured metadata (e.g. librarians and publishers) already have their own standards (MODS and PRISM), and so RDF has little value to them. Other people don't understand why structured metadata is useful at all (Bruce D'Arcus had to convince Ian Hickson that it wasn't a good idea to make BibTeX part of HTML5).
Finally, few people store bibliographic information as structured metadata, so any meaningful structure needs to be generated by record linkage, which is itself non-trivial to do right.
There's some hope that the Schema Bib Extend working group will be able to produce a way of representing bibliographic metadata that makes everyone happy, but that remains to be seen.
I also started a git repository that contains the data above exported into all of the currently supported formats (except for Unqualified Dublin Core RDF, which appears to be currently broken). I think this will provide a better platform for us to discuss changes to export formats.