RDF is all broken

westpjc · February 6, 2008

OK I posted this before and got zero responses.

This time I'll be less polite.

The Zotero RDF export DOES NOT GENERATE VALID RDF/XML. It's so broken that I have not found a parser that can deal with it. I've hacked up fixes with regexes, but that is tiresome and fragile, and I still get bogus URIs where bnodes were intended.

I could already make my Zotero exports available via SPARQL servers (which, by the way, would be a good way to share Zotero data in a 2.0 world), but some effort would be required to get Zotero up to speed with import/export of VALID RDF/XML.

I could hack up my own export in Python by going directly to the SQLite DB, but that would probably break at some random upgrade point.

If there is no interest in making the RDF export work...then drop it and let someone write a plugin. That won't happen if Zotero proper pretends to do RDF.

--
...whew...it's still broken but I feel better now.

ciao

Phil

deanc · February 6, 2008

I can't get RDF exports to work either. I just want a way to have multiple users collaborate on the one page without having to reformat the pages into OpenOffice or Word. Has anyone got a solution to this?

Simon · February 6, 2008

Are you running the latest version of Zotero? Do you have automatic updating of translators enabled? Have you tried clicking "Reset Translators and Styles" in the Advanced pane of the Zotero preferences? The issue you reported here was fixed in an update to the RDF translator about a month ago. Of course, the RDF still looks like crap because that's the way Mozilla's RDF library serializes it, but I don't have any trouble parsing it with raptor (haven't tried others).

westpjc · February 6, 2008

OK, not valid, but better. The original post was mine, and I checked for responses before I reposted.

It now parses in rdflib but not in Sesame. It's still not valid RDF/XML, and if that is Mozilla's fault then it might not be fixable (not easily, anyway). For now I have written a rewrite hack that renames the "rdf:" type nodes into proper bnodes.
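
A rewrite hack of this kind might look roughly like the sketch below. It is not the script described above. It assumes the problem nodes show up as rdf:about / rdf:resource attribute values that begin with "rdf:" (for example rdf:#$abc), which may not match a real export exactly. The function name to_bnodes and the b1, b2, ... labels are made up for illustration. rdf:nodeID is the standard RDF/XML attribute for labelling blank nodes.

```python
import re

# ASSUMPTION: pseudo-URIs appear as double-quoted rdf:about / rdf:resource
# values starting with "rdf:". Adjust the pattern to the actual export.
_PSEUDO_URI = re.compile(r'rdf:(?:about|resource)="rdf:([^"]*)"')


def to_bnodes(rdfxml):
    """Rewrite rdf:-scheme pseudo-URIs as rdf:nodeID blank-node labels."""
    labels = {}  # pseudo-URI tail -> stable label that is a valid XML name

    def repl(match):
        tail = match.group(1)
        # Reuse the same label every time the same pseudo-URI appears,
        # so references still point at the node they pointed at before.
        label = labels.setdefault(tail, "b%d" % (len(labels) + 1))
        return 'rdf:nodeID="%s"' % label

    return _PSEUDO_URI.sub(repl, rdfxml)
```

Ordinary http: URIs are left alone, since only values starting with "rdf:" match. Being a regex over raw text, it is as fragile as any such fix: single-quoted attributes or a different namespace prefix would slip through.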

This goes beyond an import/export issue, but seriously: consider using RDF as the actual datastore (I've looked at the SQL tables, and the style of indirection they use smacks of a graph model anyway... it's not truly relational), putting it into a Piggy Bank store (see SIMILE), and then using Longwell to make it distributed.
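
To illustrate the "graph model" remark: when every value is reached through an item ID, a field ID, and a value ID, each row is already a (subject, predicate, object) statement in disguise. The sketch below uses a made-up three-table layout (fields, vals, item_data) that is NOT Zotero's actual schema, and the names demo_db and as_triples are hypothetical too.

```python
import sqlite3


def demo_db():
    """Build a tiny in-memory database with a HYPOTHETICAL indirection-style layout."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE fields    (field_id INTEGER PRIMARY KEY, name  TEXT);
        CREATE TABLE vals      (value_id INTEGER PRIMARY KEY, value TEXT);
        CREATE TABLE item_data (item_id INTEGER, field_id INTEGER, value_id INTEGER);
        INSERT INTO fields VALUES (1, 'title');
        INSERT INTO fields VALUES (2, 'date');
        INSERT INTO vals VALUES (10, 'An Example');
        INSERT INTO vals VALUES (11, '2008');
        INSERT INTO item_data VALUES (7, 1, 10);
        INSERT INTO item_data VALUES (7, 2, 11);
    """)
    return db


def as_triples(db):
    """Resolve the indirection and return (subject, predicate, object) tuples."""
    query = """
        SELECT d.item_id, f.name, v.value
        FROM item_data d
        JOIN fields f ON f.field_id = d.field_id
        JOIN vals   v ON v.value_id = d.value_id
        ORDER BY d.item_id, f.name
    """
    return [("item:%d" % item, name, value)
            for item, name, value in db.execute(query)]
```

Calling as_triples(demo_db()) gives one tuple per stored field value. A real export would also have to map field names to proper vocabulary URIs, which this sketch skips.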

Actually, I just double checked, Zotero and Simile are both supported by "The Andrew W. Mellon Foundation".

Now I should look around to see where that suggestion should be posted.

--
Phil

dstillman · February 6, 2008

I imagine Mozilla's RDF support will be improved in Firefox 3.

Sorry, but we're not interested in using RDF as the main datastore. Zotero makes heavy use of SQL (and will use the additional SQLite features available in Firefox 3).

We work with the SIMILE folks on a number of things, and some of their developers are working on automating transfer of Zotero data to SIMILE tools (beyond Timeline, which is already built-in).

Simon · February 6, 2008

Eventually we hope to supplement the RDF/XML export with Turtle export, which should be more concise, more readable, and eliminate the rdf: nodes (which are an artifact of using Mozilla's RDF support). In the short term, we could just give the rdf: nodes random IDs if they're interfering with parsing.

In Zotero 1.5, we plan to move to a hierarchical data model, and we hope to integrate RDF mappings for our fields into it.

bdarcus · February 6, 2008

"Sorry, but we're not interested in using RDF as the main datastore. Zotero makes heavy use of SQL (and will use the additional SQLite features available in Firefox 3)."

This may be a little crazy, but ...

I've been wondering about the possibility of hybrid relational/RDF storage. If you look at something like the PHP-based ARC toolkit, for example, they manage to really easily merge it into WordPress and Drupal. From what I understand, they just add a few generic tables to represent the components of the triples, and that just sits alongside the standard tables.

WRT Zotero, then, you would use the RDF stuff for all the custom data users might want to store that you don't want to design in.

My thought is this gives the best of both worlds, responding to requests for flexibility and "custom fields" but without all the problems that you see in implementations like Endnote, where the custom data is effectively completely opaque.
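
As a rough picture of what "a few generic tables" alongside the standard ones could mean, here is a minimal SQLite sketch. It is not how ARC or Zotero actually lay things out. The items table is a hypothetical stand-in for the designed-in schema, and the names triples, open_store, add_custom, and custom_fields are invented for the example.

```python
import sqlite3


def open_store():
    """Create an in-memory database with a typed table plus a generic triple table."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE items (            -- hypothetical designed-in table
            item_id INTEGER PRIMARY KEY,
            title   TEXT NOT NULL
        );
        CREATE TABLE triples (          -- generic side table for custom data
            subject   INTEGER NOT NULL REFERENCES items(item_id),
            predicate TEXT NOT NULL,
            object    TEXT NOT NULL,
            UNIQUE (subject, predicate, object)
        );
    """)
    return db


def add_custom(db, item_id, predicate, value):
    """Attach one custom statement to an item inside a transaction."""
    with db:  # commits on success, rolls back if the insert raises
        db.execute("INSERT OR IGNORE INTO triples VALUES (?, ?, ?)",
                   (item_id, predicate, value))


def custom_fields(db, item_id):
    """Return the item's custom data as sorted (predicate, object) pairs."""
    rows = db.execute(
        "SELECT predicate, object FROM triples "
        "WHERE subject = ? ORDER BY predicate, object",
        (item_id,))
    return rows.fetchall()
```

Because the triples live in the same database file, ordinary SQL joins and transactions cover both the designed-in columns and the custom data, and the custom values stay queryable by predicate rather than sitting in an opaque blob.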

dstillman · February 6, 2008

Bruce: That might make sense, as long as the triples are stored in the DB. I just don't want to lose the performance and storage efficiency benefits of SQLite and features like transactions, full-text search (which will be available in mozStorage in Firefox 3), etc. But I could definitely see using a generic triple store in the database for the custom data, which would also allow for naïve syncing of that data.