Adding dc values to archive website pages

navtis · February 10, 2016

I'm setting up a website that will host transcriptions of eighteenth-century documents, and I'm trying to design an HTML5 layout for it. I'd like every page to include bibliographic data that Zotero can pick up automatically, without needing a special translator. I want to use Dublin Core for this.

I've run into two problems:

1. I'd prefer to have the bibliographic data in the page body, where it's visible to readers (using spans with DC values), but Zotero seems to ignore Dublin Core in the body completely. Is that correct?

2. When I put the data into the head as metadata instead, Zotero recognizes the dc.creator, dc.title, dc.date, and dc.rights values but ignores dc.publisher, dc.contributor, and dcterms.provenance.

Is there likely to be any way for me to fix these problems, short of providing a translator?

Thanks

Graham

adamsmith · February 10, 2016

1) No, Zotero won't read Dublin Core from the body. We've seen too many problems when grabbing meta tags from anywhere but the header, so this isn't purely ideological, though I'd also argue that DC data belongs in meta tags in the header. If you want to embed machine-readable data in the human-readable part of the page, embedded markup such as RDFa or microdata would seem like a better solution. Zotero doesn't currently support that, but that's largely for lack of demand, and it could/should happen.

2) Do you have an example live? Publisher and contributor should certainly be recognized as such (provenance won't be). What dc.type do you specify? If the item type doesn't have a publisher field, Zotero wouldn't import one.

If you want to be proactive (and it's worth considering, especially for archival data, which is poorly served by the available meta tag standards), you could serve a richer format via unAPI, but that is more work than meta tags.
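
The head-only behaviour described in point 1 can be illustrated with a short Python sketch. This is not Zotero's code: the HeadMetaCollector class, the sample page, and every value in it are hypothetical, and the DC.type value is only a placeholder (the thread does not say which values Zotero maps to which item types). The sketch only shows DC meta tags being collected from the head while anything in the body is skipped.

```python
from html.parser import HTMLParser


class HeadMetaCollector(HTMLParser):
    """Collect Dublin Core <meta name=... content=...> pairs found inside <head> only."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.fields = []  # (lower-cased name, content) in document order

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            attr = dict(attrs)
            name = (attr.get("name") or "").lower()
            if name.startswith(("dc.", "dcterms.")) and attr.get("content"):
                self.fields.append((name, attr["content"]))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


# Hypothetical page: all names and values below are invented placeholders.
SAMPLE_PAGE = """<!DOCTYPE html>
<html>
<head>
  <title>Sample transcription</title>
  <meta name="DC.title" content="A Sample Pamphlet">
  <meta name="DC.creator" content="Doe, Jane">
  <meta name="DC.date" content="1795">
  <meta name="DC.type" content="Text">
  <meta name="DC.publisher" content="Example Press">
</head>
<body>
  <p>Printed by <span class="dc.publisher">Body-only Printer</span></p>
  <meta name="DC.contributor" content="Roe, Richard">
</body>
</html>"""

collector = HeadMetaCollector()
collector.feed(SAMPLE_PAGE)
head_fields = dict(collector.fields)

for name, content in collector.fields:
    print(f"{name}: {content}")
```

Only the five tags in the head end up in head_fields; the span and the stray meta tag in the body are ignored, which is the situation described in the original question.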

DWL-SDCA · February 10, 2016

As the operator of an online bibliographic database that offers metadata via unAPI as well as in RIS, BibTeX, and GS/Highwire formats, I can say that it required only a few hours of effort from our web developers, because we serve all bibliographic data from a structured database. We provide metadata for conference proceedings, books, journal articles, technical reports, and theses. Both lists and individual records import nicely into Zotero. As adamsmith says, it is more work, perhaps, but the benefit is great.

navtis · February 11, 2016

Thanks both

I wasn't specifying dc.type. Once I do, dc.publisher works OK, but I can't seem to get dc.contributor picked up. Are there constraints on what form it can take? In general, where is the best place to look for a list of the allowed fields per item type, how those fields map to Dublin Core, and what format the field contents can take? (I've found partial information in various places.)

A bit more context: this will be part of marxists.org (which is straying rather a bit from its original remit). The site is also distributed on hard drive, so we can't rely on any additional software; everything is flat HTML (so no unAPI either). We depend on volunteers to transcribe texts. They are often unhappy about adding metadata, so bibliographic data has been minimal in the past, and it would have been easier to ask people to add the in-page metadata in a new form than to ask them to add it twice, once in the body and again in the head. Of course, for my little subarchive I can do both.

Graham

adamsmith · February 11, 2016

Should definitely work. Would you be able to put a single sample page online somewhere so we can troubleshoot? (It can just be the plain HTML; you don't need to make it a webpage.)

As for DC documentation, I don't know. I always thought it was a mess, but I'm not a librarian (at least not a real one).

The relevant part of our code starts here:
https://github.com/zotero/translators/blob/master/RDF.js#L750
and continues in the function that handles creators. We prefer "lastname, firstname", but "firstname lastname" will work, too.
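
To illustrate why the comma form is the safer of the two name formats mentioned above, here is a minimal Python sketch. It is not taken from RDF.js; split_creator is a hypothetical helper, and the rule it uses for the "firstname lastname" form (treat the last word as the surname) is an assumption made for the example.

```python
def split_creator(raw):
    """Split a creator string into (last_name, first_name).

    "Last, First" is unambiguous: everything before the first comma is the
    surname. Without a comma we can only guess, so the final word is taken
    as the surname (an assumption; it misfires on multi-word surnames).
    """
    cleaned = " ".join(raw.split())  # collapse runs of whitespace
    if "," in cleaned:
        last, first = cleaned.split(",", 1)
        return last.strip(), first.strip()
    parts = cleaned.rsplit(" ", 1)
    if len(parts) == 2:
        return parts[1], parts[0]
    return cleaned, ""  # single-word name, e.g. an organisation


for example in ["Doe, Jane", "Jane Doe", "van der Berg, Anna", "Anna van der Berg"]:
    print(example, "->", split_creator(example))
```

The last two examples show the difference: with the comma, the whole multi-word surname is kept together, while the guessed split keeps only the final word as the surname.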

navtis · February 11, 2016

I put a temporary page here:

http://www.theseamans.net/documents/thatched.htm

adamsmith · February 11, 2016

Ah yes. DC tags aren't specified well enough to distinguish creator types. We go through a list of tags and import only the first type that is present, and creator comes before contributor. Tons of webpages have the same info in the contributor and creator tags, and we'd be importing duplicates otherwise.
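
The "first type wins" rule can be sketched in a few lines of Python. This is only an illustration of the behaviour as described in this thread, not the translator's actual code: the precedence list, the pick_creators helper, and the sample names are all hypothetical.

```python
# Assumed order for illustration; the thread only says creator comes before contributor.
CREATOR_TAG_PRECEDENCE = ["dc.creator", "dc.contributor"]


def pick_creators(meta):
    """Return (tag, names) for the first tag in the precedence list that has values.

    `meta` maps a tag name to a list of strings. Later tags are ignored as
    soon as an earlier one yields at least one non-blank name, which avoids
    importing the same person twice when a page repeats its data.
    """
    for tag in CREATOR_TAG_PRECEDENCE:
        names = [n for n in meta.get(tag, []) if n.strip()]
        if names:
            return tag, names
    return None, []


page_with_both = {"dc.creator": ["Doe, Jane"], "dc.contributor": ["Roe, Richard"]}
page_with_contributor_only = {"dc.contributor": ["Roe, Richard"]}

print(pick_creators(page_with_both))
print(pick_creators(page_with_contributor_only))
```

On a page that carries both tags, the contributor is dropped, which matches what the sample page showed; a contributor is only picked up when no creator is present.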

navtis · February 12, 2016

Any reason not to check, e.g., whether the contributor field is non-empty and differs from creator by more than some minimum number of characters, and accept it if so? It sounds easy enough to code; the problem is that I don't have the experience to know quite how bad the data is or what the side effects might be...
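
The check proposed here could look roughly like the following Python sketch. Everything in it is an assumption made for illustration: the thread does not define "differs", so Levenshtein edit distance is used as one possible measure, and the accept_contributor helper, the normalisation, and the min_chars default of 3 are all hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance, computed with the usual two-row dynamic programme."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def normalise(name):
    """Lower-case, drop commas, and collapse whitespace before comparing."""
    return " ".join(name.lower().replace(",", " ").split())


def accept_contributor(contributor, creators, min_chars=3):
    """Accept a contributor only if it is non-empty and differs from every
    creator by more than `min_chars` edits (threshold chosen arbitrarily)."""
    candidate = normalise(contributor)
    if not candidate:
        return False
    return all(edit_distance(candidate, normalise(c)) > min_chars for c in creators)


print(accept_contributor("Doe, Jane.", ["Doe, Jane"]))    # near-duplicate
print(accept_contributor("Roe, Richard", ["Doe, Jane"]))  # different person
print(accept_contributor("Jane Doe", ["Doe, Jane"]))      # same person, other name order
```

The third call hints at the kind of side effect raised in the question: a purely character-based comparison treats "Jane Doe" and "Doe, Jane" as different people, so name-order variants would still be imported as duplicates unless the names are parsed first.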

adamsmith · February 13, 2016

Mainly that there is a cost in complexity and understandability to adding more and more hacks to make sense of data formats that aren't designed for that. Even if we were to check contributor and then add it, what would we add it as? Contributor? That's not terribly useful data in Zotero anyway.