Export to Schema.org RDFa and/or Microdata
How would I go about adding HTML + RDFa [1] and/or HTML + Microdata [2] export templates with Schema.org classes and properties to Zotero?
References
[1] http://www.w3.org/TR/xhtml-rdfa-primer/
[2] http://www.w3.org/TR/microdata/
[3] https://en.wikipedia.org/wiki/Schema.org
[4] http://schema.org/docs/full.html
[5] http://lists.w3.org/Archives/Public/public-vocabs/2014Apr/0104.html
There's an open ticket for import: https://github.com/zotero/translators/issues/366
but we don't have much in terms of a use case at this time (i.e. we're not seeing this on many sites).
In terms of "how" - the starting point should be the "Embedded Metadata" Translator. It may just be easiest to add RDFa and/or microdata support in there.
* Select a few citations
* Right-click: "Export selected items"
* [Request]: Select 'HTML (Schema.org RDFa)' format
* Click 'Ok'
(This may be difficult, as there would be no citation style selected.)
And/or
* Select a few citations
* Right-click: "Create Bibliography from Selected Items"
* Select a citation style
* [Request]: Select Save as 'HTML + [Schema.org] RDFa'
In terms of source files, from github search, I see:
* https://github.com/zotero/zotero/blob/master/chrome/content/zotero/bibliography.js
* https://github.com/zotero/zotero/blob/master/chrome/content/zotero/bibliography.xul
But I'm not sure where mappings between:
* Zotero types <--> http://schema.org/CreativeWork subclasses
* CSL fields <--> Schema.org properties (see [5])
would need to be.
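As a rough illustration of what such mappings might look like, here is a minimal Python sketch. The specific pairings are my own assumptions for discussion, not an agreed or complete mapping, and the helper names (`schema_type`, `to_schema_properties`) are hypothetical.

```python
# Illustrative only: these pairings are assumptions, not an agreed mapping.
ZOTERO_TYPE_TO_SCHEMA = {
    "book": "Book",
    "journalArticle": "ScholarlyArticle",
    "webpage": "WebPage",
    "blogPost": "BlogPosting",
}

CSL_FIELD_TO_SCHEMA = {
    "title": "name",
    "author": "author",
    "issued": "datePublished",
    "URL": "url",
    "publisher": "publisher",
}


def schema_type(zotero_type):
    """Return a Schema.org class name, falling back to CreativeWork."""
    return ZOTERO_TYPE_TO_SCHEMA.get(zotero_type, "CreativeWork")


def to_schema_properties(csl_item):
    """Rename the CSL fields that have a mapping; silently drop the rest."""
    return {
        CSL_FIELD_TO_SCHEMA[field]: value
        for field, value in csl_item.items()
        if field in CSL_FIELD_TO_SCHEMA
    }
```

Anything without a mapping is dropped here, which is exactly the kind of fidelity loss a real mapping would need to account for.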
Moreover, I'd question whether embedding microdata with citations actually makes sense. The point as I understand it is to add a universal metadata structure to webpages - if you add microdata to citations _on_ a webpage, if anything, that would seem to be misleading for search engines, no?
edit: so, more generally speaking, I'm puzzled what Zotero has to do with this (except that it should eventually be able to read it). This would seem to be something to integrate into WordPress and similar systems that you use to generate actual webpages, not into a reference manager.
https://bitbucket.org/fbennett/citeproc-js/wiki/Home
http://citationstylist.org/docs/citeproc-js-csl.html
> Adding different microdata fields to citation components would entail mucking with the citeproc code which I don't think anyone is excited about.
Thank you for the feedback.
> Also, not every component of a citation is on a different html element, so I wouldn't even know where to put the rdfa.
In RDFa [1], there would be divs with spans, metas, and <a>s.
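To make that concrete, here is a small Python sketch of what one entry could look like as a div with spans, a meta, and an a. The input dict layout and the function name `render_rdfa` are assumptions for illustration; this is not how citeproc-js formats anything, and it ignores citation style entirely.

```python
from html import escape


def render_rdfa(item):
    """Render one citation dict as a div carrying Schema.org RDFa attributes."""
    authors = ", ".join(
        '<span property="author" typeof="Person">'
        '<span property="name">{}</span></span>'.format(escape(name))
        for name in item["authors"]
    )
    return (
        '<div vocab="http://schema.org/" typeof="{type}">'
        "{authors}. "
        '<span property="name">{title}</span>. '
        '<meta property="datePublished" content="{date}"/>'
        '<a property="url" href="{url}">{url}</a>'
        "</div>"
    ).format(
        type=escape(item["type"]),
        authors=authors,
        title=escape(item["title"]),
        date=escape(item["date"]),
        url=escape(item["url"]),
    )


# Hypothetical item, for illustration only.
example = {
    "type": "ScholarlyArticle",
    "authors": ["Doe, Jane"],
    "title": "An Example Article",
    "date": "2014-04-01",
    "url": "http://example.org/article",
}
print(render_rdfa(example))
```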
> HTML bibliographies already include COinS, which seems great for the purpose of exposing citation information.
I suppose RDFa could be added to the 'See Also' section of https://en.wikipedia.org/wiki/COinS
> Moreover, I'd question whether embedding microdata with citations actually makes sense. The point as I understand it is to add a universal metadata structure to webpages - if you add microdata to citations _on_ a webpage, if anything, that would seem to be misleading for search engines, no?
AFAIU, RDFa and microdata metadata are distinct from the HTML page in which they're located. For example, a directory service may host a page which includes information about an Organization with one or more LocalBusinesses. [4]
There is also a Thing > CreativeWork > WebPage type.
As a data model for graphs of resources with URIs and URLs, there are lots of practical uses for RDF ( http://www.w3.org/wiki/ConverterToRdf )
RDF[a] supports links. http://catalogablog.blogspot.com/2010/02/rdf-coins-and-microformats.html
https://www.zotero.org/support/dev/exposing_metadata#using_an_open_standard_for_exposing_metadata
But let me step back a bit. What's your larger vision here? I don't understand where you're trying to go with this. Zotero principally generates bibliographies/citations. Is your idea to generate a bibliography with each item containing RDFa? I'd like to see any documentation that suggests such a use of RDFa - the links you posted all suggest using RDFa/Microdata to add structured data to a given page.
Objective: Produce an HTML page with bibliographic citation metadata that can be parsed and extracted back into RDF.
Yes.
Personally, I like Sphinx (reStructuredText) and bibtex.
PDFs print well, but 'most' of the time, they don't contain enough information to generate their own bibliographic citation (necessitating journal HTML parsers, which Zotero does so well).
> Exactly - and they don't exist in current citations, so you'd have to code all of this into citeproc.
https://bitbucket.org/fbennett/citeproc-js/src/tip/src/
> I'd like to see any documentation that suggests such a use of RDFa - the links you posted all suggest using RDFa/Microdata to add structured data to a given page.
https://en.wikipedia.org/wiki/RDFa :
> RDFa (or Resource Description Framework in Attributes[1]) is a W3C Recommendation that adds a set of attribute-level extensions to HTML, XHTML and various XML-based document types for embedding rich metadata within Web documents. The RDF data-model mapping enables its use for embedding RDF subject-predicate-object expressions within XHTML documents. It also enables the extraction of RDF model triples by compliant user agents.
There is a lot of support for COinS. Is there anything that can be done with COinS that cannot be done with RDFa?
Zotero also doesn't generate other metadata formats it can use to read info from a page like google highwire or DC metatags.
If it's the latter, I still don't see how that's even supposed to look.
What would be helpful is if you could provide a very specific, entirely non-abstract use case. I feel like we're talking past each other, so feel free to talk to me like I'm stupid; I won't take offense.
Content-wise, I don't know, probably not. Structurally, COinS has the major advantage of being contained in a single span tag with no displayed text, which makes it very easy to generate/implement: just put all the info into a span in the right format. It's trivial for Zotero to generate this along with or entirely separate from citations. For RDFa et al., content and metadata are mixed, which makes a lot of sense structurally, but I don't really see how Zotero would usefully generate that, since Zotero isn't used to generate content, just citations.
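For comparison, the "single span" point can be sketched in a few lines of Python. This is only an illustration of the COinS shape (an empty span with class Z3988 and a URL-encoded title attribute); the function name and the choice of fields are my assumptions, and it is not Zotero's actual generator.

```python
from html import escape
from urllib.parse import urlencode


def coins_span(atitle, jtitle, aulast, aufirst, date):
    """Build an empty COinS span for a journal article."""
    pairs = [
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
        ("rft.atitle", atitle),
        ("rft.jtitle", jtitle),
        ("rft.aulast", aulast),
        ("rft.aufirst", aufirst),
        ("rft.date", date),
    ]
    # urlencode percent-encodes the values; escape makes the result safe
    # to place inside an HTML attribute (e.g. & becomes &amp;).
    return '<span class="Z3988" title="{}"></span>'.format(escape(urlencode(pairs)))


# Hypothetical values, for illustration only.
print(coins_span("An Example Article", "Journal of Examples", "Doe", "Jane", "2014"))
```

Nothing in the span is displayed, so it can sit next to a citation in any style without touching the formatted text.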
* Enter/collect structured data citations into Zotero [in: structured data]
* Generate bibliography with Zotero [out: unstructured textual data**]
** Bibliographies: RTF/HTML/TXT
** Exports: various RDF and non-RDF formats
Solution: Generate HTML + RDFa bibliography with Zotero (with whichever CSL style)
Scope: Zotero generates bibliographies in a number of output formats, with a number of citation styles
Value:
* Structured data
* Make 'round-trip' feasible (Citations -> Zotero -> Bibliography as RDFa -> Citations)
...
A COinS parser that outputs RDF triples would also be great.
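A very rough sketch of such a parser, in Python with only the standard library. The big assumption here is that the raw COinS keys (e.g. `rft.atitle`) are used directly as predicates and each span gets a made-up blank node label; mapping those keys onto real vocabulary IRIs is the part that would still need to be worked out.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qsl


class CoinsParser(HTMLParser):
    """Collect (subject, predicate, object) tuples from COinS spans."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self._count = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag != "span" or "Z3988" not in classes:
            return
        # One blank node per COinS span (the label scheme is an assumption).
        subject = "_:coins{}".format(self._count)
        self._count += 1
        # HTMLParser has already unescaped &amp; in the attribute value.
        for key, value in parse_qsl(attrs.get("title") or ""):
            self.triples.append((subject, key, value))


parser = CoinsParser()
parser.feed(
    '<span class="Z3988" '
    'title="ctx_ver=Z39.88-2004&amp;rft.atitle=An+Example+Article"></span>'
)
for triple in parser.triples:
    print(triple)
```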
[5] http://lists.w3.org/Archives/Public/public-vocabs/2014Apr/0104.html
though an additional RDF export format with http://schema.org classes and properties could also be helpful.
Obviously, feel free to take a stab at this yourself, but the citeproc-js code is massive.
1. Map from CSL Types and attributes to Schema.org classes and properties
* https://en.wikipedia.org/wiki/CiteProc
* http://citationstyles.org/downloads/specification.html
* https://en.wikipedia.org/wiki/Separation_of_presentation_and_content
* https://github.com/citation-style-language/styles/blob/master/bibtex.csl
* https://github.com/brechtm/citeproc-py/blob/master/citeproc/source/bibtex/bibtex.py
* https://github.com/brechtm/citeproc-py/blob/master/citeproc/source/json.py
2. Output RDFa:
* https://bitbucket.org/fbennett/citeproc-js/src/tip/src/formats.js (html, text, rtf)
It appears that the output formatters are not schema-aware.
* https://github.com/citation-style-editor/csl-editor/wiki/User-guide-for-the-CSL-Editor
One could generate Schema.org HTML + RDFa copies of requisite CSL styles with a really gnarly XSL workalike.
* http://gsl-nagoya-u.net/http/pub/citeproc-doc.html#generating-bibliographies
(Seems like a lot of work to punctuate triples out of nested JSON form.)
It would be relatively easy to create a JSON-LD context [6] for CSL JSON, but that wouldn't satisfy the output requirements of [CSL Style X] as HTML+RDFa structured data readable by Zotero.
[6] http://www.w3.org/TR/json-ld/#the-context
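For what it's worth, a minimal sketch of what I mean by a context, written as a Python dict. The three term mappings are assumptions chosen for illustration; nested CSL structures such as name lists and `issued` date-parts are deliberately left out, since they do not map one-to-one.

```python
import json

# Assumed, partial mapping of flat CSL JSON keys to Schema.org property IRIs.
CSL_JSONLD_CONTEXT = {
    "title": "http://schema.org/name",
    "publisher": "http://schema.org/publisher",
    "URL": {"@id": "http://schema.org/url", "@type": "@id"},
}


def with_context(csl_item):
    """Return a copy of a CSL JSON item with the JSON-LD context attached."""
    doc = {"@context": CSL_JSONLD_CONTEXT}
    doc.update(csl_item)
    return doc


# Hypothetical item, for illustration only.
item = {"title": "An Example Article", "URL": "http://example.org/article"}
print(json.dumps(with_context(item), indent=2))
```

Keys with no entry in the context are ignored by a JSON-LD processor, so this only makes part of the item readable as RDF.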
IMO, the only way to accomplish this is to expand a standard like COinS (one that would embed complete metadata) to support a richer set of metadata.
That said, embedding metadata for items cited in a bibliography or in footnotes could be useful in some contexts. As one example, a few of us built a tool last year for U.S. legal texts that implements a similar concept, but based on reverse-parsing plain text citations (possible because U.S. legal citation conventions are more or less consistent, necessary because U.S. legal publishers do not expose structured metadata). The code is used in a plugin for use with Multilingual Zotero, and features in a node.js package for server-side applications.
On the output side, there is a hook in the citeproc-js processor (@bibliography/entry) that can be used to wrap a bibliography entry in arbitrary markup. That doesn't give you element-level granularity for linking, but it could be used (for example) to add a reveal of underlying metadata in an HTML page.
The first step would be to work out a sample page or PDF document that works as you would like. There are some potentially conflicting desiderata -- links to embedded metadata, cross-linking of citations and bibliography entries, external links to full-text source via DOIs or URLs, ORCID links -- and a sample document would force consideration of the design tradeoffs, before looking into how citeproc-js or another CSL processor could be adapted to help make it happen.
If such pages became common (i.e. documents containing both self-referencing and cited metadata), Zotero would need UI for handling extraction and filing of both classes of cite details. There wouldn't be resistance to that, I think, but I doubt it will happen before demand is stimulated by a volume of document data to feed on.
(Edit: A further hurdle to clear would be the mapping of the JSON input to the citeproc-js processor into schema.org [or whatever] structures. The CSL input format serves as an intermediate layer between well-defined formats designed for data exchange between machines, and printed formats designed for human consumption. The CSL input format itself is not designed with data exchange in mind, and you would need to do some work [probably a significant amount of work] on mapping conventions to get things working correctly.)
To step back a bit, there are multiple reasons for including
a bibliography of structured citations:
1. To give credit where credit is due
2. To allow for the verification of logically inductive premises
(to support scientific reproducibility)
The wider objective here is to share bibliographies of structured citations
as structured data.
Reproducible Science (logically inductive argument verification):
* discover a graph of supporting premises (resources)
* (Zotero helps discover the metadata for one or more resources)
* for each resource
* retrieve peer review comment threads
* retrieve meta-analysis metadata in re: validity and reproducibility
* retrieve supporting data [7]
* validate stated transformations
* validate logical conclusions
* retrieve relevant annotations
* generalize to red/green per resource
The citation lookup overhead seems wastefully inefficient.
How much time is spent, in academia, manually parsing
and disambiguating citations and the resources which they describe?
URIs and URLs are the solution.
The irony here, with respect to citation graph discoverability
and the 6,992 citation styles, is that despite the intricate punctuational
variation from journal to journal, none of the textual citation styles
support looking up the supporting premises of the supporting premises
without lots of complex text parsing.
RDF (and RDFa) present a solution to this,
with regard to the wider problem of verifying cited resources as premises.
* A resource is a Thing.
* For which there can be multiple representations (each with MIME type)
* HTML
* LaTeX
* PDF
* RDF
* RDF/XML
* TriX
* Turtle/N3
* RDFa (HTML + RDF)
* URIs are designed to uniquely identify resources.
* DOI URNs are URIs.
* Most citations do have a DOI.
* URLs are URIs.
* URLs are designed to be dereferenceable [8]
* Graphs of URLs form the 'Giant Global Graph'
* RDF is designed to describe resource graphs of URIs and URLs
with infinite fidelity
and well-defined parsing semantics.
* Of what use is a citation style without a field for
a URI (e.g. a DOI URN) and/or a URL?
In the western world, we tend to record names as first, middle, and last.
* Bibliographic name granularity (name as FML) is preserved with high fidelity
* With Zotero RDF
* With COinS HTML
* We must parse for URIs and URLs
* With CSL JSON
* Bibliographic name granularity is not preserved
(must parse name fields -> FML)
* With Schema.org RDF (name)
* With DCTERMs RDF (name)
* With almost every CSL (structured data -> text)
* We must parse for URIs and URLs
So, one could run a COinS parser and a (Zotero) RDFa parser
on every resource in a graph of supporting premises.
To promote efficiency:
* A recommendation like
"complete bibliographic data SHOULD be included in a resource"
* Identify loss of fidelity
* Unstructured data -> Structured Data (Zotero RDF) -> CSL
* RDF, RDFa -> RDF (Zotero RDF) -> RDFa
* RDF (Zotero RDF) -> COinS HTML
* Work with COinS to produce an RDF schema
* Work with Schema.org (major search engines)
* Understand that western FML name patterns are one way to express names
* https://en.wikipedia.org/wiki/Surname
* https://en.wikipedia.org/wiki/Unicode_collation_algorithm
[8] https://en.wikipedia.org/wiki/Dereferenceable_Uniform_Resource_Identifier
There do seem to be architectural limitations to how CSL JSON (at least in current citeproc-js) is formatted.
Expression of (more complete) bibliographic data as structured RDFa which 'validates' as a particular CSL style could be accomplished through the use of @content: http://www.w3.org/TR/rdfa-core/#object-resolution
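A tiny Python sketch of the @content idea: the text inside the element stays whatever the citation style dictates, while the attribute carries the fuller value for parsers. The function name and the example values are hypothetical.

```python
from html import escape


def styled_with_content(prop, displayed, full_value):
    """Show style-formatted text while exposing the full value via @content."""
    return '<span property="{}" content="{}">{}</span>'.format(
        escape(prop), escape(full_value), escape(displayed)
    )


# Hypothetical example: the style abbreviates the journal title,
# but the RDFa literal keeps the complete form.
print(styled_with_content("name", "J. Ex.", "Journal of Examples"))
```

An RDFa parser takes the literal from @content when it is present, so the displayed abbreviation does not leak into the extracted data.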
* https://en.wikipedia.org/wiki/Linked_data
* http://www.w3.org/TR/ld-glossary/
* http://5stardata.info/
You're preaching to the converted. People involved in Zotero development see the benefits of linked data, and of passing structured metadata from document to citing document to newly authored document.
It's only a question of implementation, and that's something for document designers to come up with, in the first instance. I think it's fair to say that the Zotero crew are just (reasonably) waiting to see what will emerge at that end.
If you have a concrete example of a published document or sample to show, I'm sure people will be happy to take a look and comment on the possibilities.
Edit: Frank, lol...
In a sense we already have this with DOIs. Unfortunately, the DOI RAs do not always provide metadata, the metadata provided is not always complete, and/or is not presented in a consistent format. Additionally, while in an ideal world any one resource should be described by a single set of metadata (I'm thinking a central curated database. Maybe crowd-sourced?), it seems to me that there will always be a need for customizability.
To this end, we have zotero.org which could (and already does) also serve as a central repository of metadata. The nice part about this is that zotero.org can serve metadata in a number of formats.
So what I imagine is that one could simply add
<link rel="meta" type="application/rdf+xml" href="https://api.zotero.org/groups/183462/items/3DXJRRCD?format=rdf_zotero"/>
to a span encompassing a particular reference and be done with this. This probably isn't that useful to search engines, but, in terms of metadata, I think this is as good as it can get.
I wonder whether pandoc's version of citeproc could be extended to automatically spit out COinS or the meta links suggested by aurimas when exporting to HTML; I see that someone suggested this a few years back (where, incidentally, it was also thought that RDFa would in theory be the best mechanism).
The proposal by aurimas for links does seem more sensible in many ways, if it could be implemented; this is a bit like a suggestion by Martin Fenner.
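The link-based approach is simple enough to sketch in Python. The URL pattern is taken from the example link above; the function name and the assumption that the formatted reference arrives as ready-made HTML are mine.

```python
from html import escape

# URL pattern copied from the example link in this thread.
API_ITEM_URL = "https://api.zotero.org/groups/{group}/items/{key}?format=rdf_zotero"


def wrap_reference(formatted_html, group_id, item_key):
    """Wrap an already-formatted reference in a span with a metadata link."""
    href = API_ITEM_URL.format(group=group_id, key=item_key)
    link = '<link rel="meta" type="application/rdf+xml" href="{}"/>'.format(
        escape(href)
    )
    return "<span>{}{}</span>".format(link, formatted_html)


# The reference text is a placeholder; group and key come from the example above.
print(wrap_reference("Doe, J. (2014). An Example Article.", 183462, "3DXJRRCD"))
```

Because the formatted text is passed through untouched, this works with any citation style and needs no changes to the citation processor itself.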
I've shared what I built, along with some documentation here: https://github.com/dylan-k/biblio