How to make Zotero friendly websites?

schnippy · November 12, 2006

I'm working on a large research database website and I can see the benefit of making it easy for users to extract citations and export them to other formats. Are there any pointers for content developers who want to help expand the number of zotero-friendly websites? What format should the data be in?

Thanks,

- s

noksagt · November 12, 2006

An easy step which would allow you to work with Zotero and some other apps would be to embed COinS:
<http://ocoins.info/>
Zotero also reads embedded RDF, so that is a possible alternative. Zotero also has plans to eventually implement UnAPI support. If you fail to implement any of these embedded formats, at least allow RIS export (also useful for Endnote)--Zotero can catch those if users manually export them from your site.
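
As a rough illustration of the RIS fallback, here is a minimal Python sketch that serializes one record to RIS. The function name `to_ris` and the dictionary keys are assumptions made for this example, and only a handful of common RIS tags (TY, AU, TI, JO, PY, ER) are covered:

```python
def to_ris(record):
    """Serialize one bibliographic record (a plain dict) to an RIS string."""
    # TY opens the record; "JOUR" (journal article) is the default here.
    lines = ["TY  - " + record.get("type", "JOUR")]
    for author in record.get("authors", []):
        lines.append("AU  - " + author)  # one AU line per author
    if "title" in record:
        lines.append("TI  - " + record["title"])
    if "journal" in record:
        lines.append("JO  - " + record["journal"])
    if "year" in record:
        lines.append("PY  - " + str(record["year"]))
    lines.append("ER  - ")  # ER closes the record
    return "\n".join(lines) + "\n"
```

Serving that string as a downloadable file from a hypothetical "export" link would be enough for users to export records manually.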

refbase is a GPLed MySQL/PHP project which has COinS and UnAPI support:
<http://refbase.sourceforge.net/>

tdowling · November 15, 2006

COinS cannot include abstracts, keywords, or subject headings, and I haven't found a way to get Zotero to recognize multiple author names. That's just part of using OpenURL.

I also can't get it to respond to DC.Type tags to recognize a page as anything other than a general text document.

So, is there a good example of Zotero-friendly embedded RDF people can copy into their own pages?

noksagt · November 15, 2006

Multiple authors in COinS can be added by using multiple 'au=' for second through last author (as opposed to 'aulast' and 'aufirst' for the first author). Again, see how refbase does it:
http://arc.nucapt.northwestern.edu/refbase/show.php?author=seidman
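
To make the multiple-author case concrete, here is a minimal Python sketch that builds a COinS span for a journal article. The function name, its parameters, and the "First Last" order used inside `rft.au` are assumptions for illustration; this is not how refbase itself generates its output:

```python
from html import escape
from urllib.parse import urlencode


def coins_span(title, journal, year, authors):
    """Build a COinS <span> for a journal article.

    `authors` is a list of (last, first) tuples. The first entry becomes
    rft.aulast/rft.aufirst; every later one becomes a repeated rft.au key.
    """
    pairs = [
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
        ("rft.atitle", title),
        ("rft.jtitle", journal),
        ("rft.date", str(year)),
    ]
    if authors:
        last, first = authors[0]
        pairs += [("rft.aulast", last), ("rft.aufirst", first)]
        for last, first in authors[1:]:
            pairs.append(("rft.au", f"{first} {last}"))
    # urlencode percent-encodes each value; escape() turns "&" into "&amp;"
    # so the query string is safe inside an HTML attribute.
    return '<span class="Z3988" title="%s"></span>' % escape(urlencode(pairs))
```

Calling it with three authors yields one `rft.aulast`/`rft.aufirst` pair followed by two `rft.au` keys.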

COinS is currently embedded in MANY more sites and used by many more applications than embedded RDF, so it is a good idea to support it anyway.

You are right that the absence of abstracts and keywords is a major limitation of COinS. And you are also right that the COinS parser for Zotero could still be improved.

Embedded RDF is described and sampled on http://research.talis.com/2005/erdf/wiki/Main/RdfInHtml

noksagt · November 15, 2006

And here's the schema: <http://purl.org/net/biblio>

tdowling · November 16, 2006

Thanks for the pointers.

ts · November 16, 2006

For the embedded formats, do any provide a way to specify that an attachment goes with a bibliographic entry?

For example, say I have a page of PDFs of papers and the citations that go with them. I'd like Zotero users to be able to grab what they want, as with the JSTOR scraper, which grabs the papers as attachments to the bibliographic entry.

Matthias · November 18, 2006

unAPI support in Zotero would allow Zotero users to automatically grab PDFs that are associated with any record displayed on an unAPI-enabled site.

http://unapi.info/

Since the Zotero folks seem to be committed to supporting open standards, it is my hope that we'll eventually see unAPI support in Zotero. This would allow Zotero to grab any metadata, bibliographic format (Endnote, RIS, BibTeX, RDF, etc) and/or (PDF) file that's associated with a record displayed on an unAPI-enabled website. Using unAPI, Zotero would also be able to grab any given abstracts and keywords.

For anyone with some web programming skills, implementing the unAPI service is rather straightforward. The unAPI spec and help notes all fit on a single page:

http://unapi.info/specs/

For people interested in implementing an unAPI service for their own site, I've written some info about an existing unAPI implementation which also includes some usage examples:

http://unapi.refbase.net
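
On the page side, my understanding of the unAPI spec is that a site needs two pieces of markup: a `<link>` in the page head that points at the unAPI server, and an empty `<abbr class="unapi-id">` next to each record. A minimal Python sketch that renders both (the function names and the example URL and identifier in the usage note are made up for illustration):

```python
from html import escape


def unapi_link(server_url):
    """<link> element for the page <head>, pointing at the unAPI server."""
    return ('<link rel="unapi-server" type="application/xml" '
            'title="unAPI" href="%s">' % escape(server_url))


def unapi_marker(record_id):
    """Empty <abbr> placed next to a record to expose its identifier."""
    return '<abbr class="unapi-id" title="%s"></abbr>' % escape(record_id)
```

For instance, `unapi_marker("rec:42")` would be emitted once per displayed record, and `unapi_link("http://example.org/unapi")` once per page.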

If many sites supported standard retrieval mechanisms such as COinS, embedded RDF and/or unAPI, this would also significantly reduce the need for the Zotero guys to develop site-specific scrapers (which will surely break at some point and will thus need permanent maintenance).

noksagt · November 18, 2006

I do have a question for the Zotero devs: What is the planned behavior for sites that support multiple embedded standards? Will we get every record listed several times (once for each way they are embedded)? Will they be prioritized somehow and, if so, what will that priority be? Or will there be some intelligent system that tries to extract information from all the embedded formats & combine it (and, if this is the case, how will conflicting information be handled)?

kkraus · November 21, 2006

noksagt, I'm re-posting your question to the development forum.

jim · November 26, 2007

Hi,

I too am interested in making our pages easily digestible for Zotero.

I notice from the "compatible standards and software" page (http://www.zotero.org/documentation/compatible_standards_and_software) that Zotero supports Dublin Core.

Does this mean that making a web page Zotero-compatible is just a case of adding "DC.Whatever" meta tags to the page?

(If this is the case it seems preferable to using the COinS system - I think additional meta tags should be relatively easy to automatically drop into page headers, but I'm sure I'm missing something as it reads as if the COinS method is generally preferred.)

Thanks,
Jim

jim · November 30, 2007

Hi,

Apologies if it's considered impolite to reply to your own messages on this forum, but I went ahead and tried my suggestion about adding some DC metadata to a webpage and Zotero seems to be able to grab it just fine.

I used the following subset of the Dublin Core which seems to work well:

<meta name="DC.Title" content="A webpage by Jim">
<meta name="DC.Creator" content="Jim">
<meta name="DC.Subject" content="stuff, not much">
<meta name="DC.Description" content="Just a bit of a test to see how Zotero grabs DC metadata">
<meta name="DC.Publisher" content="Jim">
<meta name="DC.Type" content="Text">
<meta name="DC.Format" content="text/html">
<meta name="DC.Language" content="en-GB">
<meta name="DC.Rights" content="Copyright Jim">

I think Zotero put the description metadata in its "Extra" field rather than "Abstract", where I guess it should be, but other than that everything looked properly aligned.
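
If the tags are generated rather than typed by hand, the values need HTML escaping. A minimal Python sketch, assuming the metadata is available as a dictionary keyed by Dublin Core element name (the function name is made up for this example):

```python
from html import escape


def dc_meta_tags(fields):
    """Render a dict such as {"Title": "..."} as DC.* <meta> tags."""
    return "\n".join(
        # escape() also converts double quotes, so values cannot break
        # out of the content attribute.
        '<meta name="DC.%s" content="%s">' % (escape(name), escape(str(value)))
        for name, value in fields.items()
    )
```

The output can be dropped into the page header as-is.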

I hope that's useful.

Regards,
Jim

Tjowens · November 30, 2007

Thanks Jim,

It is by no means impolite to answer your own question; it adds to the collective wisdom here in these forums. I meant to comment before. If you want to add metadata to individual pages, Dublin Core is in fact a great way to get the basics in. But keep in mind that Zotero will only grab one DC item from a page.

COinS works well if you want to add metadata for a list of items. So if you want to allow someone to capture an entire bibliography or several search results, COinS is probably a better way to go.

mr.frazzlebottom · December 17, 2007

How about, META name="description", META name="keywords", etc. support?

I ask because the "Create New Item from Current Page" feature ignores them -- and what if, as in many pages, no other format of meta data exists?

The META HTML tag is probably considered deprecated or something, but it would (or should) be trivial to add some code to support these tags if they exist and no other similar meta tags exist.

As it is, for many pages I will be adding to Zotero, I will have to manually add information that Zotero should be adding itself.

dstillman · December 18, 2007

How about, META name="description", META name="keywords", etc. support?

We could import them, but the quality varies widely. Author is probably the most reliable. Description could become Abstract or Extra, but it's quite unpredictable and often not particularly useful. Keywords could be parsed as tags, but they're often designed for (circa-1998) search engines and rather redundant.

On the other hand, I suppose it's easier to ignore/remove data than add it manually...
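
To illustrate what such a fallback might look like, here is a minimal Python sketch that pulls the plain `author`, `description`, and `keywords` meta tags out of a page. This is not Zotero's code; the class and function names and the mapping to `creator`, `abstract`, and `tags` are assumptions based on the field choices discussed above:

```python
from html.parser import HTMLParser


class PlainMetaParser(HTMLParser):
    """Collect <meta name="author|description|keywords"> values from a page."""

    WANTED = {"author", "description", "keywords"}

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        content = attrs.get("content")
        if name in self.WANTED and content:
            # Keep the first occurrence of each name.
            self.found.setdefault(name, content.strip())


def fallback_item(html_text):
    """Map plain meta tags to a small item dict; empty if none are present."""
    parser = PlainMetaParser()
    parser.feed(html_text)
    meta = parser.found
    item = {}
    if "author" in meta:
        item["creator"] = meta["author"]
    if "description" in meta:
        item["abstract"] = meta["description"]
    if "keywords" in meta:
        item["tags"] = [k.strip() for k in meta["keywords"].split(",") if k.strip()]
    return item
```

Because the result is just a dictionary, a caller could apply it only when no richer metadata was found on the page.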

yjchen · December 21, 2007

I wrote a post about how to support Zotero with unAPI. It uses Rails, but is easily applied to other frameworks. Here is the link: http://blog.reciprocallattice.com/2007/12/add-unapi-support-in-rails-application.html

It is really easy, in my opinion. You just need to respond to three requests from Zotero.
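
For readers not using Rails, the three requests can be sketched in framework-neutral Python. This is a minimal sketch, not the code from the blog post: the in-memory `RECORDS` store and the function names are hypothetical, and the status codes (300 for the per-record format list, 404 for an unknown identifier, 406 for an unavailable format) follow my reading of the unAPI spec, so check them against the spec before relying on them:

```python
from xml.sax.saxutils import quoteattr

# Hypothetical in-memory store: identifier -> {format name: (MIME type, body)}
RECORDS = {
    "rec:1": {"ris": ("text/plain", "TY  - JOUR\nTI  - A Test\nER  - \n")},
}


def formats_xml(formats, record_id=None):
    """Render a {format name: MIME type} dict as a unAPI <formats> document."""
    id_attr = " id=%s" % quoteattr(record_id) if record_id else ""
    items = "".join(
        "<format name=%s type=%s/>" % (quoteattr(name), quoteattr(mime))
        for name, mime in sorted(formats.items())
    )
    return '<?xml version="1.0" encoding="UTF-8"?><formats%s>%s</formats>' % (id_attr, items)


def handle_unapi(params):
    """Return (status, content type, body) for a dict of query parameters."""
    record_id = params.get("id")
    fmt = params.get("format")
    if record_id is None:
        # Request 1: no parameters -> every format the service can return
        every = {n: m for rec in RECORDS.values() for n, (m, _) in rec.items()}
        return 200, "application/xml", formats_xml(every)
    if record_id not in RECORDS:
        return 404, "text/plain", "unknown identifier"
    available = RECORDS[record_id]
    if fmt is None:
        # Request 2: id only -> formats available for that record
        per_record = {n: m for n, (m, _) in available.items()}
        return 300, "application/xml", formats_xml(per_record, record_id)
    if fmt not in available:
        return 406, "text/plain", "format not available"
    # Request 3: id and format -> the record itself
    mime, body = available[fmt]
    return 200, mime, body
```

Any web framework can wrap `handle_unapi` by passing in the parsed query string and writing out the returned status, content type, and body.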

karindalziel · January 30, 2008

tdowling above said:

"I also can't get it to respond to DC.Type tags to recognize a page as anything other than a general text document."

Has anyone figured this out? I'd like to eventually get people to use COinS, but in the short term I think I'll have better luck with Dublin Core. But, no matter what I do, I can't get the type detected.

sean · January 30, 2008

Have you tried setting DC.Type to a Zotero item type? For example, book, journalArticle, bookSection, etc?

merz1 · August 17, 2009

Besides checking out the Dublin Core options for the header of my website, I want to mention the growing use of the hAtom microformat.

hAtom is easy to parse when storing a snapshot and offers all the basic information for an article list or an individual article.

Link to draft specification: http://microformats.org/wiki/hatom

KlausR · March 20, 2010

Dear all,

Adding to the above, could someone clarify how I can make Zotero prefer COinS when a page has multiple descriptors, e.g. prefer COinS in the content of a page over DC descriptors in the META tags of the HEAD?
Concretely, this means:
On http://www.londonmobilelearning.net/#outputs.php?state=0 I have a nice bibliography generated with Zotero, including COinS, which I want Zotero to prefer and display in the address bar. The META tags of the website contain DC descriptors.
Now I want Zotero to override the DC descriptors and recognise the COinS, so that it displays the proper icon and information in the address bar.
For the difference, see the print URL without DC descriptors at http://www.londonmobilelearning.net/print.php?printurl=inc_books_issues.html

Thanks for your help
Klaus

ajlyon · March 22, 2010

[edit] The priorities listed here are wrong. See the correct numbers below. --AL

In the short term, you can write a site-specific translator that calls the COinS translator explicitly. It will take priority over the RDF translator.

More generally, I would like to see the embedded metadata reordered. Right now, the priorities are:
RDF: 100
unAPI: 200
COinS: 300
This makes some sense, since RDF and unAPI are more expressive than COinS. That said, I think that when both RDF and unAPI, or RDF and COinS, are present on a single page, we can assume that unAPI or COinS is likely to be carrying the more relevant data. This is exacerbated by, as Klaus notes, the growing use of DC descriptors.
I'd like to see:
unAPI: 100
COinS: 200
RDF: 300

Ultimately, there should be some reworking of the translator system so that a document can combine formats-- an annotated bibliography might be well-described via RDF, but its entries could use COinS. Zotero would then support saving both the bibliography and its constituent entries.

KlausR · March 24, 2010

Thanks very much.
It turns out that creating a site-specific translator is probably the only way. I tried to remove most of the RDF data, but just the slightest trace of RDF prevents Zotero from recognising COinS.

But still: Rock on and thumbs up for ZOTERO !

KlausR · March 26, 2010

OK,
I continued playing around: I created a site-specific translator for my site on the basis of the COinS translator, but RDF was still preferred over COinS.
Then I removed all RDF data, keeping the site-specific translator, but now DOI was preferred over COinS.
Now I guess that either my translator doesn't work or this is a real bug in Zotero.
Anyway, I'll now post a feature request asking for a way to put some code into the HEAD tag that tells Zotero definitively which descriptor or translator to use. That can't be too hard.

Best, Klaus

ajlyon · March 26, 2010

If you have correctly created a site-specific translator, the DOI, RDF, and COinS translators shouldn't be triggered at all. I usually add some Zotero.debug(..) calls and watch debug output when developing translators -- try adding some to make sure that your translator is in fact being called.

There really ought to be a way to specify what metadata to look at, or to request that all the metadata is looked at. Maybe it would be reasonable to run DOI, unAPI, COinS and RDF on all pages? Or perhaps a single translator could be developed that tried to do all four in a single pass through the document?
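
To make the "all in one pass" idea concrete, here is a hypothetical Python sketch (not Zotero translator code) that merges whatever each embedded format yields for a record, field by field. It assumes each source has already been parsed into a plain dictionary, and the preference order is supplied by the caller:

```python
def merge_records(candidates, preference):
    """Merge per-source field dicts into one record.

    candidates: {source name: {field: value}}
    preference: source names, most trusted first
    Returns (merged, provenance), where provenance maps each field to the
    source that supplied it.
    """
    merged, provenance = {}, {}
    for source in preference:
        for field, value in candidates.get(source, {}).items():
            # Empty values never win, and the first source to supply a
            # field keeps it, so conflicts go to the preferred source.
            if value and field not in merged:
                merged[field] = value
                provenance[field] = source
    return merged, provenance
```

With this approach a COinS title could be combined with an abstract that only the RDF carries, and the provenance map records which source each field came from.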

KlausR · March 26, 2010

I have posted the feature request at: http://forums.zotero.org/discussion/11990/tell-zotero-which-type-of-data-is-on-my-website

I don't really see the point in creating a site-specific translator for my site as I do have valid and rich data in it already, and I guess many others have the same problem.

BTW, collecting all available data is a clever idea.

dstillman · March 26, 2010

KlausR: This is not a general problem with different metadata types—it's an issue with the way your page is being built (i.e., via JavaScript after page load). See my response on the other thread.

dstillman · March 26, 2010

ajlyon above said:

"Right now, the priorities are:
RDF: 100
unAPI: 200
COinS: 300"

No, the priorities are:

unAPI: 200
COinS: 300
Embedded RDF: 400
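
As this thread uses the numbers, a lower priority number means the translator is preferred. A minimal Python sketch of that selection rule, assuming the set of translators that detected something on the page is already known (the function name and data structures are made up for illustration and are not Zotero internals):

```python
# Priority numbers as listed above; in this sketch a lower number wins.
PRIORITIES = {"unAPI": 200, "COinS": 300, "Embedded RDF": 400}


def pick_translator(detected, priorities=PRIORITIES):
    """Return the detected translator with the lowest priority number, or None."""
    known = [name for name in detected if name in priorities]
    if not known:
        return None
    return min(known, key=lambda name: priorities[name])
```

Note that when two translators share the same number, `min` simply returns whichever comes first in the list, so equal priorities make the outcome depend on ordering.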

ajlyon · March 26, 2010

I rearranged my translator priorities to be:
unAPI 100
COinS 200
RDF 300
DOI 400
This works for me, except that you apparently stripped your site of COinS, so your site still shows only RDF.

Until a combined embedded metadata translator arrives (and I think that's the only way to go long-term), such a rearranged set of priorities would cover common cases like yours.

Edit: Dan is right... I was confusing "RDF.js" with "Embedded RDF.js"

Rintze · January 27, 2011

@Dan Stillman: the DOI and COinS translators both have a priority of 300. Can we reassign the priority numbers of these generic translators so that the priority order is made clear? I discussed this with ajlyon, and he thinks that the DOI translator typically gives better results than the Embedded RDF one, so perhaps we could use:

unAPI: 200
COinS: 300
DOI: 400
Embedded RDF: 500

ajlyon · January 27, 2011

My opinion is split-- good Embedded RDF might well be superb data. Most embedded RDF is a hodge-podge of very simple Dublin Core terms that people only rarely are looking for. It would be bad if an RDF-ful site was being read by DOI instead.

Again, we (I) really need to make a single translator that combines the embedded data scrapes to do them all in one pass.

Rintze · January 27, 2011

@ajlyon: agreed, but changing the priorities would clear things up in the meantime.