How to write a Zotero 2.x translator?
After receiving an underwhelming response to an earlier post about converting HTML to Zotero, I decided to investigate writing my own translator (screen scraper).
Adam Crymble's How to Write a Zotero Translator is an excellent tutorial, but unfortunately uses a bunch of tools that only work with Zotero 1.x (Zotero's own Scaffold) or Firefox 2.x (MIT's Solvent 2.0).
So then I tried going straight to the Zotero documentation. Unfortunately, the page's subtitle, "Zotero Translators -- The Missing Manual," is too apropos. For example, clicking on the link to the Zotero.Utilities documentation only leads to a page reporting a Trac error, as do the links to documentation for the specific functions mentioned on the page.
I'm willing to roll my own, but this is ridiculous. I realize Zotero is free, so we have no innate right to demand anything. But it would be better to say, "Documentation on how to write translators for Zotero 2.x does not exist at this time" than to waste everyone's time with a wild goose chase. Even better, it would be useful to add a brief overview of how to write translators for 2.x and get rid of the broken links on the current instruction page.
Best of all would be direction to up-to-date documentation. Can anyone help with this?
Thanks.
Would you like to help out?
[1] Most of the work on the refashioned scaffold was done by Rintze Zelle.
[2] I should also mention that Avram Lyon has done a great deal of work on translators recently, and would, I'm sure, be very happy to see others pitch in to spruce up the documentation.
The best practices and usage of the new functions (as well as the required translator functions and what they do as described in Adam Crymble's tutorial) should be added to the documentation.
I think two things would go a long way: a short preamble to the tutorial noting the small changes in practical workflow and tools, and filling out the Missing Manual with pure reference information on Zotero's internal functions.
1) Fix the broken links in the current page, either by pointing them to substitute pages or by marking them as dead links and saying so.
2) I woke up from my stupor and remembered that Refer & BibTeX are supported import formats. Since these are well documented, and I've used them before, I didn't need to learn mysterious (to me) XML or RDF codes. I could write a tutorial on how to convert a typical HTML page to BibTeX and import it into Zotero. This is not quite automatic screen scraping, but with creative use of regular expressions it beats cutting and pasting a whole slew of chapters one-by-one.
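As a rough illustration of the regular-expression route described in 2), here is a minimal sketch in JavaScript. Everything specific in it is an assumption made for the example: the HTML layout (one <li> per chapter with a single author, an emphasized title, and a page range), the function name htmlToBibtex, and the generated citation keys. A real page will need its own pattern.

```javascript
// Hypothetical markup: one <li> per chapter in the form
//   <li><span class="author">NAME</span>, <em>TITLE</em>, pp. N-M</li>
var CHAPTER = /<li>\s*<span class="author">([^<]+)<\/span>,\s*<em>([^<]+)<\/em>,\s*pp\.\s*(\d+)-(\d+)\s*<\/li>/g;

// Turn every chapter found in the HTML into a BibTeX @incollection entry.
// bookTitle and year are supplied by hand because they are the same for
// every chapter of the book.
function htmlToBibtex(html, bookTitle, year) {
  var entries = [];
  var match;
  CHAPTER.lastIndex = 0; // the regex is global, so reset it before each run
  while ((match = CHAPTER.exec(html)) !== null) {
    entries.push(
      "@incollection{chapter" + (entries.length + 1) + ",\n" +
      "  author = {" + match[1].trim() + "},\n" +
      "  title = {" + match[2].trim() + "},\n" +
      "  booktitle = {" + bookTitle + "},\n" +
      "  year = {" + year + "},\n" +
      "  pages = {" + match[3] + "--" + match[4] + "}\n" +
      "}"
    );
  }
  return entries.join("\n\n");
}

// Usage with made-up data: print the entries, save them as a .bib file,
// and import that file into Zotero.
var sampleHtml =
  '<ul>\n' +
  '<li><span class="author">Jane Doe</span>, <em>First Chapter</em>, pp. 1-20</li>\n' +
  '<li><span class="author">John Roe</span>, <em>Second Chapter</em>, pp. 21-40</li>\n' +
  '</ul>';
console.log(htmlToBibtex(sampleHtml, "Sample Book", 2010));
```

Chapters with several authors would need the names joined with " and ", which is how BibTeX separates authors; the sketch does not handle that case.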
Marsh
https://www.zotero.org/svn/extension/trunk/chrome/content/zotero/xpcom/utilities.js
But mostly what you need is the guide by Adam Crymble (the last couple of chapters that show the actual translator, plus whatever background knowledge you don't have from the early chapters), and to note that the recommended functions for requesting pages have changed, as announced in:
http://groups.google.com/group/zotero-dev/browse_thread/thread/88400b0d67b40580/0bf6d45e5c8263cf?#0bf6d45e5c8263cf
That allows you to build a basic scraping translator.
Once you get past that, probably the best way to learn is to ask about specifics on the dev list and look at other translators.
For example, take a look at the Chronicle of Higher Education translator -- it's pretty short, and it doesn't use any of the special import formats (BibTeX, RIS). Make a copy of it, change the UUID at the top of the file, change the target URL to the URL of the site you're scraping, restart Firefox, fire up the debug output in Firefox, and start working. Just remember to refresh the target page each time you make a change to the translator file (and restart Firefox each time you change the target URL regex). Keep debug output open and refresh it with F5 each time you retry. Use Zotero.debug() freely and start scraping. You can use regexes, but XPath expressions are more reliable: just look at the web site's source and tweak the existing translator source code accordingly.
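To give an idea of the shape of such a file, here is a minimal sketch of the two functions a basic scraping translator defines, detectWeb and doWeb. It is not the Chronicle translator's code. The site (example.org), the target regex, the XPath "//h1", the item type, and the helper firstText are all hypothetical, and the small stand-in object exists only so the sketch can be run outside Firefox; inside Zotero the real Zotero object is used instead.

```javascript
// Outside Zotero there is no global Zotero object, so this stand-in (an
// assumption for illustration only) lets the sketch run under Node.js.
var completedItems = [];
var Z = (typeof Zotero !== "undefined") ? Zotero : {
  debug: function (msg) { console.log("debug: " + msg); },
  Item: function (itemType) {
    this.itemType = itemType;
    this.complete = function () { completedItems.push(this); };
  }
};

// Hypothetical target pattern; in a real translator the target regex lives
// in the metadata block at the top of the file, next to the UUID.
var TARGET = /^https?:\/\/(www\.)?example\.org\/article\/\d+/;

// Tells Zotero what kind of item the page holds, or false if none.
function detectWeb(doc, url) {
  if (TARGET.test(url)) {
    return "newspaperArticle";
  }
  return false;
}

// Return the whitespace-normalized text of the first node matching an
// XPath expression, or null when nothing matches.
function firstText(doc, xpath) {
  var ANY_TYPE = 0; // numeric value of XPathResult.ANY_TYPE
  var node = doc.evaluate(xpath, doc, null, ANY_TYPE, null).iterateNext();
  return node ? node.textContent.replace(/\s+/g, " ").trim() : null;
}

// Scrape the page and hand the finished item back to Zotero.
function doWeb(doc, url) {
  var item = new Z.Item("newspaperArticle");
  item.title = firstText(doc, "//h1"); // hypothetical location of the title
  item.url = url;
  Z.debug("scraped title: " + item.title);
  item.complete();
}
```

Adapting it to a real site mostly means changing the target regex and the XPath expressions, and adding one line per field you want to capture, with a Zotero.debug() call wherever the result looks wrong.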
And there's an unofficial XPI for Scaffold 2.0 on my website -- [snip --al]
[Edit: Is there a supported version of the 2.0 XPI for Scaffold anywhere? I'd rather not point people to a 2nd-party site, even my own, to install extensions.]
[Edit again: Until an official release comes from Zotero.org, install from the Scaffold 2.0 BitBucket repository: http://bitbucket.org/rmzelle/scaffold/downloads/scaffold2.0-20100606.xpi ]
> tutorial, but unfortunately uses a bunch of tools that only work
> with Zotero 1.x (Zotero's own Scaffold) or Firefox 2.x (MIT's
> Solvent 2.0).
I'm beginning a crude wikification/update of Crymble @ http://www.zotero.org/support/dev/how_to_write_a_zotero_translator_plusplus using uplevel tools (Scaffold 2.0, XPather, DOM Inspector). Assistance is appreciated. I also took the liberty of linking to Scaffold 2.0 from http://www.zotero.org/support/dev/creating_translators_for_sites
It should be released under the name "How to Write a Zotero Translator, 2nd Edition," with a permanent, clearly visible attribution: "Adapted from 'How to Write a Zotero Translator' (2009) by Adam Crymble." The venture should also be not-for-profit.
Otherwise, I encourage this effort that seeks to make the guide and Zotero more useable.
Adam
http://www.zotero.org/support/dev/how_to_write_a_zotero_translator_plusplus#chapter_0introduction
and the blurbs for HWZT and HWZT++ @
http://www.zotero.org/support/dev/creating_translators_for_sites
Regarding not-for-profit, I believe all the wiki content is licensed under Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported:
http://creativecommons.org/licenses/by-nc-sa/3.0/deed.en
HTH, Tom Roche <Tom_Roche@pobox.com>
http://www.zotero.org/support/dev/how_to_write_a_zotero_translator_plusplus#chapter_1hwzt_intro
FWIW, Tom Roche <Tom_Roche@pobox.com>