How to write a Zotero 2.x translator?

After receiving an underwhelming response to an earlier post about converting HTML to Zotero, I decided to investigate writing my own translator (screen scraper).

Adam Crymble's How to Write a Zotero Translator is an excellent tutorial, but unfortunately uses a bunch of tools that only work with Zotero 1.x (Zotero's own Scaffold) or Firefox 2.x (MIT's Solvent 2.0).

So then I tried going straight to the Zotero documentation. Unfortunately, the page's subtitle, "Zotero Translators -- The Missing Manual," is too apropos. For example, clicking on the link to the Zotero.Utilities documentation only leads to a page reporting a Trac error, as do the links to documentation for the specific functions mentioned on the page.

I'm willing to roll my own, but this is ridiculous. I realize Zotero is free, so we have no innate right to demand anything. But it would be better to say, "Documentation on how to write translators for Zotero 2.x does not exist at this time" than to waste everyone's time with a wild goose chase. Even better, it would be useful to add a brief overview of how to write translators for 2.x and get rid of the broken links on the current instruction page.

Best of all would be direction to up-to-date documentation. Can anyone help with this?

Thanks.
  • edited June 8, 2010
    marsh: The move to 2.0 involved changes that broke scaffold. Several people [1] have put in significant time and effort on recoding the utility and, more recently, putting forward a patch to the Zotero development trunk in order to get it working properly again. Even more recently (just three days ago in fact), those changes were merged into the trunk. So, well, yes, the documentation on writing translators could use some attention, now that we have something to document. [2]

    Would you like to help out?

    [1] Most of the work on the refashioned scaffold was done by Rintze Zelle.
    [2] Should also mention that Avram Lyon has done a great deal of work on translators recently, and would I'm sure be very happy to see others pitch in to spruce up the documentation.
  • see also http://groups.google.com/group/zotero-dev/browse_thread/thread/88400b0d67b40580/0bf6d45e5c8263cf?#0bf6d45e5c8263cf

    The best practices and usage of the new functions (as well as the required translator functions and what they do as described in Adam Crymble's tutorial) should be added to the documentation.

    I think two things would go a long way: a short preamble to the tutorial noting the small changes in practical workflow and tools, and filling out the Missing Manual purely with reference information for Zotero's internal functions.
  • fbennett wrote:

    marsh: The move to 2.0 involved changes that broke scaffold. Several people [1] have put in significant time and effort on recoding the utility and, more recently, putting forward a patch to the Zotero development trunk in order to get it working properly again. Even more recently (just three days ago in fact), those changes were merged into the trunk. So, well, yes, the documentation on writing translators could use some attention, now that we have something to document. [2]

    Would you like to help out?

    [1] Most of the work on the refashioned scaffold was done by Rintze Zelle.
    [2] Should also mention that Avram Lyon has done a great deal of work on translators recently, and would I'm sure be very happy to see others pitch in to spruce up the documentation.

    Yes, I would like to help out, but I'm not sure how much help I could be since the reason I was looking at the documentation was to start to learn about translation. I could do two things:

    1) Fix the broken links in the current page, either pointing them to substitute pages or marking them as dead links and saying so.
    2) I woke up from my stupor and remembered that Refer & BibTeX are supported import formats. Since these are well documented, and I've used them before, I didn't need to learn mysterious (to me) XML or RDF codes. I could write a tutorial on how to convert a typical HTML page to BibTeX and import it into Zotero. This is not quite automatic screen scraping, but with creative use of regular expressions it beats cutting and pasting a whole slew of chapters one-by-one.

    Marsh
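
    A minimal sketch of the regex-based HTML-to-BibTeX conversion proposed in item 2 above, written in JavaScript. The HTML snippet, the htmlToBibtex helper, the book title, and the citation-key scheme are all hypothetical; a real page would need its own pattern, and the resulting text would still have to be imported into Zotero by hand.

```javascript
// Hypothetical input: a chapter list as it might appear in a page's source.
var html = [
  '<li class="chapter"><span class="author">Jane Doe</span>, <a href="/ch1">First Chapter</a></li>',
  '<li class="chapter"><span class="author">John Roe</span>, <a href="/ch2">Second Chapter</a></li>'
].join("\n");

// Turn every author/title pair found in the HTML into an @incollection entry.
// bookTitle and year are supplied by hand because this snippet does not
// contain them; the regex assumes the exact markup shown above.
function htmlToBibtex(html, bookTitle, year) {
  var pattern = /<span class="author">([^<]+)<\/span>,\s*<a href="[^"]*">([^<]+)<\/a>/g;
  var entries = [];
  var match;
  while ((match = pattern.exec(html)) !== null) {
    var author = match[1].trim();
    var title = match[2].trim();
    // Citation key: last word of the author name, the year, and a counter.
    var key = author.split(/\s+/).pop().toLowerCase() + year + "_" + (entries.length + 1);
    entries.push(
      "@incollection{" + key + ",\n" +
      "  author = {" + author + "},\n" +
      "  title = {" + title + "},\n" +
      "  booktitle = {" + bookTitle + "},\n" +
      "  year = {" + year + "}\n" +
      "}"
    );
  }
  return entries.join("\n\n");
}

var bibtex = htmlToBibtex(html, "An Edited Volume", 2010);
console.log(bibtex);
```

    The output can be saved as a .bib file and imported. Note that a regex like this is brittle: any change in the page's markup means rewriting the pattern.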
  • Those functions are in this file:
    https://www.zotero.org/svn/extension/trunk/chrome/content/zotero/xpcom/utilities.js
    But mostly what you need is the guide by Adam Crymble (the last couple of chapters, which show the actual translator, plus whatever background knowledge you lack from the early chapters). Note also that the recommended functions for requesting pages have changed, as announced in
    http://groups.google.com/group/zotero-dev/browse_thread/thread/88400b0d67b40580/0bf6d45e5c8263cf?#0bf6d45e5c8263cf

    That allows you to build a basic scraping translator.

    Once you get past that, probably the best way to learn is to ask about specifics on the dev list and look at other translators.
  • edited June 11, 2010
    You certainly don't need to, nor should you, convert things to BibTeX in order to import. Please take a look at some of the simpler translators to get an idea of how they work. If you are familiar with regular expressions and the bare bones of programming, I think you'll find that they are easily emulated.

    For example, take a look at the Chronicle of Higher Education translator -- it's pretty short, and it doesn't use any of the special import formats (BibTeX, RIS). Make a copy of it, change the UUID at the top of the file, change the target URL to the URL of the site you're scraping, restart Firefox, fire up the debug output in Firefox, and start working. Just remember to refresh the target page each time you make a change to the translator file (and restart Firefox each time you change the target URL regex). Keep debug output open and refresh it with F5 each time you retry. Use Zotero.debug() freely and start scraping. You can use regexes, but XPath expressions are more reliable; just look at the web site's source and tweak the existing translator source code accordingly.
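
    A minimal sketch of the general shape of a simple web translator, written so that it also runs outside Firefox. The metadata values (UUID, label, target regex), the stand-in Zotero object, and the fake document are assumptions for illustration only; inside Zotero the real Zotero.debug() and Zotero.Item are provided, and the metadata lives in the JSON header at the top of the translator file rather than in a variable. The sketch reads doc.title instead of using XPath purely to stay self-contained.

```javascript
// Placeholder metadata (hypothetical values, not from a real translator).
var metadata = {
  translatorID: "00000000-0000-4000-8000-000000000000", // replace with a fresh UUID
  translatorType: 4, // 4 = web translator
  label: "Example Site",
  creator: "Your Name",
  target: "^https?://www\\.example\\.com/article/" // regex for URLs to scrape
};

// Stand-in for the Zotero object so the sketch runs outside Firefox.
// It only records what the real object would log or save.
var log = [];
var saved = [];
var Zotero = {
  debug: function (msg) { log.push(String(msg)); },
  Item: function (itemType) { this.itemType = itemType; this.creators = []; }
};
Zotero.Item.prototype.complete = function () { saved.push(this); };

// detectWeb: report the item type if the page looks scrapable, else false.
// (Zotero itself matches the target regex before calling detectWeb; it is
// repeated here only so the sketch can be exercised standalone.)
function detectWeb(doc, url) {
  Zotero.debug("detectWeb called for " + url);
  if (new RegExp(metadata.target).test(url)) {
    return "newspaperArticle";
  }
  return false;
}

// doWeb: build an item from the page and hand it to Zotero.
// A real translator would typically pull fields out with XPath expressions.
function doWeb(doc, url) {
  var item = new Zotero.Item("newspaperArticle");
  item.title = doc.title.replace(/\s+/g, " ").replace(/^\s+|\s+$/g, "");
  item.url = url;
  Zotero.debug("scraped title: " + item.title);
  item.complete();
}

// Exercise the two functions with a fake page.
var fakeDoc = { title: "  An Example\n Headline " };
var fakeUrl = "http://www.example.com/article/123";
if (detectWeb(fakeDoc, fakeUrl)) {
  doWeb(fakeDoc, fakeUrl);
}
console.log(saved[0].title);
```

    When adapting an existing translator, the parts to change are the same ones marked here: the UUID, the target regex, and the scraping logic inside doWeb.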

    And there's an unofficial XPI for Scaffold 2.0 on my website -- [snip --al]

    [Edit: Is there a supported version of the 2.0 XPI for Scaffold anywhere? I'd rather not point people to a 2nd-party site, even my own, to install extensions.]

    [Edit again: Until an official release comes from Zotero.org, install from the Scaffold 2.0 BitBucket repository: http://bitbucket.org/rmzelle/scaffold/downloads/scaffold2.0-20100606.xpi ]
  • > Is there a supported version of the 2.0 XPI for Scaffold anywhere? I'd rather not point people to a 2nd-party site, even my own, to install extensions.

    We're planning to merge Scaffold 2.0 back into our SVN, at which point we'll release the new XPI. If someone wants to help with that, that'd be great. (We can give commit access as necessary.) Otherwise we'll try to get to it soon.
  • edited June 11, 2010
    For now, Scaffold 2.0 can be installed from http://bitbucket.org/rmzelle/scaffold/downloads . However, as mentioned on the BitBucket wiki, error reporting and indenting of the JSON header of translators only work with the Zotero trunk. Once Scaffold 2.0 has been merged back and Zotero 2.0.4 or 2.1 has been released, I'll help with updating the Zotero wiki on the subject of translators.
  • > Adam Crymble's How to Write a Zotero Translator is an excellent
    > tutorial, but unfortunately uses a bunch of tools that only work
    > with Zotero 1.x (Zotero's own Scaffold) or Firefox 2.x (MIT's
    > Solvent 2.0).

    I'm beginning a crude wikification/update of Crymble @ http://www.zotero.org/support/dev/how_to_write_a_zotero_translator_plusplus using uplevel tools (Scaffold 2.0, XPather, DOM Inspector). Assistance is appreciated. I also took the liberty of linking to Scaffold 2.0 from http://www.zotero.org/support/dev/creating_translators_for_sites
  • Great. FYI, I just asked Adam Crymble if he would be interested in helping out, or at least in donating his writing to the cause. The only licensing information I could find is a copyright notice at the bottom of the guide pages, so you might want to hold off on copying his guide until he confirms that's fine with him; I'll post here if and when he does.
  • See http://www.zotero.org/support/dev/how_to_write_a_zotero_translator_plusplus : it's currently just a series of deltas, i.e. crude, no copying, no copyright infringement.
  • As I mentioned to Rintze, if you'd like to update the content of the original "How to Write a Zotero Translator" as a wiki, I'm happy to release the copyright under the following conditions:

    It must be released under the name "How to Write a Zotero Translator, 2nd Edition." A permanent attribution, "Adapted from 'How to Write a Zotero Translator' (2009) by Adam Crymble", must be clearly visible, and the venture must be not-for-profit.

    Otherwise, I encourage this effort that seeks to make the guide and Zotero more useable.

    Adam
  • Feel free to see the current intro to HWZT++ @

    http://www.zotero.org/support/dev/how_to_write_a_zotero_translator_plusplus#chapter_0introduction

    and the blurbs for HWZT and HWZT++ @

    http://www.zotero.org/support/dev/creating_translators_for_sites

    Regarding the not-for-profit condition, I believe all the wiki content is licensed under Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported:

    http://creativecommons.org/licenses/by-nc-sa/3.0/deed.en

    HTH, Tom Roche <Tom_Roche@pobox.com>
  • For illustration/proof of concept, I just now forklifted HWZT ch1 into

    http://www.zotero.org/support/dev/how_to_write_a_zotero_translator_plusplus#chapter_1hwzt_intro

    FWIW, Tom Roche <Tom_Roche@pobox.com>