Scaffold 2.0 fails to run examples from Crymble ch11?

summary: I get no output when executing the examples from http://niche-canada.org/member-projects/zotero-guide/chapter11.html with Scaffold 2.0, though the examples from the earlier chapters work. How can I fix or debug this, or report it as a bug?

details:

I've been using Zotero for over a year, but have not previously documented or developed for it. I'm currently using Zotero 2.0.3 on Firefox 3.5.9 on Ubuntu 9.10. I recently tried to pull in an article from a journal for which there is no translator. After some searching, I found Adam Crymble's "How to Write a Zotero Translator"

http://niche-canada.org/member-projects/zotero-guide/chapter1.html

(aka HWZT) to be generally acclaimed the best guide to writing a simple screenscraping translator ... except that it hasn't been maintained, and the tools it references are either downlevel (Scaffold 1.0) or orphaned (Solvent 2.0). So I've started a wiki page

http://www.zotero.org/support/dev/how_to_write_a_zotero_translator_plusplus

(aka HWZT++) that aspires to update and wikify HWZT. For the moment, it is merely a list of deltas to HWZT, organized by HWZT page: crude, but quick and avoids rights issues. The main differences are

* Scaffold 1.0 -> Scaffold 2.0
* Solvent -> DOM Inspector + XPather
* saving the first sample page locally, and working on the local copy

(see HWZT++ for justifications). This has worked well for the first 10 HWZT chapters/webpages: i.e. for that material I can

* browse the local sample page with DOM Inspector + XPather
* execute HWZT's Javascript examples in Scaffold 2.0

However, execution of the examples in

http://niche-canada.org/member-projects/zotero-guide/chapter11.html

gives no output. That is, if I

0 in uplevel Firefox, install add-ons

* Scaffold 2.0 from
http://bitbucket.org/rmzelle/scaffold/downloads

* latest DOM Inspector and XPather from
https://addons.mozilla.org/en-US/firefox/

and restart.

1 save http://niche-canada.org/member-projects/zotero-guide/sample1.html to /tmp/HWZTsamples/sample1.html

2 open file:///tmp/HWZTsamples/sample1.html in FF

3 open Scaffold 2.0 (Tools>Scaffold) on file:///tmp/HWZTsamples/sample1.html

4 in tab=Metadata, set
Label=foo
Creator=bar
Target=file:///tmp/HWZTsamples/
and hit button="Test Regex", I get result=

> 18:06:54 ===>true<===(boolean)

5 switch to tab=Code and enter in the input text/frame (on the left) the code below (to find it, go to

http://niche-canada.org/member-projects/zotero-guide/chapter11.html

search on text="Example 11.5", and combine that code with Example 11.6)

// start code
function detectWeb(doc, url) {
var namespace = doc.documentElement.namespaceURI;
var nsResolver = namespace ? function(prefix) {
if (prefix == "x" ) return namespace; else return null;
} : null;
var myXPath = '//td[1]';
var myXPathObject = doc.evaluate(myXPath, doc, nsResolver, XPathResult.ANY_TYPE, null); }
Zotero.debug(myXPathObject);
// stop code

then hit icon=thunderbolt

Expected result: text in the output text/frame (on the right) indicating

http://niche-canada.org/member-projects/zotero-guide/chapter11.html
> myXPathObject is now equivalent to a Simple Variable holding, in this case, "Title: "

Observed result: nothing.

What must I do to make this work? Alternatively,

* should I be discussing/reporting this in another venue?
* if a bug, what should I do to report it?
  • Tom,

    There is a discussion here of a new helper framework for building translators, built by Erik Hetzner:

    http://groups.google.com/group/zotero-dev/browse_thread/thread/2da920ae70b2ddf7

    When the framework is incorporated into Zotero, a streamlined syntax will be available for extracting content using XPath expressions. Certainly one to watch.
  • Yes, http://e6h.org/~egh/hg/zotero-transfw looks interesting. However it doesn't solve my Scaffold problem :-)
  • The linked sample page comes up in FF 3.6.7 under Linux (jaunty) with extraneous non-ASCII characters between every content character.

    http://niche-canada.org/member-projects/zotero-guide/sample1.html

    It looks as though something is awry with the content of the page?
  • I had a look at the problem and I've fixed the 3 sample pages that had been corrupted. Sorry for that, I'm not sure what caused it.

    I hope that fixes your problem with chapter 11.

    Adam Crymble
  • edited July 4, 2010
    Acrymble 4 Jul 10
    > I've fixed the 3 sample pages that had been corrupted.

    Thanks, I've removed the sample-page-related workarounds from HWZT++.

    > I hope that fixes your problem with chapter 11.

    Unfortunately it does not. The usecase is now

    0 in uplevel Firefox, install add-ons

    * Scaffold 2.0 from
    http://bitbucket.org/rmzelle/scaffold/downloads

    * latest DOM Inspector and XPather from
    https://addons.mozilla.org/en-US/firefox/

    and restart.

    1 Open http://niche-canada.org/member-projects/zotero-guide/sample1.html in FF

    2 Open Scaffold 2.0 (Tools>Scaffold) with that sample page open and focused.

    3 in tab=Metadata, set
    Label=foo
    Creator=bar
    Target=http://niche-canada.org/member-projects/zotero-guide/

    and hit button="Test Regex". Expected and actual results similar to

    > 18:06:54 ===>true<===(boolean)

    4 Switch to tab=Code and enter in the input text/frame (on the left) this code from Example 11.5 with a bit appended from Example 11.6 in http://niche-canada.org/member-projects/zotero-guide/chapter11.html

    // start code
    function detectWeb(doc, url) {
    var namespace = doc.documentElement.namespaceURI;
    var nsResolver = namespace ? function(prefix) {
    if (prefix == "x" ) return namespace; else return null;
    } : null;
    var myXPath = '//td[1]';
    var myXPathObject = doc.evaluate(myXPath, doc, nsResolver, XPathResult.ANY_TYPE, null);
    }
    Zotero.debug(myXPathObject);
    // stop code

    then hit icon="Run doWeb" (the stylized thunderbolt). Expected result: text in the output text/frame (on the right) similar to

    http://niche-canada.org/member-projects/zotero-guide/chapter11.html
    > myXPathObject is now equivalent to a Simple Variable holding, in this case, "Title: "

    Observed result: nothing.

    Note that hitting icon="Run detectWeb" (the eye next to the thunderbolt) also produces no output.
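
A note on the code in step 4: the Zotero.debug(myXPathObject) call comes after the closing brace of detectWeb, while myXPathObject is declared with var inside it. In plain JavaScript a var is local to its function, so the name is not defined at the point of that call. The sketch below shows that rule on its own and assumes nothing about how Scaffold reports errors; detectWebSketch is a hypothetical stand-in, not part of the guide or of Zotero.

```javascript
// Hypothetical stand-in for detectWeb; needs neither Zotero nor a DOM.
function detectWebSketch() {
    var myXPathObject = "set inside the function";
    return myXPathObject;
}

// Reading the local name from outside the function fails,
// because var is scoped to the enclosing function.
var outcome;
try {
    outcome = String(myXPathObject);
} catch (e) {
    outcome = e.name; // name of the error that was thrown
}

// Using the function's return value works as expected.
var returned = detectWebSketch();
```

Moving the debug call inside the function body, before the closing brace, avoids the scoping problem.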
  • Problem solved at
    http://bitbucket.org/rmzelle/scaffold/issue/8/exemplar-from-hwzt-ch11-produces-no-output#comment-207513
    The code to use is

    function detectWeb(doc, url) {
        var namespace = doc.documentElement.namespaceURI;
        var nsResolver = namespace ? function(prefix) {
            if (prefix == "x" ) return namespace; else return null;
        } : null;
        var myXPath = '//td[1]';
        var myXPathObject =
            doc.evaluate(myXPath, doc, nsResolver, XPathResult.ANY_TYPE, null).iterateNext().textContent;
        Zotero.debug(myXPathObject);
    }

    which on clicking the icon="Run detectWeb" (eye) produces output like

    11:51:57 Title:
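
Why the extra calls matter: with XPathResult.ANY_TYPE and a node-set expression such as //td[1], doc.evaluate returns an iterator over the matching nodes rather than a string, so the text has to be read from a node via iterateNext().textContent. Below is a minimal sketch of that step with a guard for the case where nothing matches. The names firstNodeText and stubResult are hypothetical, and the stub imitates only iterateNext() so the code can run outside a browser.

```javascript
// Hypothetical helper: trimmed text of the first node in an XPath result,
// or null when the expression matched nothing.
function firstNodeText(xpathResult) {
    var node = xpathResult.iterateNext();
    return node ? node.textContent.trim() : null;
}

// Stub imitating only the iterateNext() part of an XPathResult.
// It is an assumption for illustration, not a real DOM object.
function stubResult(texts) {
    var i = 0;
    return {
        iterateNext: function () {
            return i < texts.length ? { textContent: texts[i++] } : null;
        }
    };
}

var matched = firstNodeText(stubResult(["Title: ", "Sample"]));
var unmatched = firstNodeText(stubResult([]));
```

Inside a translator, the same helper could be handed the value returned by doc.evaluate(...), assuming the expression selects nodes.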