ACM Digital Library items incorrectly saved as books

Hello,
I love Zotero - keep up the good work!

Items from the ACM Digital Library are correctly identified in the navigation bar
as conference papers, journal articles, etc.
But when I save them, they are identified as books,
so it is necessary to enter the publication name, etc. by hand.
For example: http://portal.acm.org/citation.cfm?id=971405

I looked at the ACM translator (by Simon Kornblith),
which appears to depend on the EndNote/Refer/BibIX translator (also by Simon).
My guess is that the ACM page has missing or misinterpreted data.
Any suggestions for how I might track this down and fix it?
Thanks!
Clif
  • I've traced the process a bit further.
    The ACM web translator scans the ACM DL page,
    and gets the URL for an EndNote page, which looks like this:

    %0 Conference Paper
    %1 1047351
    %A Joseph Bergin
    %A Clifton Kussmaul
    %A Thomas Reichlmayr
    %A James Caristi
    %A Gary Pollice
    %T Agile development in computer science education: practices and prognosis
    %B Proceedings of the 36th SIGCSE technical symposium on Computer science education
    %@ 1-58113-997-7
    %C St. Louis, Missouri, USA
    %P 130-131
    %D 2005
    %R http://doi.acm.org/10.1145/1047344.1047351
    %I ACM Press

    The ACM translator passes the contents of this page
    to the EndNote/Refer/BibIX import/export translator.
    In Scaffold, this produces the following output:

    10:20:48 ===>ACM (CLK).doWeb()<===(string)
    10:20:49 ===>Returned item:
    'itemType' => "book"
    'creators' ...
    '0' ...
    'firstName' => "Joseph"
    'lastName' => "Bergin"
    'creatorType' => "author"
    '1' ...
    'firstName' => "Clifton"
    'lastName' => "Kussmaul"
    'creatorType' => "author"
    '2' ...
    'firstName' => "Thomas"
    'lastName' => "Reichlmayr"
    'creatorType' => "author"
    '3' ...
    'firstName' => "James"
    'lastName' => "Caristi"
    'creatorType' => "author"
    '4' ...
    'firstName' => "Gary"
    'lastName' => "Pollice"
    'creatorType' => "author"
    'notes' ...
    'tags' ...
    '0' => "
    xp"
    '1' => "
    agility"
    '2' => "
    curriculum"
    '3' => "
    development"
    '4' => "
    methodology"
    '5' => "
    process"
    '6' => "
    software"
    'seeAlso' ...
    'attachments' ...
    '0' ...
    'title' => "ACM Full Text PDF"
    'mimeType' => "application/pdf"
    'url' => "http://portal.acm.org/ft_gateway.cfm?id=1047351&type=pdf&coll=Portal&dl=GUIDE&CFID=23117757&CFTOKEN=95657828"
    '1' ...
    'title' => "ACM Snapshot"
    'mimeType' => "text/html"
    'url' => "http://portal.acm.org/citation.cfm?id=1047351&coll=Portal&dl=GUIDE&CFID=23117757&CFTOKEN=95657828#"
    'title' => "Agile development in computer science education: practices and prognosis "
    'publicationTitle' => "Proceedings of the 36th SIGCSE technical symposium on Computer science education "
    'ISBN' => "1-58113-997-7 "
    'place' => "St. Louis, Missouri, USA "
    'pages' => "130-131 "
    'date' => "2005 "
    'type' => "undefined"
    'publisher' => "ACM Press"
    'complete' => function(...){...}
    'abstractNote' => "Agile approaches to software development share a particular set of values [2,4]:"
    'repository' => "ACM (CLK)"
    <===(string)
    10:20:49 ===>Translation successful<===(string)

    I'm guessing that "type => undefined" causes "itemType => book",
    but I can't figure out why the translator doesn't recognize the type.

    I've tried inserting Zotero.debug() statements in the translators.
    When I put them in the ACM translator, I see them in the output,
    but when I put them in the EndNote translator, I don't see them.
    I've tried to run the EndNote translator on a text file with the EndNote data, but it complains:

    10:32:38 ===>Translation using EndNote/Refer/BibIX (CLK) failed:
    message => this._sandbox.doWeb is not a function
    fileName => chrome://zotero/content/xpcom/translate.js
    lineNumber => 1408
    stack => ()@chrome://zotero/content/xpcom/translate.js:1408
    ()@chrome://zotero/content/xpcom/translate.js:589
    run()@chrome://scaffold/content/scaffold.js:154
    oncommand([object XULCommandEvent])@chrome://scaffold/content/scaffold.xul:1
    @:0

    name => TypeError
    url => file:///C:/Documents%20and%20Settings/Clif%20Kussmaul/My%20Documents/Mberg%20SVN/EndNote.txt
    extensions.zotero.cacheTranslatorData => true
    extensions.zotero.downloadAssociatedFiles => true<===(string)
  • I got it!

    I noticed that in the EndNote page, some of the lines have trailing spaces,
    including the line with the item type. That seemed suspicious...

    In doImport(), I changed the line before the main while loop from:
    var data = line.substr(3);
    to
    var data = line.substr(3).replace(/^\s*/, '').replace(/\s*$/, '');
    which removes leading and trailing whitespace from the data value.

    This issue might be broader than just the ACM translator.
    Please consider adding a trim() function to the Zotero library :-)

    Clif
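
    A minimal sketch of why the trailing space matters, assuming the importer looks the type name up by exact string match and falls back to "book" when nothing matches (consistent with the Scaffold output above, but not confirmed from the translator source). The names typeMap, lookupType, and parseLine are hypothetical, and the table is abbreviated:

    ```javascript
    // Hypothetical, abbreviated type table; the real translator's map may differ.
    const typeMap = {
      "Journal Article": "journalArticle",
      "Conference Paper": "conferencePaper",
      "Book": "book",
    };

    // Exact-match lookup with an assumed fallback of "book".
    function lookupType(name) {
      return Object.prototype.hasOwnProperty.call(typeMap, name)
        ? typeMap[name]
        : "book";
    }

    // Split one Refer line such as "%0 Conference Paper " into tag and data.
    // With trim = true, leading and trailing whitespace is removed from the data.
    function parseLine(line, trim) {
      const tag = line.slice(0, 2);
      let data = line.slice(3);
      if (trim) {
        data = data.replace(/^\s*/, "").replace(/\s*$/, "");
      }
      return { tag, data };
    }

    const raw = "%0 Conference Paper "; // note the trailing space
    console.log(lookupType(parseLine(raw, false).data)); // logs "book"
    console.log(lookupType(parseLine(raw, true).data));  // logs "conferencePaper"
    ```

    The untrimmed value "Conference Paper " is not a key in the table, so the lookup falls through to the default.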
  • Besides the issue of trailing whitespace, I think that, for the EndNote tagged text format, "Conference Paper" is an incorrect resource type name. "Conference Proceedings" should be used instead.

    http://www.ecst.csuchico.edu/~jacobsd/bib/formats/endnote.html
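
    Since both names may turn up in exported records, an importer could accept either. A sketch with a hypothetical alias table; mapping both names to the conferencePaper item type is an assumption made for illustration, as is the "book" fallback:

    ```javascript
    // Hypothetical alias table: both spellings resolve to the same item type.
    const endNoteTypeAliases = {
      "Conference Paper": "conferencePaper",
      "Conference Proceedings": "conferencePaper",
    };

    // Trim first, then look up; unknown names fall back to "book" (assumed default).
    function resolveEndNoteType(name) {
      const key = name.trim();
      return Object.prototype.hasOwnProperty.call(endNoteTypeAliases, key)
        ? endNoteTypeAliases[key]
        : "book";
    }

    console.log(resolveEndNoteType("Conference Proceedings")); // logs "conferencePaper"
    console.log(resolveEndNoteType("Conference Paper "));      // logs "conferencePaper"
    ```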
  • Thank you for the detective work. We do have a trim function, Zotero.Utilities.superCleanString(), and we'll apply it to the appropriate import translator. Thanks again.
  • Have these issues been corrected in the Zotero translators?
    If not, when might it happen?

    I've been using my own, modified copies of Simon's translators,
    (by setting priority < 100)
    but I'd rather use the main branch so I get other updates.
    When I disable my versions (by setting priority > 100),
    I still get the incorrect results described above,
    and I don't see superCleanString() called in
    the ACM or EndNote translators.

    My Zotero is configured to check for updated scrapers,
    and I click "Update Now" periodically just to be sure.
    Am I missing something?

    Thanks,
    Clif
  • I recently downloaded Zotero 1.0 rc3, after seeing it posted on the Planet Mozilla blog -- cool stuff! (I'd been starting to write my own Fx extension to do something similar, but this one's far further along...)

    I can confirm kussmaul's problem as still being present in the 1.0rc3 build, updated as of today...

    Is there any reason not to use the BibTeX link instead of the EndNote one? I've written a Perl script that does this -- it scrapes the BibTeX popup, inserts the abstract and the DOI link from the main page, and returns the resulting BibTeX string to the user (in my case, an output .bib file). It'd be easy to translate to JavaScript, if that winds up being a more robust solution than translating from the EndNote.
  • edited September 5, 2007
    Yep, I have the same bug for ACM (articles saved as books), and a new one:
    SpringerLink Lecture Notes in Computer Science chapters are saved as web pages (without even the page numbers) instead of as book sections.
    E.g. http://www.springerlink.com/content/v974512qnq683757/?p=f4f58295a3dc4f24a9a3728a8473939d&pi=1
  • Yes, I'm also annoyed by this... I hope that with more and more computer scientists starting to use this brilliant extension, there will be someone who has the time to fix the ACM and Springer scrapers :)
  • I looked at http://www.springerlink.com/content/v974512qnq683757/?p=f4f58295a3dc4f24a9a3728a8473939d&pi=1

    The problem is with SpringerLink, not Zotero. They're not following the RIS specification (using a type of "CHAPTER" rather than "CHAP"). We could probably work around this problem, but it would be much better for us and everyone else if they followed the RIS format convention.
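
    A workaround of the kind mentioned above could be as small as an alias table applied to the TY value before the normal type lookup. This is only a sketch: risTypeAliases, normalizeRisType, and parseRisLine are hypothetical names, not functions from the Zotero RIS translator.

    ```javascript
    // Hypothetical alias table from nonstandard TY values to RIS type codes.
    const risTypeAliases = { CHAPTER: "CHAP" };

    // Return the standard code for a known alias, otherwise the value unchanged
    // (after trimming and upper-casing).
    function normalizeRisType(ty) {
      const key = ty.trim().toUpperCase();
      return Object.prototype.hasOwnProperty.call(risTypeAliases, key)
        ? risTypeAliases[key]
        : key;
    }

    // Parse one RIS line: a two-character tag, two spaces, a hyphen, then the value.
    function parseRisLine(line) {
      const m = /^([A-Z][A-Z0-9])  - ?(.*)$/.exec(line);
      return m ? { tag: m[1], value: m[2] } : null;
    }

    const parsed = parseRisLine("TY  - CHAPTER");
    console.log(normalizeRisType(parsed.value)); // logs "CHAP"
    ```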
  • The ACM problem has recently been fixed. Please update your translators.
  • Works fine for me. Thanks!
This discussion has been closed.