ACM Conference papers imported as Journal papers

Hi,

I've noticed since upgrading to 2.1 that some of the articles I import from the ACM digital library get imported as journal papers, when they are actually conference papers.

e.g. <http://dx.doi.org/10.1145/302979.303166>

I'm using Zotero 2.1.1 and Firefox 4.0 on OSX.

I'm reasonably technically proficient, so I could probably fix this if:
a) it's actually a problem (i.e. the intended behavior is not to import this as a journal article)
b) someone can give me a clue for where to start digging ('translators' folder in zotero? - but how to debug?)
c) presuming there is a fix, where is the place for me to contribute it back?

Best Regards,
Jared Donovan.
  • BTW - I saw in some other threads that there have been some problems with the DOI translator and some ACM imports - but I checked which translator was being used in my case by hovering over the address field icon and could see that it was 'ACM Digital Library'.
  • a) Yes, it's actually a problem - we want data as precise as possible. That said, you'd have to check whether the data is actually there - i.e. whether the data that ACM passes on to Zotero contains that information. If not, it's game over.

    b) Translator documentation isn't great, what we've got is here:
    http://www.zotero.org/support/dev/creating_translators_for_sites
    There's an ongoing push for better dev docs.
    Translators are .js files in the translator folder in the Zotero folder.

    c) Post the fixed ACM.js translator to gist.github.com and either post the link here or post to the zotero-dev group: http://groups.google.com/group/zotero-dev
  • Thanks for the quick reply. I've started by taking a look at the translator through the scaffold addon. I'll just dump some notes on my progress here. Please don't take this as an obligation to reply.

    I've run the detectWeb and doWeb commands. Output is:

    ===detectWeb===

    11:24:55 Title: Cooperative inquiry: developing new technologies for children with children
    11:24:55 Single item detected
    11:24:55 Unable to retrieve text for XPath: //td[@nowrap="nowrap" and @style="padding-bottom: 0px;"]
    11:24:55 Type:
    11:24:55 detectWeb returned type "journalArticle"

    ...so there's some XPath problem, and the type is already being determined as 'journalArticle' here. I guess this is the code that gets run by zotero to see if there's anything to import on the page and that's why the address field icon shown is the journal one.

    ===doWeb===

    11:24:01 test do
    11:24:01 Unable to retrieve text for XPath: //td[@nowrap="nowrap" and @style="padding-bottom: 0px;"]
    11:24:01 Type:
    11:24:01 Scraping Keywords
    11:24:01 Keyword: children
    11:24:01 Keyword: collaborative computing
    11:24:01 Keyword: cooperative design
    11:24:01 Scraping attachments
    11:24:01 Text PDF: http://portal.acm.org/ft_gateway.cfm?id=303166&type=pdf&CFID=15756175&CFTOKEN=34810846
    11:24:01 Scraping abstract
    11:24:01 Unable to retrieve text for XPath: //td[@nowrap="nowrap" and @style="padding-bottom: 0px;"]
    11:24:01 Type:
    11:24:01 Unable to retrieve text for XPath: //meta[@name='citation_journal_title']/@content
    11:24:01 Scraping full details from bibtex
    11:24:01 Bibtex: http://portal.acm.org/exportformats.cfm?id=303166&expformat=bibtex
    11:24:01
    ref : @inproceedings{Druin:1999:CID:302979.303166,
    author = {Druin, Allison},
    title = {Cooperative inquiry: developing new technologies for children with children},
    booktitle = {Proceedings of the SIGCHI conference on Human factors in computing systems: the CHI is the limit},
    series = {CHI '99},
    year = {1999},
    isbn = {0-201-48559-1},
    location = {Pittsburgh, Pennsylvania, United States},
    pages = {592--599},
    numpages = {8},
    url = {http://doi.acm.org/10.1145/302979.303166},
    doi = {http://doi.acm.org/10.1145/302979.303166},
    acmid = {303166},
    publisher = {ACM},
    address = {New York, NY, USA},
    keywords = {KidPad, PETS, children, cooperative design, cooperative inquiry, design techniques, educational applications, intergenerational design team, participatory design},
    }

    11:24:01 Returned item:
    'itemType' => "journalArticle"
    'creators' ...
    '0' ...
    'firstName' => "Allison"
    'lastName' => "Druin"
    'creatorType' => "author"
    'notes' ...
    'tags' ...
    '0' => "children"
    '1' => "collaborative computing"
    '2' => "cooperative design"
    'seeAlso' ...
    'attachments' ...
    '0' ...
    'title' => "ACM Snapshot"
    'mimeType' => "text/html"
    'url' => "http://portal.acm.org/citation.cfm?doid=302979.303166"
    'document' => "[object]"
    '1' ...
    'title' => "ACM Full Text PDF"
    'mimeType' => "application/pdf"
    'url' => "http://portal.acm.org/ft_gateway.cfm?id=303166&type=pdf&CFID=15756175&CFTOKEN=34810846"
    'document' => "[object]"
    'extra' => "ACM ID: 303166"
    'title' => "Cooperative inquiry: developing new technologies for children with children"
    'publicationTitle' => "Proceedings of the SIGCHI conference on Human factors in computing systems: the CHI is the limit"
    'series' => "CHI '99"
    'date' => "1999"
    'ISBN' => "0-201-48559-1"
    'archiveLocation' => "Pittsburgh, Pennsylvania, United States"
    'pages' => "592–599"
    'url' => ""
    'DOI' => "10.1145/302979.303166"
    'publisher' => "ACM"
    'place' => "New York, NY, USA"
    'complete' => function(...){...}
    'abstractNote' => "










    In todays homes and schools, children are emerging as frequent
    and experienced users of technology [3, 14]. As this trend
    continues, it becomes increasingly important to ask if we are
    fulfilling the technology needs of our children. To answer this
    question, I have developed a research approach that enables young
    children to have a voice throughout the technology development
    process. In this paper, the techniques of cooperative
    inquiry will be described along with a theoretical framework
    that situates this work in the HCI literature. Two examples of
    technology resulting from this approach will be presented, along
    with a brief discussion on the design-centered learning of
    team researchers using cooperative inquiry.



    "
    'libraryCatalog' => "ACM Digital Library"
    'shortTitle' => "Cooperative inquiry"

    11:24:01 Translation successful

    ==========> end of output

    So I guess this is what gets run when you click the icon to ask Zotero to actually import the citation. It looks like the same XPath error is present and the 'Type:' field is again empty.

    It does manage to scrape the keywords though ...and the attachment...and the abstract
    ...but seemingly not the 'Type' again (it's using the same xpath query as before).
    ...and seemingly not the citation_journal_title.

    Next, it is 'scraping full details from bibtex' (perhaps this is a fallback after the scraping of the page came up incomplete?)

    The bibtex is listed, and it does start with an @inproceedings, which as I understand it is the correct entry type for a paper in a conference proceedings.
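
As a rough sketch of what reading the type straight from that BibTeX could look like: the helper name `itemTypeFromBibtex` and the small mapping table below are made up for illustration, and this is not the translator's actual code. The item type names on the right are Zotero item types; the table is deliberately partial.

```javascript
// Partial, illustrative mapping from BibTeX entry types to Zotero item types.
const BIBTEX_TO_ZOTERO = new Map([
  ["article", "journalArticle"],
  ["inproceedings", "conferencePaper"],
  ["conference", "conferencePaper"],
  ["book", "book"],
  ["incollection", "bookSection"],
]);

// Hypothetical helper: read the entry type that follows the leading "@"
// and look it up in the table. Returns null for unknown or missing types.
function itemTypeFromBibtex(bibtex) {
  const match = /@\s*([A-Za-z]+)\s*\{/.exec(bibtex);
  if (!match) return null;
  const entryType = match[1].toLowerCase();
  return BIBTEX_TO_ZOTERO.has(entryType) ? BIBTEX_TO_ZOTERO.get(entryType) : null;
}

// First lines of the export shown in the doWeb output above.
const sample =
  "@inproceedings{Druin:1999:CID:302979.303166,\n" +
  " author = {Druin, Allison},\n" +
  "}";

const detectedType = itemTypeFromBibtex(sample);
console.log(detectedType);
```

With the export above this yields a conference paper type rather than a journal article, which matches what the record should be.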

    Then the 'returned item' is listed. Everything seems pretty right - except the type, and also the 'archiveLocation'. It's using the conference location for this, which should actually be stored in the 'place' field if my understanding is correct.

    I'll take a closer look at how it tries to determine the type, as this seems to be at the root of the problem...
  • This is odd. If I copy that BibTeX to the clipboard and import from the clipboard, it works fine.

    Not sure why the type is wrong in the import otherwise...
  • Hi ajlyon,

    Yeah it is odd. The reason is that the translator uses a separate scraper function to figure out whether it's a Proceedings or Journal paper. The XPath expression for this was *slightly* different to what's in the ACM page.

    It was:
    '//td[@nowrap="nowrap" and @style="padding-bottom: 0px;"]'

    But the markup in the page requires:
    '//td[@nowrap="nowrap" and @style="padding-bottom:0px"]'

    This feels like a rather fragile XPath expression to use, but unfortunately the ACM pages don't have a lot of IDs on their DOM elements. A better solution might be to use the BibTeX data - but that's beyond the time that I can put in right now.
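
To illustrate why the old expression stopped matching: an XPath test like `@style="..."` is an exact string comparison, so a missing space or semicolon is enough to break it. The snippet below is only a sketch in plain JavaScript; `normalizeStyle` is a hypothetical helper, not something in the translator, and it only shows one way a comparison could be made less sensitive to such formatting.

```javascript
// Hypothetical helper: normalize an inline style string so that
// whitespace and trailing semicolons do not affect the comparison.
function normalizeStyle(style) {
  return style
    .replace(/\s+/g, "") // drop all whitespace
    .replace(/;+$/, "")  // drop trailing semicolons
    .toLowerCase();
}

const expected = "padding-bottom: 0px;"; // value in the old XPath expression
const actual = "padding-bottom:0px";     // value in the ACM page markup

// Exact comparison, as the XPath attribute test does it.
const exactMatch = expected === actual;

// Comparison after normalizing both sides.
const normalizedMatch = normalizeStyle(expected) === normalizeStyle(actual);

console.log(exactMatch, normalizedMatch);
```

The exact comparison fails while the normalized one succeeds. This still depends on the inline style staying the same in substance, so it is only less fragile, not robust.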

    Anyway, I've fixed the translator on my machine and it *seems* to work. The code for it is at gist.github.com, as you requested:
    <https://gist.github.com/895992>

    Best regards,
    Jared Donovan.
  • Changes applied to Github and trunk. Also made a change to improve abstract formatting.

    I actually just removed the code that was overriding the type set by the BibTeX translator. I don't know why it was there, and I couldn't find any discussion in the forums that justified it.

    Thanks for looking into this.
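
For anyone reading later, here is a minimal sketch of the difference between overriding the type and leaving it alone. It is not the translator code: the function names, the `"Proceedings"` label, and the fallback to `journalArticle` when the scraped type is empty are all assumptions, based only on the scaffold output earlier in the thread.

```javascript
// Hypothetical stand-in for the item produced from the BibTeX export.
const fromBibtex = {
  itemType: "conferencePaper",
  title: "Cooperative inquiry: developing new technologies for children with children",
};

// Assumed old behaviour: the type scraped from the page replaces the
// BibTeX-derived type, and an empty scrape falls back to "journalArticle".
function withOverride(item, scrapedType) {
  const pageType = scrapedType === "Proceedings" ? "conferencePaper" : "journalArticle";
  return { ...item, itemType: pageType };
}

// Behaviour after the change: the BibTeX-derived type is kept as it is.
function withoutOverride(item) {
  return { ...item };
}

const before = withOverride(fromBibtex, ""); // the XPath returned no text
const after = withoutOverride(fromBibtex);

console.log(before.itemType, after.itemType);
```

Under these assumptions the overriding version turns the record into a journal article whenever the page scrape comes back empty, while the version without the override keeps the conference paper type from the BibTeX.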