ACM Conference papers imported as Journal papers

Jared Donovan · March 30, 2011

Hi,

I've noticed since upgrading to 2.1 that some of the articles I import from the ACM digital library get imported as journal papers, when they are actually conference papers.

e.g. <http://dx.doi.org/10.1145/302979.303166>

I'm using Zotero 2.1.1 and Firefox 4.0 on OSX.

I'm reasonably technically proficient, so I could probably fix this if:
a) it's actually a problem (i.e. the intended behavior is not to import this as a journal article)
b) someone can give me a clue for where to start digging ('translators' folder in zotero? - but how to debug?)
c) presuming there is a fix, where is the place for me to contribute it back?

Best Regards,
Jared Donovan.

Jared Donovan · March 30, 2011

BTW - I saw in some other threads that there have been some problems with the DOI translator and some ACM imports - but I checked which translator was being used in my case by hovering over the address field icon and could see that it was 'ACM Digital Library'.

adamsmith · March 30, 2011

a) yes it's actually a problem - we want data as precise as possible. That said, you'd have to check if the data is actually there - i.e. if data that ACM passes on to Zotero contains that information. If not it's game over.

b) Translator documentation isn't great; what we've got is here:
http://www.zotero.org/support/dev/creating_translators_for_sites
There's an ongoing push for better dev docs.
Translators are .js files in the translator folder in the Zotero folder.

c) post the fixed ACM.js translator to gist.github.com and either post here or post to the zotero-dev group: http://groups.google.com/group/zotero-dev

Jared Donovan · March 30, 2011

Thanks for the quick reply. I've started by taking a look at the translator through the scaffold addon. I'll just dump some notes on my progress here. Please don't take this as an obligation to reply.

I've run the detectWeb and doWeb commands. Output is:

===detectWeb===

11:24:55 Title: Cooperative inquiry: developing new technologies for children with children
11:24:55 Single item detected
11:24:55 Unable to retrieve text for XPath: //td[@nowrap="nowrap" and @style="padding-bottom: 0px;"]
11:24:55 Type:
11:24:55 detectWeb returned type "journalArticle"

...so there's some XPath problem, and the type is already being determined as 'journalArticle' here. I guess this is the code that gets run by zotero to see if there's anything to import on the page and that's why the address field icon shown is the journal one.

===doWeb===

11:24:01 test do
11:24:01 Unable to retrieve text for XPath: //td[@nowrap="nowrap" and @style="padding-bottom: 0px;"]
11:24:01 Type:
11:24:01 Scraping Keywords
11:24:01 Keyword: children
11:24:01 Keyword: collaborative computing
11:24:01 Keyword: cooperative design
11:24:01 Scraping attachments
11:24:01 Text PDF: http://portal.acm.org/ft_gateway.cfm?id=303166&type=pdf&CFID=15756175&CFTOKEN=34810846
11:24:01 Scraping abstract
11:24:01 Unable to retrieve text for XPath: //td[@nowrap="nowrap" and @style="padding-bottom: 0px;"]
11:24:01 Type:
11:24:01 Unable to retrieve text for XPath: //meta[@name='citation_journal_title']/@content
11:24:01 Scraping full details from bibtex
11:24:01 Bibtex: http://portal.acm.org/exportformats.cfm?id=303166&expformat=bibtex
11:24:01
ref : @inproceedings{Druin:1999:CID:302979.303166,
author = {Druin, Allison},
title = {Cooperative inquiry: developing new technologies for children with children},
booktitle = {Proceedings of the SIGCHI conference on Human factors in computing systems: the CHI is the limit},
series = {CHI '99},
year = {1999},
isbn = {0-201-48559-1},
location = {Pittsburgh, Pennsylvania, United States},
pages = {592--599},
numpages = {8},
url = {http://doi.acm.org/10.1145/302979.303166},
doi = {http://doi.acm.org/10.1145/302979.303166},
acmid = {303166},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {KidPad, PETS, children, cooperative design, cooperative inquiry, design techniques, educational applications, intergenerational design team, participatory design},
}

11:24:01 Returned item:
'itemType' => "journalArticle"
'creators' ...
'0' ...
'firstName' => "Allison"
'lastName' => "Druin"
'creatorType' => "author"
'notes' ...
'tags' ...
'0' => "children"
'1' => "collaborative computing"
'2' => "cooperative design"
'seeAlso' ...
'attachments' ...
'0' ...
'title' => "ACM Snapshot"
'mimeType' => "text/html"
'url' => "http://portal.acm.org/citation.cfm?doid=302979.303166"
'document' => "[object]"
'1' ...
'title' => "ACM Full Text PDF"
'mimeType' => "application/pdf"
'url' => "http://portal.acm.org/ft_gateway.cfm?id=303166&type=pdf&CFID=15756175&CFTOKEN=34810846"
'document' => "[object]"
'extra' => "ACM ID: 303166"
'title' => "Cooperative inquiry: developing new technologies for children with children"
'publicationTitle' => "Proceedings of the SIGCHI conference on Human factors in computing systems: the CHI is the limit"
'series' => "CHI '99"
'date' => "1999"
'ISBN' => "0-201-48559-1"
'archiveLocation' => "Pittsburgh, Pennsylvania, United States"
'pages' => "592–599"
'url' => ""
'DOI' => "10.1145/302979.303166"
'publisher' => "ACM"
'place' => "New York, NY, USA"
'complete' => function(...){...}
'abstractNote' => "

In todays homes and schools, children are emerging as frequent
and experienced users of technology [3, 14]. As this trend
continues, it becomes increasingly important to ask if we are
fulfilling the technology needs of our children. To answer this
question, I have developed a research approach that enables young
children to have a voice throughout the technology development
process. In this paper, the techniques of cooperative
inquiry will be described along with a theoretical framework
that situates this work in the HCI literature. Two examples of
technology resulting from this approach will be presented, along
with a brief discussion on the design-centered learning of
team researchers using cooperative inquiry.

"
'libraryCatalog' => "ACM Digital Library"
'shortTitle' => "Cooperative inquiry"

11:24:01 Translation successful

==========> end of output

So I guess this is what gets run when you click the icon to ask zotero to actually import the citation. It looks like the same XPath error is present and the 'Type:' field is again empty.

It does manage to scrape the keywords though ...and the attachment...and the abstract
...but seemingly not the 'Type' again (it's using the same xpath query as before).
...and seemingly not the citation_journal_title.

Next, it is 'scraping full details from bibtex' (perhaps this is a fallback after the scraping of the page came up incomplete?)

The bibtex is listed, and it does start with an @inproceedings, which as I understand it is the correct entry type for a paper in a conference proceedings.

Then the 'returned item' is listed. Everything seems pretty right - except the type, and also the 'archiveLocation'. It's using the conference location for this, which should actually be stored in the 'place' field if my understanding is correct.

I'll take a closer look at how it tries to determine the type, as this seems to be at the root of the problem...

ajlyon · March 31, 2011

This is odd. If I copy that BibTeX to the clipboard and import from the clipboard, it works fine.

Not sure why the type is wrong in the import otherwise...

Jared Donovan · March 31, 2011

Hi ajlyon,

Yeah it is odd. The reason is that the translator uses a separate scraper function to figure out whether it's a Proceedings or Journal paper. The XPath expression for this was *slightly* different to what's in the ACM page.

It was:
'//td[@nowrap="nowrap" and @style="padding-bottom: 0px;"]'

But the markup in the page requires:
'//td[@nowrap="nowrap" and @style="padding-bottom:0px"]'
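
The two expressions differ only in the space after the colon and the trailing semicolon, but an equality test on @style in XPath is an exact string comparison, so the old expression cannot match the new markup. One possible way to make the match less brittle is sketched below, using XPath 1.0's translate() to drop spaces and semicolons before comparing. This is only an illustration, not the fix that was applied: the names TOLERANT_XPATH and stripChars are made up here, and stripChars merely mimics translate(value, chars, '') so the idea can be checked outside a browser.

```javascript
// XPath 1.0: translate(@style, ' ;', '') deletes every space and semicolon
// from the attribute value before the comparison is made.
const TOLERANT_XPATH =
  "//td[@nowrap='nowrap' and translate(@style, ' ;', '')='padding-bottom:0px']";

// Plain-JavaScript stand-in for translate(value, chars, ''), used here only
// to show what the XPath predicate would end up comparing.
function stripChars(value, chars) {
  return Array.from(value).filter((ch) => !chars.includes(ch)).join("");
}

const oldStyle = "padding-bottom: 0px;"; // what the translator expected
const newStyle = "padding-bottom:0px";   // what the ACM page contains

console.log(oldStyle === newStyle); // false
console.log(stripChars(oldStyle, " ;") === stripChars(newStyle, " ;")); // true
```

With this kind of predicate, both the old and the new form of the style attribute would be accepted, although it still depends on ACM keeping the same inline style.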

This feels like a rather fragile XPath expression to use, but unfortunately the ACM pages don't have a lot of IDs on their DOM elements. A better solution might be to use the BibTeX data - but that's beyond the time that I can put in right now.
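
For what it's worth, a minimal sketch of the BibTeX idea might look like the following. It reads the entry type from the "@type{" prefix of the exported record and maps it to a Zotero item type. The helper name itemTypeFromBibtex and the mapping table are assumptions for illustration only; the table is a small subset, and the real BibTeX translator has its own, more complete mapping.

```javascript
// Illustrative subset of BibTeX entry types and the Zotero item types they
// would plausibly map to; this is not the BibTeX translator's actual table.
const BIBTEX_TO_ZOTERO = {
  article: "journalArticle",
  inproceedings: "conferencePaper",
  conference: "conferencePaper",
  book: "book",
  incollection: "bookSection",
};

// Hypothetical helper: read the entry type from the "@type{" prefix.
// Returns null when no known type is found, so the caller can fall back.
function itemTypeFromBibtex(bibtex) {
  const match = /@\s*([A-Za-z]+)\s*\{/.exec(bibtex);
  if (!match) return null;
  const entryType = match[1].toLowerCase();
  return Object.prototype.hasOwnProperty.call(BIBTEX_TO_ZOTERO, entryType)
    ? BIBTEX_TO_ZOTERO[entryType]
    : null;
}

const sample =
  "@inproceedings{Druin:1999:CID:302979.303166,\n author = {Druin, Allison},\n}";
console.log(itemTypeFromBibtex(sample)); // conferencePaper
```

Because the type comes from the export data rather than from the page layout, this approach would not break when ACM changes its markup.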

Anyway, I've fixed the translator on my machine and it *seems* to work. The code for it is at gist.github.com, as you requested:
<https://gist.github.com/895992>

Best regards,
Jared Donovan.

ajlyon · March 31, 2011

Changes applied to Github and trunk. Also made a change to improve abstract formatting.

I actually just removed the code that was overriding the type set by the BibTeX translator. I don't know why it was there, and I couldn't find any discussion in the forums that justified it.
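
Roughly, the change is a question of which source of the item type wins. The sketch below is a hypothetical reconstruction rather than the actual translator code: all three function names are invented, and the fallback in the "after" version is an assumption that goes beyond simply deleting the override.

```javascript
// Hypothetical page-scrape classifier: anything that does not look like
// proceedings, including an empty label, is treated as a journal article.
function typeFromPageLabel(label) {
  return /proceeding/i.test(label) ? "conferencePaper" : "journalArticle";
}

// Before (assumed): the page-derived type always replaces the BibTeX type,
// so bibtexType is ignored.
function itemTypeWithOverride(bibtexType, pageLabel) {
  return typeFromPageLabel(pageLabel);
}

// After (assumed): the BibTeX type is kept; the page is only a fallback.
function itemTypeWithoutOverride(bibtexType, pageLabel) {
  return bibtexType || typeFromPageLabel(pageLabel);
}

// The failing XPath yields an empty label, as in the "Type:" log lines.
console.log(itemTypeWithOverride("conferencePaper", ""));    // journalArticle
console.log(itemTypeWithoutOverride("conferencePaper", "")); // conferencePaper
```

This mirrors what the Scaffold output showed: the BibTeX said @inproceedings, but the empty scraped type ended up deciding the result.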

Thanks for looking into this.