Abstract not scraped from SpringerLink

Andrew Moylan · April 1, 2007

http://www.springerlink.com/content/u927215068044412/

The abstract was not scraped from the above article.

sean · April 2, 2007

Our SpringerLink translator relies on the site's RIS files, which in this case unfortunately do not include the abstract.
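
To check whether a given RIS export carries an abstract at all, a rough sketch like the one below can help. It assumes an abstract would appear under the AB or N2 tag; the sample record and the helper names risTags and hasAbstract are made up for illustration and are not part of the SpringerLink or RIS translators.

```javascript
// Collect RIS fields into a map of tag -> list of values.
// An RIS line looks like "TY  - CHAP": a two-character tag, two spaces, a hyphen, then the value.
function risTags(risText) {
  var tags = {};
  risText.split(/\r?\n/).forEach(function (line) {
    var m = /^([A-Z][A-Z0-9])  - ?(.*)$/.exec(line);
    if (m) {
      (tags[m[1]] = tags[m[1]] || []).push(m[2]);
    }
  });
  return tags;
}

// Assumption: abstracts, when present, are stored under AB or N2.
function hasAbstract(risText) {
  var tags = risTags(risText);
  return Boolean(tags.AB || tags.N2);
}

// Hypothetical record with no abstract field, for illustration only.
var sample = "TY  - CHAP\nTI  - Example title\nER  - ";
console.log(hasAbstract(sample));
```

If this reports no abstract for a downloaded RIS file, a translator that reads only that file has nothing to import for the field.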

blerner · September 23, 2007

http://dx.doi.org/10.1007/978-3-540-74061-2_10

With a freshly updated copy of Zotero, there are a few problems with the SpringerLink scraper for the above article:
* The authors' first names are not scraped: I get "Nandivada, (first)" in Zotero (along with the other two authors), and "Nandivada and Pereira and Palsberg" in BibTeX export, which leads me to suspect the names aren't being scraped
* The scraped item is a Web Page in Zotero, and a @misc in BibTeX, when it should be a conference paper -- the RIS "TY" field says Chapter, and it's being ignored
* The URL in the RIS file is a DOI entry (recognizable because it has dx.doi.org in it), and should be filed as such.
* (Possibly unrelated) The BibTeX export of the scraped entry doesn't include all the fields stored by Zotero -- the URL field, for instance, should be maintained since some citation formats include the URL in the citation.

I don't know much about Zotero's APIs, but I could try writing a new version of the scraper to fix these...
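
For the third point, recognizing a DOI resolver URL could look roughly like this. It is a minimal sketch: doiFromUrl is a hypothetical helper, not an existing Zotero function, and it assumes the URL has the plain dx.doi.org (or doi.org) form shown above.

```javascript
// Return the DOI embedded in a resolver URL, or null if the URL is not one.
// Assumption: DOIs start with "10." and follow the host directly.
function doiFromUrl(url) {
  var m = /^https?:\/\/(?:dx\.)?doi\.org\/(10\.\S+)$/.exec(url);
  return m ? m[1] : null;
}

console.log(doiFromUrl("http://dx.doi.org/10.1007/978-3-540-74061-2_10"));
console.log(doiFromUrl("http://www.springerlink.com/content/u927215068044412/"));
```

A translator could then store the returned string in a DOI field and leave ordinary URLs where they are.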

noksagt · September 23, 2007

Re. the resource type:
http://forums.zotero.org/discussion/969#Item_10

DOI has a ticket open at:
https://www.zotero.org/trac/ticket/684
(but DOI support for other types might need to be added)

blerner · September 23, 2007

I'd seen that, and in fact commented in that thread too, but when I saw the thread was closed without the issue being resolved, I thought I'd check again. Thanks for the update.

dstillman · September 23, 2007

"I don't know much about Zotero's APIs, but I could try writing a new version of the scraper to fix these..."

Any help you'd be able to provide would be great. See Creating Translators for Sites for more info and a link to Scaffold, our mini translator development IDE. It'll let you pull up the SpringerLink and/or RIS translators, make changes, and try them out.

blerner · September 24, 2007

Fixing the first two of my bullet points is easy. Here are the code changes I made (if there's a bugtraq or trac ticket I should use instead, please point me to it...):

To fix authors not appearing correctly, I changed the body of the following loop in the SpringerLink scraper:

// fix incorrect authors
var oldCreators = item.creators;
item.creators = new Array();
for each(var creator in oldCreators) {
	item.creators.push(Zotero.Utilities.cleanAuthor(creator.firstName + " " + creator.lastName, creator.creatorType));
}

I'm not quite sure why the incorrect authors fixes are needed at all -- if I delete this loop altogether, it works fine for me... In any case, changing "author" to creator.creatorType is probably right; including the creator.firstName is what was snarling the import, since it's correct in the RIS data. Are there counterexamples where this change now breaks formerly working pages?
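
To try the idea outside Zotero, here is a self-contained sketch of the same loop. splitName is only a simplified stand-in for Zotero.Utilities.cleanAuthor (the real function may behave differently), rebuildCreators is a made-up wrapper, and the names in the sample data are invented.

```javascript
// Simplified stand-in for Zotero.Utilities.cleanAuthor: NOT the real implementation.
// Treats the last whitespace-separated word as the last name.
function splitName(fullName, creatorType) {
  var parts = fullName.replace(/^\s+|\s+$/g, "").split(/\s+/);
  var lastName = parts.pop();
  return { firstName: parts.join(" "), lastName: lastName, creatorType: creatorType };
}

// Rebuild the creator list, passing both name parts and keeping each creator's own type.
function rebuildCreators(oldCreators) {
  var creators = [];
  for (var i = 0; i < oldCreators.length; i++) {
    var c = oldCreators[i];
    creators.push(splitName(c.firstName + " " + c.lastName, c.creatorType));
  }
  return creators;
}

// Invented sample data, for illustration only.
var sampleCreators = [
  { firstName: "Jane Q.", lastName: "Doe", creatorType: "author" },
  { firstName: "John", lastName: "Roe", creatorType: "editor" }
];
console.log(rebuildCreators(sampleCreators));
```

With this stand-in, first names survive the round trip and an editor stays an editor instead of being forced to "author".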

To fix the bookChapter problem, I changed the condition in the following if-test in the RIS scraper:

// first check typeMap
for(var i in typeMap) {
	Zotero.debug(i);			
	if(value.substring(0,typeMap[i].length) == typeMap[i]) {
		item.itemType = i;
	}
}

The problem is that typeMap["bookChapter"] == "CHAP", while the RIS file contains "CHAPTER"; this test accepts any TY value that begins with a typeMap entry, so partial (prefix) matches now work.
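
As a standalone illustration of the prefix test: the typeMap here is a hypothetical three-entry subset (only the "CHAP" entry is taken from the translator as described above), and itemTypeFor is a made-up wrapper, not a function in the RIS translator.

```javascript
// Hypothetical subset of the type map: Zotero item type -> RIS TY code.
var typeMap = { book: "BOOK", bookChapter: "CHAP", journalArticle: "JOUR" };

// Return the item type whose RIS code is a prefix of the TY value, or null if none is.
function itemTypeFor(value, map) {
  var match = null;
  for (var type in map) {
    var code = map[type];
    if (value.substring(0, code.length) == code) {
      match = type;
    }
  }
  return match;
}

console.log(itemTypeFor("CHAPTER", typeMap));
console.log(itemTypeFor("CHAP", typeMap));
```

One caveat of prefix matching is that a longer, unrelated TY value starting with a known code would also match, so an exact comparison may still be worth trying first.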

Hope this helps...

vesal · October 16, 2007

What is the current status on this? Has anyone contacted the IT staff at Springer? (as this seems to be primarily their fault)

However, if blerner's fixes work, then please do incorporate a work-around until Springer does something. They are a major resource, and isn't the strength of Zotero that these scrapers can be adapted quickly?