SAE Translator

Hello all,

I've written a simple translator for SAE Technical Papers, example: http://www.sae.org/technical/papers/2004-28-0079

I'm treating these papers as reports, although they really aren't. To be honest, I'm not sure what the best way to describe them is. The most important identifier for citing these papers is the document number.

Anyway, here is the code:

REPLACE INTO translators VALUES ('2d05ee76-2ea0-49e7-90c9-41889e523159', '1.0.0b3r1', '', '2007-10-14 23:57:14', '0', '100', '4', 'SAE Technical Papers', 'Forest Gregg', 'http://www\.sae\.org/technical/papers/',
'function detectWeb(doc, url) {
var namespace = doc.documentElement.namespaceURI;
var nsResolver = namespace ? function(prefix) {
if (prefix == ''x'') return namespace; else return null;
} : null;

if(url.match(/^http:\/\/www\.sae\.org\/technical\/papers\/.+/)){
return "report";
}
return false;
}
',
'function scrape(doc) {
var namespace = doc.documentElement.namespaceURI;
var nsResolver = namespace ? function(prefix) {
if (prefix == "x") return namespace; else return null;
} : null;

//Determine document type and assign type-appropriate values
//Create the item first so the default branch can also set fields on it
var newItem = new Zotero.Item("report");
switch(doc.evaluate(''//meta[@name="doc_type"]/@content'', doc, nsResolver, XPathResult.ANY_TYPE, null).iterateNext().textContent){
case "Technical Paper" :
newItem.publicationTitle = "SAE Technical Papers";
newItem.number=doc.evaluate(''//meta[@name="product_code"]/@content'', doc, nsResolver, XPathResult.ANY_TYPE, null).iterateNext().textContent;
break;

default :
newItem.ISBN=doc.evaluate(''//meta[@name="isbn"]/@content'', doc, nsResolver, XPathResult.ANY_TYPE, null).iterateNext().textContent;
newItem.ISSN=doc.evaluate(''//meta[@name="issn"]/@content'', doc, nsResolver, XPathResult.ANY_TYPE, null).iterateNext().textContent;
}

//Define common document values
newItem.title=doc.title;
newItem.url=doc.evaluate(''//meta[@name="identifier_url"]/@content'', doc, nsResolver, XPathResult.ANY_TYPE, null).iterateNext().textContent;
newItem.date=doc.evaluate(''//meta[@name="publ_date"]/@content'', doc, nsResolver, XPathResult.ANY_TYPE, null).iterateNext().textContent;
newItem.pages=doc.evaluate(''//meta[@name="num_pages"]/@content'', doc, nsResolver, XPathResult.ANY_TYPE, null).iterateNext().textContent;
newItem.publisher=doc.evaluate(''//meta[@name="publisher"]/@content'', doc, nsResolver, XPathResult.ANY_TYPE, null).iterateNext().textContent;
newItem.abstractNote=doc.evaluate(''//p[strong="Abstract: "]/text()'', doc, nsResolver, XPathResult.ANY_TYPE, null).iterateNext().textContent.replace(/^\s+|\s+$/g, '''');

//Get Authors
var itemAuthors=doc.evaluate(''//meta[@name="author"]/@content'', doc, nsResolver, XPathResult.ANY_TYPE, null);
var author;

while (author = itemAuthors.iterateNext()){
var names = author.textContent.replace(/^\s+|\s+$/g, '''').split(/\s+/);
//Last token is the family name; everything before it is kept as the given name
var lname = names.pop();
var fname = names.join(" ");
newItem.creators.push({firstName:fname, lastName:lname, creatorType:"author"});
}

newItem.complete();

}

function doWeb(doc, url) {
if(url.match(/^http:\/\/www\.sae\.org\/technical\/papers\/.+/)){
scrape(doc);
}
}
');
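
The scrape function repeats the same XPath lookup for every meta tag and splits author names on spaces. A minimal standalone sketch of those two steps follows. Both `metaContent` and `splitAuthorName` are hypothetical helper names, not part of the Zotero translator API, and the sketch assumes that a missing meta tag should give null rather than throw an error. It is written as plain JavaScript, so the single quotes would still need doubling before it could go inside the SQL string above.

```javascript
// Hypothetical helpers; names and behavior are assumptions for illustration.

// XPathResult.ANY_TYPE has the value 0 in the DOM XPath spec. The literal is
// used so the sketch also runs where the XPathResult global is not defined.
var ANY_TYPE = 0;

// Return the trimmed content of <meta name="..."> or null if it is absent.
function metaContent(doc, name, nsResolver) {
  var xpath = '//meta[@name="' + name + '"]/@content';
  var result = doc.evaluate(xpath, doc, nsResolver || null, ANY_TYPE, null);
  var node = result.iterateNext();
  if (!node) {
    return null; // the original code would throw here on a missing tag
  }
  return node.textContent.replace(/^\s+|\s+$/g, '');
}

// Split "Given Middle Family" into a creator object.
// The last token is treated as the family name; the rest is the given name.
function splitAuthorName(raw) {
  var trimmed = raw.replace(/^\s+|\s+$/g, '');
  if (trimmed === '') {
    return null;
  }
  var parts = trimmed.split(/\s+/);
  var lastName = parts.pop();
  var firstName = parts.join(' ');
  return { firstName: firstName, lastName: lastName, creatorType: 'author' };
}
```

With helpers like these, a line such as the publisher lookup could become `newItem.publisher = metaContent(doc, "publisher", nsResolver);`, and the author loop could push `splitAuthorName(author.textContent)` when it is not null. Treating the last token as the family name is a simplification and will be wrong for multi-word family names.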
