Suggested improvement to COinS importer

patrickmineault · January 25, 2013

Within a COinS span, there may be an anchor (<a>) that links to the URL of the article referenced. See for example the reference at the end of this blog post:

http://xcorr.net/2013/01/24/categories/

It's straightforward to:

1. read the hrefs of these anchors,
2. add them to the URL field of the added item, and
3. add snapshots of the URLs in question to the added item.

Currently the only snapshot which is attached is the one from the referring URL; more useful is the snapshot of the URL of the article. I think this is a useful improvement.

Sample implementation (only attaches last URL within the span):

in doWeb:


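// Collect every anchor with an href inside the COinS span.
var theas = doc.evaluate('.//a[@href]', span, nsResolver, XPathResult.ANY_TYPE, null);
var thea; // declared here so the loop does not create an implicit global
// Each match overwrites the previous one, so the last URL in the span wins.
while((thea = theas.iterateNext()))
{
	newItem.URL = thea.href;
}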

in completeItems, in the else clause:


// Attach a snapshot of the linked article when the span supplied a URL.
if(newItems[i].URL)
{
	newItems[i].attachments.push({
		title: '[Original article] ' + newItems[i].title,
		url: newItems[i].URL,
		mimeType: 'text/html'
	});
}

noksagt · January 26, 2013

Most COinS do not feature a URL inside the span. LibX & others will replace the entire span. As such, the span is usually empty (or effectively empty, with only a non-breaking space) OR has elements that will appear when no COinS tool processes the page (and I'd imagine that some will therefore have a link to a COinS processor).

http://ocoins.info/#id3205609416
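
As a rough sketch of how an importer could account for this, the check below treats a span as "effectively empty" when its text is nothing but whitespace or non-breaking spaces. The function name `spanIsEffectivelyEmpty` is hypothetical and not part of any existing translator; it takes the span's text content as a plain string so it does not depend on a particular DOM.

```javascript
// Hypothetical helper: true when the span's text is empty or consists only
// of whitespace. String.prototype.trim() also strips U+00A0 (non-breaking
// space), which covers the "effectively empty" case described above.
function spanIsEffectivelyEmpty(spanText) {
	if (spanText === null || spanText === undefined) {
		return true;
	}
	return String(spanText).trim() === "";
}
```

A span that passes this check has no fallback content, so there is nothing to scan for anchors. A span that fails it may still contain only a link to a COinS processor, so this check narrows the problem rather than solving it.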

patrickmineault · January 26, 2013

Well, how about fetching the linked webpage when there's a DOI in the COinS info? That avoids the issue of finding out whether an <a> tag is relevant or not; the URL associated with the DOI is always relevant.
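
A minimal sketch of the DOI idea, under some assumptions: the COinS data is the span's title attribute read through the DOM (so `&amp;` is already decoded to `&`), the DOI is carried as an `rft_id` value of the form `info:doi/...`, and the public resolver at https://doi.org/ is used to build the URL. The function name `doiUrlFromCoins` is made up for illustration, and the DOI in the usage line is only an example.

```javascript
// Hypothetical helper: given the decoded title attribute of a COinS span
// (an OpenURL key=value query string), return a resolver URL for the first
// DOI found in an rft_id field, or null when there is none.
function doiUrlFromCoins(coinsTitle) {
	var prefix = "info:doi/";
	var pairs = String(coinsTitle).split("&");
	for (var i = 0; i < pairs.length; i++) {
		var eq = pairs[i].indexOf("=");
		if (eq === -1 || pairs[i].substring(0, eq) !== "rft_id") {
			continue;
		}
		var value;
		try {
			// "+" stands for a space in form-encoded query strings.
			value = decodeURIComponent(pairs[i].substring(eq + 1).replace(/\+/g, " "));
		} catch (e) {
			continue; // malformed percent-encoding; skip this field
		}
		if (value.indexOf(prefix) === 0 && value.length > prefix.length) {
			var doi = value.substring(prefix.length);
			// Escape unsafe characters but keep the "/" inside the DOI readable.
			return "https://doi.org/" + encodeURIComponent(doi).replace(/%2F/g, "/");
		}
	}
	return null;
}

// Example usage with an illustrative COinS string:
var exampleUrl = doiUrlFromCoins("ctx_ver=Z39.88-2004&rft_id=info%3Adoi%2F10.1000%2F182&rft.atitle=Example");
```

The returned URL could then be stored on the item and used for the snapshot in place of an anchor's href, which sidesteps the question of whether a given <a> tag is relevant.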

adamsmith · January 26, 2013

I definitely agree with the spirit of patrick's suggestion - having the origin page as a snapshot is a bit silly for increasingly common COinS bibliographies.
DOI sounds like a good idea to me - any objections? One possible problem: the COinS (assuming it describes the article the user is actually looking at) could come from a non-gated version of an article while the DOI resolves to a gated one. I'm not sure how likely that is.