Metadata retrieval mismatch (SIGN guidelines)

bertieb · November 21, 2011

Retrieving metadata from SIGN guideline 113 (available from http://www.sign.ac.uk/guidelines/fulltext/113/index.html, direct link http://www.sign.ac.uk/pdf/sign113.pdf) returns:

"A new system for grading recommendations in evidence based guidelines", a 2001 article from the BMJ. Similarly, a test of SIGN 96 returns "Guideline development process for the Health for Kids in the South East project". Testing the other SIGN guidelines I have saved to hand shows similar failures.

Many SIGN guidelines have a few pages of cover, title and a page explaining evidence grading before any identifier. For example, SIGN 113 has the ISBN on page 4, in the format "978 1 905813 54 4". I'm not sure how the metadata retrieval system works, but reading a few other threads about it suggests that it looks at the first couple of pages of the PDF for identifiers. Is that broadly correct?

(As it stands, it is possible to go to the WorldCat site and look up the ISBN so there are ways around it. WorldCat finds it fine without spaces (http://www.worldcat.org/title/diagnosis-and-pharmacological-management-of-parkinsons-disease-a-national-clinical-guideline/oclc/614591030&referer=brief_results), but not with spaces.)
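The spaced ISBN format mentioned above can be normalised before a lookup. The following is a minimal sketch, not anything Zotero or WorldCat actually runs: the function name `normalize_isbn13` is made up for illustration, and it only assumes the standard ISBN-13 check-digit rule (weights alternating 1 and 3, weighted sum divisible by 10).

```python
import re
from typing import Optional


def normalize_isbn13(raw: str) -> Optional[str]:
    """Strip spaces and hyphens from a candidate ISBN-13.

    Returns the bare 13-digit string if the check digit is valid,
    otherwise None. (Hypothetical helper, for illustration only.)
    """
    digits = re.sub(r"[\s-]", "", raw)
    if not re.fullmatch(r"[0-9]{13}", digits):
        return None
    # ISBN-13 rule: weights alternate 1, 3, 1, 3, ... and the
    # weighted sum of all 13 digits must be divisible by 10.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return digits if total % 10 == 0 else None


# The ISBN as printed on page 4 of SIGN 113, spaces included.
print(normalize_isbn13("978 1 905813 54 4"))
```

The spaceless form is the one that the WorldCat search accepted in the test described above.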

Cheers
bertieb

PS: Forgive a silly question, but... I know nothing about writing translators, but if I were interested in writing one for the SIGN Guidelines website, is it possible to look up WorldCat entries and get metadata from them based on a detected ISBN?

adamsmith · November 21, 2011

I don't think the PDF retrieval actually looks for ISBNs - it just looks for DOIs. Otherwise it composes a search string from text on the first couple of pages and queries Google Scholar, importing the first result - I haven't checked this, but I figure that is the problem here.

As for translators - yes, I think you could write a translator that gets the ISBN for a page and then uses WorldCat to query that - but it's a bit more advanced as translators go. Unfortunately, the site is otherwise so unstructured that I don't see any reasonable alternative.

bertieb · November 21, 2011

Cheers for the info! I guess it's easier to look for a DOI than an ISBN? Regardless, the retrieval system seems to work well for most things which don't have generic bumph at the start.

Advanced, you say? Well, where there's a will, there's a way! Sadly, the SIGN website is surprisingly unstructured for the level of organisation they otherwise have and the work they do. They may be amenable to change though.

adamsmith · November 21, 2011

You could send them this; maybe they'd consider embedding some data:
http://www.zotero.org/support/dev/exposing_metadata

As for the translator - if you get to this, check the Institute of Physics translator which does the same for DOIs.
(And yes, if you look through a document, DOIs are easier to identify because they all start with "10.")
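To illustrate why the "10." prefix makes DOIs easy to spot in extracted text, here is a rough sketch. The pattern is a commonly used approximation rather than the one Zotero uses, `find_dois` is a hypothetical name, and the DOI in the sample text is invented.

```python
import re
from typing import List

# Approximate DOI shape: "10.", a registrant code of 4-9 digits,
# a slash, then a suffix that runs up to whitespace or a quote/bracket.
DOI_PATTERN = re.compile(r"\b10\.[0-9]{4,9}/[^\s\"<>]+")


def find_dois(text: str) -> List[str]:
    """Return DOI-like strings found in text, minus trailing punctuation."""
    return [match.rstrip(".,;)") for match in DOI_PATTERN.findall(text)]


# Made-up sample text; the DOI here is not a real one.
sample = "Published as doi:10.1234/example.5678. ISBN 978 1 905813 54 4"
print(find_dois(sample))
```

An ISBN has no comparable fixed prefix and separator, so a pattern like this finds nothing in the ISBN part of the sample.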

ajlyon · November 21, 2011

I did augment the DOI translator to add ISBN support at one point, but I was getting a lot of false positives (even with the check digit, there are a lot of valid ISBN-10s out there just floating around!), so I gave up on that project.
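The false-positive problem can be seen from the ISBN-10 check-digit rule itself: the weighted sum only has to be divisible by 11, so roughly one in eleven arbitrary ten-digit strings passes. The sketch below assumes the standard rule (weights 10 down to 1, with "X" standing for 10 in the last position); the function names are illustrative and are not taken from the translator discussed here.

```python
def is_valid_isbn10(candidate: str) -> bool:
    """Check the ISBN-10 check digit of a 10-character string."""
    digits = "0123456789"
    if len(candidate) != 10:
        return False
    if any(c not in digits for c in candidate[:9]):
        return False
    last = candidate[9]
    if last not in digits and last != "X":
        return False
    values = [int(c) for c in candidate[:9]] + [10 if last == "X" else int(last)]
    # Weights run 10, 9, ..., 1; the weighted sum must be divisible by 11.
    total = sum(weight * value for weight, value in zip(range(10, 0, -1), values))
    return total % 11 == 0


def count_passing(start: int, stop: int) -> int:
    """Count integers in [start, stop) whose 10-digit form passes the check."""
    return sum(is_valid_isbn10(f"{n:010d}") for n in range(start, stop))


# Arbitrary run of consecutive ten-digit numbers, none chosen to be ISBNs.
checked = 100_000
passing = count_passing(1_000_000_000, 1_000_000_000 + checked)
print(f"{passing} of {checked} arbitrary numbers pass the ISBN-10 check")
```

Any ten-digit number in a PDF (a phone number, a grant number, a record ID) therefore has a real chance of looking like a valid ISBN-10, which is consistent with the experience described above.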

bertieb · November 21, 2011

Aha thanks, that's a rather handy guide. I'll include that when I get in touch with them.

I've had a look at the Institute of Physics translator; it seems to call CrossRef, which looks like it can handle ISBNs, so that will hopefully make things a bit more straightforward! Now that I have my head around XPath (somewhat) I'll see if I can do something similar.

On the other hand, their site is so surprisingly heterogeneous that anything I hack up will be rather fragile, so getting them to do the work on their end might be the way to go.

bertieb · November 21, 2011

Scratch that, CrossRef obviously only does DOIs! My eyes and brain must be tired. I'm guessing from ajlyon's comment (which now makes more sense) that there currently isn't a search translator that handles ISBNs? If not, how does Zotero's "add item by identifier" magic button work? I'd dive through the code and look myself, but a quick search for ISBN in the source code reveals little (and I'd likely misread anyway!).

adamsmith · November 21, 2011

It uses the Open WorldCat translator.

ajlyon · November 21, 2011

Yes, I augmented DOI.js to use the ISBN search in the Open WorldCat translator. If you'd like to try it out, go to http://github.com/ajlyon/zotero-bits/raw/master/DOI_and_ISBN.js and save the file to the translators directory of your Zotero data directory (http://www.zotero.org/support/zotero_data). Restart Firefox and see how it goes.