Translator for meetinglibrary.asco.org
I would really love a site translator for the American Society of Clinical Oncology (ASCO) abstracts repository (meetinglibrary.asco.org). I am willing to try my hand at writing one (I successfully installed Scaffold), but I have no XML/JAVA coding experience, so I would need a lot of hand-holding. The issue I can foresee is that the abstracts seem to be formatted as plain text (HTML?) and don't seem to have any metadata associated with them. Furthermore, it doesn't appear that JCO (the journal that publishes ASCO abstracts) assigns a unique DOI to each abstract. I'm not sure what the implications of this are for Scaffold. Bottom line: I'd love a direct import feature that will recognize content like this: http://meetinglibrary.asco.org/content/97508-114
as a journal article and correctly assign the citation information for a direct Zotero import. Can anyone with more Scaffold/translator experience help me out, or would someone be willing to help create one? If I can learn to do this one, there are other sites like this one (repositories for clinical meeting abstracts) that I'd be willing to help develop translators for.
Cheers!
A good place to start would be here
http://www.zotero.org/support/dev/how_to_write_a_zotero_translator_plusplus
this isn't perfect, but should give you a good idea about how to work with xpaths. I'm happy to answer specific questions and do some hand-holding on the way. If this gets more technical, Dan may eventually ask us to move this over to the development listserv. If you want to quote larger snippets of code for questions, use a public gist at gist.github.com or pastebin.com
https://github.com/zotero/translators/blob/master/SlideShare.js
and/or
https://github.com/zotero/translators/blob/master/Columbia University Press.js
http://dl.dropboxusercontent.com/u/848981/it/xp/xp.html
IIRC Firebug does give you xpaths, but they're not very useful.
You'll have to manually adjust all automatically generated xpaths to be reliable and usable across sites.
the code looks like this
var AuthorXpath = '//div[contains(@class, "field") and contains(@class, "field-name-field-authors") and contains(@class, "field-type-text-long") and contains(@class, "field-label-above")]/div[@class="field-items"]/div[contains(@class, "field-item") and contains(@class, "even")]/p'
var Authors = doc.evaluate(AuthorXpath, doc, nsResolver, XPathResult.ANY_TYPE, null).iterateNext().textContent;
var items = new Array();
var headers;
while (headers = Authors.iterateNext()) {
items.push(headers.textContent);
}
Zotero.debug(items);
}
Any thoughts? As far as I can tell, the syntax is the same as what I used for the title (i.e. [defined variable name].iterateNext()), so why am I getting this error?
Thanks so much for your help so far. The guide has been great.
you'll likely want to delete .iterateNext().textContent;
from the line starting with var Authors =
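To illustrate the suggestion above: doc.evaluate returns an iterator-like result, so calling .iterateNext().textContent on it yields a plain string, and a string has no iterateNext method for the while loop to call. Below is a minimal sketch of the loop on its own. collectText and fakeResult are hypothetical names used only for illustration, and the stand-in object mimics just the iterateNext() behaviour of a real XPath result so the snippet can run outside a browser.

```javascript
// Hypothetical helper: walks an XPathResult-style iterator and gathers text.
function collectText(result) {
  var items = [];
  var node;
  // iterateNext() returns null once the result is exhausted.
  while ((node = result.iterateNext())) {
    items.push(node.textContent);
  }
  return items;
}

// Stand-in for what doc.evaluate(...) would return (assumption, demo only).
function fakeResult(texts) {
  var i = 0;
  return {
    iterateNext: function () {
      return i < texts.length ? { textContent: texts[i++] } : null;
    }
  };
}

var authors = collectText(fakeResult(["First Author", "Second Author"]));
```

In a translator, the same helper would presumably be called as collectText(doc.evaluate(AuthorXpath, doc, nsResolver, XPathResult.ANY_TYPE, null)), i.e. with the result object itself rather than with its first node's text.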
ZU.xpath and ZU.xpathText
they don't work exactly the same way as doc.evaluate does, but save you a lot of messy code, see the translators above for examples.
The main difference is that you can't use ZU.xpath in a while loop, you have to use a for loop
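A rough sketch of the for-loop pattern, assuming (as I understand it) that ZU.xpath(doc, path) hands back an ordinary array of matching nodes rather than an iterator. textsFromNodes and sampleNodes are hypothetical names, and plain objects stand in for DOM nodes so the snippet runs without Zotero.

```javascript
// Hypothetical helper: collects trimmed text from an array of nodes,
// such as the array ZU.xpath(doc, somePath) is expected to return.
function textsFromNodes(nodes) {
  var items = [];
  // An array has a length and indices, so a for loop fits;
  // there is no iterateNext() to call in a while loop.
  for (var i = 0; i < nodes.length; i++) {
    items.push(nodes[i].textContent.trim());
  }
  return items;
}

// Plain objects stand in for DOM nodes here (assumption, demo only).
var sampleNodes = [
  { textContent: " First Author " },
  { textContent: "Second Author" }
];
var names = textsFromNodes(sampleNodes);
```

When only the combined text of the matches is needed, ZU.xpathText may make the loop unnecessary altogether.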
1) I am not able to use ZU.xpath. All of my attempts to use it have returned errors
2) I can't seem to figure out how to link together multiple pieces of bibliographic info into a working translator. I've looked at the examples above and they seem awfully different from the example ones in HWZT. Since these are for books and presentations, are there any relevant translators that are for journal articles? If I could find a translator that works and just adapt it by pointing it at the right URL and changing the XPaths for each piece of bibliographic information... well, that might be cheating, but it seems more in the realm of something I might be able to accomplish.
It should be sufficient for what you need and will make things a lot simpler
Edit: also, if you're not already, you should use Scaffold http://www.zotero.org/support/dev/translators/scaffold
This is a huge step forward. I now have a working translator that just needs a little tweaking. Thanks everyone for your support. I will post again when I have it up and running completely.
@adamsmith
would the Zotero project be interested in using this translator? A search of the forums seemed to indicate that only 1 other person was interested in using a translator for ASCO abstracts, and that was many years ago- still, if anyone could benefit from its use...
How would I go about submitting this for general use?
https://github.com/citation-style-language/styles/blob/master/CONTRIBUTING.md
substituting
https://github.com/zotero/translators
for https://github.com/citation-style-language/
if you can't make it work, putting it up as a gist will do as well, but the pull request is much preferable
Right now I have my scraper set up like this:
creators : FW.Xpath ('//div[@class = "author-list"]/p').text().remove(/"^;", -g/).split(/\,/).replace(/\s/," ").cleanAuthor("author"),
The .remove() was supposed to get rid of everything after the first instance of ";" and then run the cleanAuthor function. Instead, I don't see that it actually removes anything, although it might be removing a single instance of ";" and I just can't find it. How do I write a regex that says "everything after the first instance of ';'"?
Thanks again for all your help
seems to work
I think what you want is probably
.remove(/;[\s\S]*/)
This will remove all characters (including newlines) after (and including) the ;
.replace(/\s/," ")
will not really do anything (not what you expect, anyway). You probably want
.replace(/[\s\r\n]+/g, " ")
Edit: great place to learn regex http://www.regular-expressions.info/
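For anyone who wants to try the two expressions outside the framework, here is a small plain-JavaScript sketch. It assumes the framework's .remove(regex) behaves like String.replace(regex, ""); parseAuthorList is a hypothetical name, and the sample string is made up for the demo rather than taken from the ASCO site.

```javascript
// Hypothetical helper mirroring the chain discussed above in plain JavaScript.
function parseAuthorList(raw) {
  return raw
    // Drop the first ";" and everything after it, newlines included.
    .replace(/;[\s\S]*/, "")
    // Collapse every run of whitespace to one space; \s already covers
    // \r and \n, and the g flag makes it apply to all runs, not just the first.
    .replace(/\s+/g, " ")
    .split(",")
    .map(function (name) { return name.trim(); })
    .filter(function (name) { return name.length > 0; });
}

// Made-up input: two names, then an affiliation after the semicolon.
var sample = "Jane Q. Doe,\n  John Smith; Example Cancer Center,\nSpringfield";
var authorNames = parseAuthorList(sample);
```

Without the g flag, .replace(/\s/, " ") only touches the first whitespace character it finds, which is why the original call appeared to do nothing.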
Thanks. I originally put the gm flags in there because I was sort of trying a shotgun approach. I thought perhaps the reason I was unable to remove any characters at first was that the match was stopping at the first ";", so I put in the global flag. Then I thought maybe it was a multi-line issue, so I put in the m flag. When I finally discovered the problem was a combination of syntax issues with /\;*/ (I needed to backslash out the ";" and add "*" to continue matching), I decided "if it ain't broke, don't fix it" and stopped messing with the code. In the interest of increasing my understanding, I'll play around with some of the expressions you recommended. In the end, I was able to scrape all of the bibliographic info I needed for my purposes.
When I have a spare minute, I'll clean it up a bit and add a few more fields and put it on Github. Thanks again for your help.
Instead, I thought it would be good to use a multi scraper and set it up to read the section headers directly to input the info. I will try again to work on this when I have some spare time. In the meantime, I could post the translator I have just as an FYI to anyone looking for this.
I'd also like to try developing one for clinicaltrials.gov, which is a website that collects data on clinical trials in the US. These entries wouldn't be journals, books, or anything that seems to fit neatly into an existing category, so I don't really know where to start when it comes to creating a translator for something like that.
Anyway, thanks for your help. I really learned a lot and in the future if I need a site translator, I can try to jump right in.
As for clinical trials - you're right, there's no great category, I think I'd suggest going with report and maybe switching it to dataset (which in structure is quite similar) once we introduce that item type, likely in Zotero 4.2
https://github.com/zotero/translators/pull/601
I came across this thread while searching for a way to get ClinicalTrials.gov data into Zotero, other than by creating a web page entry and then editing manually. You'd mentioned wanting to work on a ClinicalTrials.gov filter - is that still on your wishlist? I'd aid and abet, although I'd be starting from the bottom of the learning curve.