IEEE Xplore
I'm keen to use Zotero, but most of the articles I'm interested in are in IEEE journals. Unfortunately it looks to me that the IEEE Xplore translator has been non-functional for almost a year. Although, reading the relevant forum posts, it's not clear to me whether there has just been one problem, or many in that period.
I have had a look at the translator and think I can diagnose the underlying problem. It appears to have been written to use the site feature that lets you download citations in RIS format, and then to import the RIS data. However, the site was changed a while ago so that RIS output is only available to users who are logged in.
The good news is that I looked at the format of a typical page on the site. It should be really easy to parse. With the exception of the abstract, all the required information appears to be stored in meta tags. e.g.
http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1206680
includes:
<meta name="citation_journal_title" content="Signal Processing, IEEE Transactions on">
<meta name="citation_publisher" content="IEEE">
<meta name="citation_authors" content="Jian Li">
<meta name="citation_authors" content="Stoica, P.">
<meta name="citation_authors" content="Zhisong Wang">
<meta name="citation_title" content="On robust Capon beamforming and diagonal loading">
<meta name="citation_date" content="July 2003">
<meta name="citation_volume" content="51">
<meta name="citation_issue" content="7">
<meta name="citation_firstpage" content=" 1702">
<meta name="citation_lastpage" content=" 1715">
<meta name="citation_doi" content="10.1109/TSP.2003.812831">
<meta name="citation_abstract_html_url" content="http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1206680">
<meta name="citation_pdf_url" content="http://ieeexplore.ieee.org/iel5/78/27152/01206680.pdf?arnumber=1206680">
<meta name="citation_issn" content="1053-587X">
<meta name="citation_isbn" content="">
<meta name="citation_language" content="English">
<meta name="citation_keywords" content=" SOI power estimation; SOI steering vector; adaptive arrays; array signal processing; array steering vector; data-independent beamformer; diagonal loading; interference rejection; interference suppression; parameter estimation; robust Capon beamforming; signal of interest; signal resolution; signal resolution; standard Capon beamformer; uncertain steering vectors; uncertainty set;">
The abstract appears to be in the first p tag after the first h2 tag.
I'd be surprised if sites got much easier to translate than that!
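To make the idea concrete, here is a rough sketch of pulling the citation_* meta tags out of a page. It is plain JavaScript rather than a real Zotero translator: the function name parseCitationMeta and the regex approach are my own assumptions for illustration, and an actual translator would query the DOM instead of matching raw HTML.

```javascript
// Hypothetical helper (not part of Zotero): collect
// <meta name="citation_*" content="..."> pairs from raw HTML.
function parseCitationMeta(html) {
  const fields = {};
  const metaTag = /<meta\s+name="(citation_[a-z_]+)"\s+content="([^"]*)"\s*\/?>/gi;
  let match;
  while ((match = metaTag.exec(html)) !== null) {
    const name = match[1].toLowerCase();
    const value = match[2].trim(); // page numbers carry a leading space
    if (value === "") continue;    // skip empty tags such as citation_isbn
    // Tags like citation_authors repeat, so every field holds a list.
    (fields[name] = fields[name] || []).push(value);
  }
  return fields;
}

// A few of the tags from the example page above.
const sampleHtml = `
<meta name="citation_journal_title" content="Signal Processing, IEEE Transactions on">
<meta name="citation_authors" content="Jian Li">
<meta name="citation_authors" content="Stoica, P.">
<meta name="citation_authors" content="Zhisong Wang">
<meta name="citation_title" content="On robust Capon beamforming and diagonal loading">
<meta name="citation_firstpage" content=" 1702">
<meta name="citation_lastpage" content=" 1715">
<meta name="citation_isbn" content="">
`;

const citation = parseCitationMeta(sampleHtml);
console.log(citation.citation_authors);
console.log(citation.citation_firstpage[0] + "-" + citation.citation_lastpage[0]);
```

Storing every field as a list keeps the repeated citation_authors tags together instead of letting later ones overwrite earlier ones.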
I've got a bit of experience writing web scraping applications, but I don't know JavaScript (FWIW Python is my language of choice for this sort of thing). But it looks to me that someone who knew what they were doing ought to be able to write a translator very easily. Any volunteers?
It would be very nice if they at least documented them as an RDF vocabulary; they wouldn't even have to change the current markup at all.
http://scholar.google.com/intl/en/scholar/inclusion.html
This particular form of meta tags is defined by Highwire Press:
http://highwire.stanford.edu/
It should start working again. If this works for you, please post here so that I can submit this change to be pushed to all users.
I just tested your new translator and it recognized only the first author of the paper. This might be because each author actually has their own "citation_authors" tag, as in almclean's example above. I don't really know Javascript, but to me it seems the translator only parses the first tag because of
if (newItem.creators.length == 0) { ...
Of course it's not optimal that different sites use this tag differently... But it seems to be consistent across IEEE Xplore at least. (I hope!)
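As a rough illustration of what handling every tag might look like, here is a plain JavaScript sketch that turns each citation_authors value into a creator. The names toCreator and collectCreators are hypothetical, this is not the translator's actual code, and the guess that a name without a comma is "First Last" is only an assumption that seems to fit the example page.

```javascript
// Hypothetical stand-in for a translator's author handling.
function toCreator(rawName) {
  const name = rawName.trim();
  if (name.includes(",")) {
    // "Stoica, P." style: family name first, then given name or initials.
    const [lastName, firstName] = name.split(",", 2).map((part) => part.trim());
    return { firstName, lastName, creatorType: "author" };
  }
  // "Jian Li" style: assume the last word is the family name.
  const parts = name.split(/\s+/);
  const lastName = parts.pop();
  return { firstName: parts.join(" "), lastName, creatorType: "author" };
}

// Map over ALL author values rather than stopping after the first one.
function collectCreators(authorValues) {
  return authorValues.map(toCreator);
}

const creators = collectCreators(["Jian Li", "Stoica, P.", "Zhisong Wang"]);
console.log(creators.map((c) => c.lastName + ", " + c.firstName));
```

The point is simply that the loop runs once per tag, so a page with three citation_authors tags yields three creators.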
The only minor annoyance was that, for some citations, the abstract extracted into Zotero contained line breaks at the end of each line. I think the line breaks in the HTML are being preserved. This is an example:
http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=482137
I'm guessing that removing the linebreaks would be straightforward.
if (abstractNode) newItem.abstractNote = Zotero.Utilities.trimInternal(abstractNode.textContent);
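For anyone who wants to see the effect outside Zotero, here is a small plain JavaScript sketch of the same kind of cleanup. The function collapseWhitespace is a hypothetical stand-in, not Zotero's own implementation, and the sample text is a made-up placeholder rather than a real abstract.

```javascript
// Hypothetical stand-in for the whitespace cleanup applied to the abstract:
// collapse every run of spaces, tabs and line breaks into a single space.
function collapseWhitespace(text) {
  return text.replace(/\s+/g, " ").trim();
}

// Placeholder text with the kind of hard line breaks seen in the page source.
const rawAbstract = "First line of an abstract\nsecond line of the same abstract\n   third line\n";
const cleanAbstract = collapseWhitespace(rawAbstract);
console.log(cleanAbstract);
```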
After fixing the typo the translator worked for me.
Thanks again.
citation_pdf_url field; for the Signal Processing paper linked to above, that is:
<meta name="citation_pdf_url" content="http://ieeexplore.ieee.org/iel4/79/10273/00482137.pdf?arnumber=482137">
If this URL gives Zotero a real PDF (and not a login/purchase page), it should be saving them. This may depend somewhat on how your institution accesses IEEE Xplore.
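One way to check by hand whether a download is a real PDF or an HTML login page is to look at its first few bytes, since PDF files start with the marker "%PDF-". The sketch below is plain JavaScript with a hypothetical looksLikePdf helper; it is not how Zotero itself decides, just a quick test you could run on bytes you have already fetched.

```javascript
// Hypothetical check: does this byte array start with the PDF marker "%PDF-"?
function looksLikePdf(bytes) {
  const magic = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"
  if (bytes.length < magic.length) return false;
  return magic.every((byte, i) => bytes[i] === byte);
}

// Stand-ins for the start of a real PDF and of a login/purchase page.
const encoder = new TextEncoder();
const pdfStart = encoder.encode("%PDF-1.4\n");
const loginPage = encoder.encode("<html><title>Sign in</title>");

console.log(looksLikePdf(pdfStart));
console.log(looksLikePdf(loginPage));
```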
<meta name="citation_pdf_url" content="http://ieeexplore.ieee.org/iel5/8585/27206/01208965.pdf?arnumber=1208965">
The PDF is displayed in the browser if I give this URL directly to Firefox.
If Zotero learns to follow the redirects, the PDFs will save. In the meantime, you'll have to add the PDFs manually.
But if someone can provide a Debug ID for a save attempt we can take a look.
Please go to http://github.com/ajlyon/zotero-bits/raw/master/IEEE Xplore.js and save the file to the translators directory of your Zotero data directory (http://www.zotero.org/support/zotero_data).
It should start working again. If this works for you, please post here so that I can submit this change to be pushed to all users.
BTW, thanks for the work that you've been doing here.
Report Id: 507205594
I doubt that this will be of any help to you though, you'll just be prompted for a login. Is there any other information I could provide?
I logged a new error report, but the error is the same: 1234670557