IEEE Xplore

I'm keen to use Zotero, but most of the articles I'm interested in are in IEEE journals. Unfortunately it looks to me as though the IEEE Xplore translator has been non-functional for almost a year. Having read the relevant forum posts, though, it's not clear to me whether there has been just one problem or many in that period.

I have had a look at the translator and think I can diagnose the underlying problem. The translator appears to have been written to use the feature of the site that allows citations to be downloaded in RIS format, and then to import that RIS data. However, the site was changed a while ago so that RIS output is only available to users who are logged in.

The good news is that I looked at the format of a typical page on the site. It should be really easy to parse. With the exception of the abstract, all the required information appears to be stored in meta tags. e.g.

http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1206680

includes:

<meta name="citation_journal_title" content="Signal Processing, IEEE Transactions on">
<meta name="citation_publisher" content="IEEE">
<meta name="citation_authors" content="Jian Li">
<meta name="citation_authors" content="Stoica, P.">
<meta name="citation_authors" content="Zhisong Wang">
<meta name="citation_title" content="On robust Capon beamforming and diagonal loading">
<meta name="citation_date" content="July 2003">
<meta name="citation_volume" content="51">
<meta name="citation_issue" content="7">
<meta name="citation_firstpage" content=" 1702">
<meta name="citation_lastpage" content=" 1715">
<meta name="citation_doi" content="10.1109/TSP.2003.812831">
<meta name="citation_abstract_html_url" content="http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1206680">
<meta name="citation_pdf_url" content="http://ieeexplore.ieee.org/iel5/78/27152/01206680.pdf?arnumber=1206680">
<meta name="citation_issn" content="1053-587X">
<meta name="citation_isbn" content="">
<meta name="citation_language" content="English">
<meta name="citation_keywords" content=" SOI power estimation; SOI steering vector; adaptive arrays; array signal processing; array steering vector; data-independent beamformer; diagonal loading; interference rejection; interference suppression; parameter estimation; robust Capon beamforming; signal of interest; signal resolution; signal resolution; standard Capon beamformer; uncertain steering vectors; uncertainty set;">

The abstract appears to be in the first p tag after the first h2 tag.

I'd be surprised if sites got much easier to translate than that!
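
As a rough illustration of the parsing described above, here is a minimal JavaScript sketch. It is not the translator's actual code. The function names `parseCitationMeta` and `extractAbstract` are hypothetical, and the regular expressions assume the exact markup shown in the sample (double-quoted attributes, `name` before `content`). A real translator would more likely walk the DOM than pattern-match raw HTML.

```javascript
// Hypothetical helper: collect citation_* meta tags from raw HTML.
// Every value is stored in an array, so repeated names such as
// citation_authors keep all of their values.
function parseCitationMeta(html) {
  const fields = {};
  // Assumes name="..." comes before content="..." as in the sample page.
  const metaRe = /<meta\s+name="(citation_[^"]+)"\s+content="([^"]*)"\s*\/?>/gi;
  let match;
  while ((match = metaRe.exec(html)) !== null) {
    const name = match[1];
    const value = match[2].trim(); // the sample has stray leading spaces
    if (!fields[name]) fields[name] = [];
    fields[name].push(value);
  }
  return fields;
}

// Hypothetical helper: text of the first <p> that follows the first <h2>.
// Returns null when either element is missing.
function extractAbstract(html) {
  const h2End = html.search(/<\/h2>/i);
  if (h2End === -1) return null;
  const rest = html.slice(h2End);
  const p = /<p(?:\s[^>]*)?>([\s\S]*?)<\/p>/i.exec(rest);
  if (!p) return null;
  // Drop any inline markup inside the paragraph.
  return p[1].replace(/<[^>]+>/g, "").trim();
}
```

Calling `parseCitationMeta(pageSource).citation_title[0]` on the page above would then give the article title, with `pageSource` standing for the downloaded HTML.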

I've got a bit of experience writing web scraping applications, but I don't know JavaScript (FWIW Python is my language of choice for this sort of thing). But it looks to me that someone who knew what they were doing ought to be able to write a translator very easily. Any volunteers?
  • I just wrote up basic code that does all that in JavaScript, and I'll look at implementing it here.
  • BTW, that whole mess is the meta tags that Google Scholar looks for, but I believe that they are not standardized or documented properly anywhere (or at least they weren't last time I looked).

    It would be very nice if they at least documented them as an RDF vocabulary-- they wouldn't even have to change the current markup at all.
  • It probably doesn't help, but according to

    http://scholar.google.com/intl/en/scholar/inclusion.html

    this particular form of meta tags is defined by Highwire Press

    http://highwire.stanford.edu/
  • Thanks for pointing that out. I think I'll try to put all this in the Embedded RDF translator, which currently doesn't use the most widely employed vocabulary, the Highwire one.
  • Please go to http://github.com/ajlyon/zotero-bits/raw/master/IEEE Xplore.js and save the file to the translators directory of your Zotero data directory (http://www.zotero.org/support/zotero_data).

    It should start working again. If this works for you, please post here so that I can submit this change to be pushed to all users.
  • I tried it and it works. Thanks!
  • Please try with a few journals, search pages, and so forth. If nothing appears to be wrong, confirm that all is well and we'll get this out to clients.
  • ajlyon,
    I just tested your new translator and it recognized only the first author of the paper. This might be because each author actually has their own "citation_authors" tag, as in almclean's example above. I don't really know Javascript, but to me it seems the translator only parses the first tag because of
    if (newItem.creators.length == 0) { ...
  • This actually depends on the site-- some sites that use these tags put them in one tag, and some put them in multiple tags. I'll look into it.
  • Try the version on Github again now.
  • Works perfectly for me now. Thanks!
    Of course it's not optimal that different sites use this tag differently... But it seems to be consistent across IEEE Xplore at least. (I hope!)
  • This is now in the main SVN repository and should be in the next release of Zotero, hopefully 2.1.2, but definitely 2.1.3. Zotero 2.1.2 is due to ship any hour now, so no telling if it makes it in under the bar.
  • It worked for the examples I tried. Thanks for doing that.

    The only minor annoyance was that, for some citations, the abstract extracted into Zotero contained line breaks at the end of each line. I think the line breaks in the HTML are being preserved. This is an example:

    http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=482137

    I'm guessing that removing the linebreaks would be straightforward.
  • Please go to http://github.com/ajlyon/zotero-bits/raw/master/IEEE Xplore.js and save the file to the translators directory of your Zotero data directory (http://www.zotero.org/support/zotero_data).

    It should start working again. If this works for you, please post here so that I can submit this change to be pushed to all users.
  • The latest version has broken the translator, but the fix is simple: you misspelled Utilities in the following line:

    if (abstractNode) newItem.abstractNote = Zotero.Utilities.trimInternal(abstractNode.textContent);

    After fixing the typo the translator worked for me.

    Thanks again.
  • Sorry about that-- the wrong version went out. Fixed now.
  • Does the current version also download pdf from IEEE Xplore? I replaced the IEEE Xplore.js in my translator directory, but it only took a snapshot of the webpage and did not download the pdf file.
  • edited April 2, 2011
    It should be saving whatever is in the citation_pdf_url field; for the Signal Processing paper linked to above, that is:
    <meta name="citation_pdf_url"
    content="http://ieeexplore.ieee.org/iel4/79/10273/00482137.pdf?arnumber=482137">


    If this URL gives Zotero a real PDF (and not a login/purchase page), it should be saving them. This may depend somewhat on how your institution accesses IEEE Xplore.
  • I do see the following citation_pdf_url field in the source file of the webpage. However, the pdf was not saved by Zotero.

    <meta name="citation_pdf_url" content="http://ieeexplore.ieee.org/iel5/8585/27206/01208965.pdf?arnumber=1208965">

    I can have the PDF shown in the browser if I give this URL directly to firefox.
  • IEEE is throwing Zotero a series of redirects as it produces the stamped PDF, and we're failing to follow them. Not sure if we can do any better.

    If Zotero learns to follow the redirects, the PDFs will save. In the meantime, you'll have to add the PDFs manually.
  • You mean HTML redirects, not HTTP redirects?
  • When I try to debug this with curl, I get Location: headers -- so HTTP redirects. I thought Zotero did these fine, but I can't see any other reason Zotero is failing to attach the PDFs. Maybe you can check debug output for an attempt and see another reason? It's a non-matching MIME-type issue, and the redirects seemed like a likely cause.
  • XMLHttpRequest should follow redirects automatically. Zotero doesn't even see them.

    But if someone can provide a Debug ID for a save attempt we can take a look.
  • This was my mistake-- there is in fact an interstitial that presents the PDF as the content of a frame. The fixed version is now on Github:

    Please go to http://github.com/ajlyon/zotero-bits/raw/master/IEEE Xplore.js and save the file to the translators directory of your Zotero data directory (http://www.zotero.org/support/zotero_data).

    It should start working again. If this works for you, please post here so that I can submit this change to be pushed to all users.
  • The PDF issue is now fixed in the Zotero repository. It should work in the next Zotero release; you can still install from Github as described above if you don't want to wait.
  • I have tried the latest file, but still get a "could not save" error. Apart from restarting FF, is there anything else I should do after copying the file to the translators directory? Can I view debug information somewhere?

    BTW, thanks for the work that you've been doing here.
  • Please provide an example URL and a Report ID (from "Report Errors" in the gear menu).
  • http://ieeexplore.ieee.org/search/freesrchabstract.jsp?tp=&arnumber=1397735&queryText%3DApplication+of+discrete+event+systems+theory+for+modeling+and+analysis+of+a+power+transmission+network%26openedRefinements%3D*%26searchField%3DSearch+All

    Report Id: 507205594
  • Should be fixed now. But you may have to have IEEE Xplore access to use it (not a guest)-- they don't expose all the metadata to guest users.

    Please go to http://github.com/ajlyon/zotero-bits/raw/master/IEEE Xplore.js and save the file to the translators directory of your Zotero data directory (http://www.zotero.org/support/zotero_data).

    It should start working again. If this works for you, please post here so that I can submit this change to be pushed to all users.
  • Hi, thanks for the help. I still get the same problem. I do have access via a portal from my university. The same link via the portal is : http://0-ieeexplore.ieee.org.innopac.up.ac.za/search/srchabstract.jsp?tp=&arnumber=1397735&queryText%3DApplication+of+discrete+event+systems+theory+for+modeling+and+analysis+of+a+power+transmission+network%26openedRefinements%3D*%26searchField%3DSearch+All

    I doubt that this will be of any help to you though, you'll just be prompted for a login. Is there any other information I could provide?

    I logged a new error report, but the error is the same : 1234670557
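
Two of the fixes discussed earlier in this thread, reading authors whether a site uses one `citation_authors` tag or several, and removing line breaks from the abstract, can be sketched roughly as follows. This is an illustration only and not the code in the translator. The names `collectAuthors` and `collapseWhitespace` are hypothetical, the `;` delimiter for the single-tag case is an assumption, and `collapseWhitespace` is a stand-in with similar intent to the `Zotero.Utilities.trimInternal` call quoted above, not a copy of it.

```javascript
// Hypothetical helper: accept either several citation_authors values
// (one per tag) or a single value that packs several names together.
// The ";" delimiter is an assumption. A comma would not work, because
// names such as "Stoica, P." contain one.
function collectAuthors(values) {
  const authors = [];
  for (const value of values) {
    for (const name of value.split(";")) {
      const trimmed = name.trim();
      if (trimmed) authors.push(trimmed);
    }
  }
  return authors;
}

// Stand-in for the abstract cleanup: collapse every run of whitespace,
// including line breaks, into a single space and trim both ends.
function collapseWhitespace(text) {
  return text.replace(/\s+/g, " ").trim();
}
```

Looping over every value, rather than stopping once the creators list is non-empty, is what lets the multi-tag layout used by IEEE Xplore yield all authors.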