Can't import from Physical Review

bjoerns · January 4, 2010

Hi everyone!
I just wanted to add a paper from Physical Review B and one from Physical Review Letters to my Zotero library when I noticed that, at the moment, the usual Zotero button in the Firefox address bar is missing (not only for these two papers but for all APS journal papers), so I cannot add them properly now :-(

http://prl.aps.org/abstract/PRL/v103/i25/e257202
http://prb.aps.org/abstract/PRB/v80/i23/e235321

The problem occurs not only on my computer but also on others, so it seems as if something changed at APS over Christmas...

bjoerns · January 5, 2010

After removing my Zotero collection, the icon to add items reappears on the APS page. However, when I try to add a paper to my collection, I get an error message.

The error ID is: 1453710441

dstillman · January 5, 2010

Not sure what you mean by "removing my zotero collection"—that certainly doesn't sound like anything you should have to do—but the two URLs above work for me via the DOI translator. (Hover over the address bar icon to see the translator used.) I'm not sure if they used a different translator before, but the DOI translator should work.

Is it still not working for you at the above URLs, or are you trying different URLs?

bjoerns · January 5, 2010

"removing my zotero collection" means I deleted the zotero folder in my firefox profile (keeping a backup of course). After having deleted the collection, the icon is back again and uses the DOI translator, however adding does not work and I get the error message :

[Javascript Error: "DOI translator: could not find DOI" {file:"chrome://zotero/content/xpcom/translate.js" line 896}]

dstillman · January 5, 2010

Well, for starters, restore your Zotero directory, since there's definitely no need to remove that, and any problem you're having should be debugged. You can try "Reset Translators and Styles" in the Advanced pane of the Zotero preferences, which shouldn't be necessary in general but, if you're having a problem, would probably have the same effect as removing the entire data directory.

After running that and, for good measure, clicking "Update Now" in the General pane of the Zotero preferences, please confirm that you're still getting the error on the above URLs. If you are, you may be experiencing an issue with Zotero 1.0 that doesn't exist in 2.0.

Rintze · January 5, 2010

dstillman wrote: "I'm not sure if they used a different translator before"

I think the PROLA translator has been written for these sites, but its target-regex doesn't work anymore. I'd submit a fix, but there also seems to be a problem with asynchronous calls, and I can't test the new synchronous functions in Scaffold.

bjoerns · January 5, 2010

I restored my collection, reset translators and styles, and clicked "Update Now". The only effect is that the icon in the address bar now reappears even without deleting my collection. However, adding still does not work and gives the same error as above.

hagver · January 5, 2010

Similar problem:
I can import papers from the Physical Review, but the abstract field remains empty.
(Just tested for Physical Review Letters and Physical Review A).

hagver · January 5, 2010

... and the PDFs are missing, too.

bjoerns · January 6, 2010

I tested zotero 2 on another computer (I don't want to switch completely as I am about to finish my PhD thesis) and in this case I can add the papers properly.

schmid · January 11, 2010

It seems that it has been changed to the DOI (Crossref) translator. There are two problems with it:
(1) Bug: Article numbers are not stored in the "Pages" field.
(2) Abstracts are not captured (I don't have automatic PDF capture on, so I don't know about PDFs)

Sample link:
http://prl.aps.org/abstract/PRL/v99/i12/e126105

roberto.derenzi · January 15, 2010

I have a similar problem.
I ran a check of the database (it was corrupted but is now fine) and went through all the suggestions in "Troubleshooting translator". My symptoms were and are:

I click on the zotero image in URL field (any PRB, PRL paper)
I get "Could not save item. An error occurred when saving the item, ..."
No more info

I have Firefox 3.0.17
Zotero 1.0.10 updated

I'd be very grateful for any help

bjoerns · January 18, 2010

After having migrated to Zotero 2 completely, I can confirm the bugs mentioned by schmid: no abstract and no article numbers are stored (the latter is really annoying).

npj · January 18, 2010

I just tried now. It uses DOI finder. No article numbers, no pdf. I'm going through a proxy which might hypothetically explain the last one, but not the first.

npj · January 18, 2010

The problem is partly fixed. The target regex is better if set to:

"https?://(?:www\\.)?(prola|prl|prb|rmp|pra|prc|prd|pre|prst-ab|prst-per|).aps.org.*/(toc|searchabstract|abstract)/"

This means that you can also use PROLA on prl.aps.org etc., not just prola.aps.org
I also added the .* after aps.org, so the site will be recognized through proxies of the form
prola.aps.org.myproxy.example.com/etc/etc

This may not be a good idea, since for some reason this doesn't work through proxies. I'll paste the debug log in a following post. Also, if your proxy has "toc" in the name you'll be in trouble; some fixing in PROLA.js line 4 is needed.
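
A quick way to see what the pattern above accepts is to run it against a few URLs outside Zotero. This is only a sketch: the sample URLs are taken from this thread, apart from the two example.com/xaps.org ones, which are made up for illustration, and `matchesTarget` is a hypothetical helper name, not part of Zotero.

```javascript
// Target pattern as posted above. In the translator metadata it is a JSON
// string, so "\\." there becomes "\." in the actual regular expression.
const target = new RegExp(
  "https?://(?:www\\.)?(prola|prl|prb|rmp|pra|prc|prd|pre|prst-ab|prst-per|).aps.org.*/(toc|searchabstract|abstract)/"
);

// Hypothetical helper: true if the pattern matches the given URL.
function matchesTarget(url) {
  return target.test(url);
}

const samples = [
  "http://prl.aps.org/abstract/PRL/v103/i25/e257202",                    // direct
  "http://prb.aps.org/abstract/PRB/v80/i23/e235321",                     // direct
  "http://prl.aps.org.myproxy.example.net/abstract/PRL/v104/i2/e026801", // proxied
  "http://www.example.com/abstract/PRL/v104/i2/e026801",                 // unrelated host
  "http://xaps.org/abstract/",                                           // made-up host
];

for (const url of samples) {
  console.log(matchesTarget(url), url);
}
```

The last sample also matches, because the dots in `.aps.org` are unescaped and the subdomain group ends in an empty alternative. Escaping the dots and dropping the trailing `|` would presumably tighten the pattern, but that variant is untested here.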

To decide which subdomains to add I used http://prola.aps.org/browse.html, but I'm still getting errors in some cases, even though the URL modification scheme used in the translator appears to work, judging from a glance at the export Endnote link. In the next post I attach a debug log from trying prst-per; it also fails on
http://prola.aps.org/abstract/PRI/v7/i4/p193_1

I've done very little testing. I hope this can be used to improve the translator; at least it could be made to work for prl, prc, pra and several others, even if it doesn't work through proxies. Perhaps adjust the regex to match the successful cases.

npj · January 18, 2010

Debug log of a failed import:

(4)(+0000000): Translate: Parsing code for PROLA
(3)(+0000003): created hidden browser (1)
(3)(+0000000): loading http://prst-per.aps.org/abstract/PRSTPER/v4/i2/e020002
(3)(+0000383): http://prst-per.aps.org/abstract/PRSTPER/v4/i2/e020002 has been loaded
(4)(+0000001): Translate: Phys. Rev. ST Phys. Educ. Res. 4, 020002 (2008): Editorial: Physics - spotlighting exceptional research
(3)(+0000004): deleted hidden browser
(2)(+0000001): Translate: Translation using PROLA failed:
message => newDoc.evaluate("//div[contains(@class, \"aps-abstractbox\")]/p", newDoc, null, XPathResult.ANY_TYPE, null).iterateNext() is null
fileName => chrome://zotero/content/xpcom/translate.js
lineNumber => 816
stack => ([object XPCNativeWrapper])@chrome://zotero/content/xpcom/translate.js:816
name => TypeError
url => http://prst-per.aps.org/abstract/PRSTPER/v4/i2/e020002
downloadAssociatedFiles => true
automaticSnapshots => true
(3)(+0000026): HTTP POST id=2c310a37-a4dd-48d2-82c9-bd29c53c1c76&lastUpdated=2009-01-18%2023%3A15%3A00&diagnostic=version%20%3D%3E%202.0b7.6%2C%20platform%20%3D%3E%20Win32%2C%20oscpu%20%3D%3E%20Windows%20NT%205.1%2C%20locale%20%3D%3E%20%2C%20appName%20%3D%3E%20Firefox%2C%20appVersion%20%3D%3E%203.5.7%2C%20extensions%... (1489 chars) to http://www.zotero.org/repo/report
(5)(+0000001): Translate: running handler 0 for done
(5)(+0004595): SELECT COUNT(*) FROM fulltextItems WHERE (indexedPages IS NOT NULL AND indexedPages=totalPages) OR (indexedChars IS NOT NULL AND indexedChars=totalChars)
(5)(+0000000): SELECT COUNT(*) FROM fulltextItems WHERE (indexedPages IS NOT NULL AND indexedPages<totalPages) OR (indexedChars IS NOT NULL AND indexedChars<totalChars)
(5)(+0000000): SELECT COUNT(*) FROM itemAttachments WHERE itemID NOT IN (SELECT itemID FROM fulltextItems WHERE indexedPages IS NOT NULL OR indexedChars IS NOT NULL)
(5)(+0000001): SELECT COUNT(*) FROM fulltextWords
(3)(+0000065): DATE: retrieved with algorithms: ({year:2009, month:8, day:28})
(3)(+0000001): DATE: retrieved with algorithms: ({year:2008, month:4, day:13})
(3)(+0000001): DATE: retrieved with algorithms: ({year:2008, month:4, day:13})
(3)(+0000000): DATE: retrieved with algorithms: ({year:2009, month:7, day:7})
(3)(+0000001): DATE: retrieved with algorithms: ({year:2009, month:8, day:13})
(3)(+0000000): DATE: retrieved with algorithms: ({year:2008, month:8, day:16})
(4)(+0000006): Translate: Binding sandbox to http://www.example.com/
(3)(+0000000): Translate: Searching for translators for an undisclosed location
(4)(+0000000): Translate: Parsing code for Zotero RDF
(4)(+0000003): Translate: Setting configure option getCollections to true
(4)(+0000000): Translate: Setting configure option dataMode to rdf
(4)(+0000000): Translate: Setting display option exportNotes to true
(4)(+0000000): Translate: Setting display option exportFileData to false
(4)(+0000001): Translate: Parsing code for MODS
(4)(+0000002): Translate: Setting display option exportNotes to true
(4)(+0000000): Translate: Setting configure option dataMode to xml/e4x
(4)(+0000001): Translate: Parsing code for Refer/BibIX
(4)(+0000001): Translate: Setting configure option dataMode to line
(4)(+0000000): Translate: Setting display option exportCharset to UTF-8
(4)(+0000001): Translate: Parsing code for RIS
(4)(+0000002): Translate: Setting configure option dataMode to line
(4)(+0000000): Translate: Setting display option exportNotes to true
(4)(+0000000): Translate: Setting display option exportCharset to UTF-8
(4)(+0000000): Translate: Parsing code for Unqualified Dublin Core RDF
(4)(+0000001): Translate: Setting configure option dataMode to rdf
(4)(+0000000): Translate: Parsing code for Wikipedia Citation Templates
(4)(+0000002): Translate: Setting display option exportCharset to UTF-8
(4)(+0000000): Translate: Parsing code for BibTeX
(4)(+0000008): Translate: Setting configure option dataMode to block
(4)(+0000000): Translate: Setting display option exportCharset to UTF-8
(5)(+0000004): SELECT key AS domainPath, value AS format FROM settings WHERE setting='quickCopySite' ORDER BY domainPath COLLATE NOCASE

The RIS I get on a manual DL looks ok:
TY - JOUR
M1 - Copyright (C) 2010 The American Physical Society
M1 - Please report any problems to prola@aps.org
ID - 10.1103/PhysRevSTPER.4.020002
TI - Editorial: Physics - spotlighting exceptional research
A1 - Sprouse, Gene D.
VL - 4
IS - 2
PB - American Physical Society
SP - 020002
PY - 2008/09/15/
JF - Physical Review Special Topics - Physics Education Research
JA - Phys. Rev. ST Phys. Educ. Res.
J1 - PRSTPER
UR - http://link.aps.org/abstract/PRSTPER/v4/e020002
ER -
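
In the record above, the ID line holds the article's DOI and SP holds the article number. As a rough illustration of how such tag/value lines can be read, here is a minimal sketch. It is not Zotero's RIS translator, and `parseRisFields` is a hypothetical name used only for this example.

```javascript
// Minimal RIS field reader for illustration only.
// Collects the values of each two-character tag into an array.
function parseRisFields(text) {
  const fields = {};
  for (const line of text.split(/\r?\n/)) {
    // Two-character tag, whitespace, a dash, then the (possibly empty) value.
    const m = /^([A-Z][A-Z0-9])\s+-\s*(.*)$/.exec(line.trim());
    if (!m) continue;
    const [, tag, value] = m;
    (fields[tag] = fields[tag] || []).push(value);
  }
  return fields;
}

// A few lines copied from the record above.
const ris = [
  "TY - JOUR",
  "ID - 10.1103/PhysRevSTPER.4.020002",
  "TI - Editorial: Physics - spotlighting exceptional research",
  "SP - 020002",
  "ER -",
].join("\n");

const fields = parseRisFields(ris);
console.log(fields.ID[0]); // the DOI as supplied by the export
console.log(fields.SP[0]); // the article number, exported as the start page
```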

npj · January 18, 2010

Debug log of an import that works on direct connection (from IP with subscription), but fails on proxying:

(4)(+0033029): Translate: Parsing code for PROLA
(3)(+0000004): created hidden browser (1)
(3)(+0000000): loading http://prl.aps.org.myproxy.example.net/abstract/PRL/v104/i2/e026801
(3)(+0000817): http://prl.aps.org.myproxy.example.net/abstract/PRL/v104/i2/e026801 has been loaded
(4)(+0000001): Translate: Phys. Rev. Lett. 104, 026801 (2010): Carbon Nanotubes as Cooper-Pair Beam Splitters
(3)(+0000001): HTTP POST type=ris to http://prl.aps.org.myproxy.example.net/export/PRL/v104/i2/e026801?type=ris
(3)(+0000007): deleted hidden browser
(3)(+0000000): Translate: Translation successful
(5)(+0000000): Translate: running handler 0 for done
(4)(+0000117): Translate: Binding sandbox to http://www.example.com/
[Note: Above line not edited by me -npj]
(4)(+0000001): Translate: Parsing code for RIS
(4)(+0000004): Translate: Setting configure option dataMode to line
(4)(+0000000): Translate: Setting display option exportNotes to true
(4)(+0000000): Translate: Setting display option exportCharset to UTF-8
(5)(+0003802): SELECT COUNT(*) FROM fulltextItems WHERE (indexedPages IS NOT NULL AND indexedPages=totalPages) OR (indexedChars IS NOT NULL AND indexedChars=totalChars)
(5)(+0000001): SELECT COUNT(*) FROM fulltextItems WHERE (indexedPages IS NOT NULL AND indexedPages<totalPages) OR (indexedChars IS NOT NULL AND indexedChars<totalChars)
(5)(+0000000): SELECT COUNT(*) FROM itemAttachments WHERE itemID NOT IN (SELECT itemID FROM fulltextItems WHERE indexedPages IS NOT NULL OR indexedChars IS NOT NULL)
(5)(+0000001): SELECT COUNT(*) FROM fulltextWords
(3)(+0000067): DATE: retrieved with algorithms: ({year:2009, month:8, day:28})
(3)(+0000002): DATE: retrieved with algorithms: ({year:2008, month:4, day:13})
(3)(+0000000): DATE: retrieved with algorithms: ({year:2008, month:4, day:13})
(3)(+0000000): DATE: retrieved with algorithms: ({year:2009, month:7, day:7})
(3)(+0000001): DATE: retrieved with algorithms: ({year:2009, month:8, day:13})
(3)(+0000000): DATE: retrieved with algorithms: ({year:2008, month:8, day:16})
(4)(+0000006): Translate: Binding sandbox to http://www.example.com/
(3)(+0000000): Translate: Searching for translators for an undisclosed location
(4)(+0000000): Translate: Parsing code for Zotero RDF
(4)(+0000003): Translate: Setting configure option getCollections to true
(4)(+0000000): Translate: Setting configure option dataMode to rdf
(4)(+0000000): Translate: Setting display option exportNotes to true
(4)(+0000000): Translate: Setting display option exportFileData to false
(4)(+0000001): Translate: Parsing code for MODS
(4)(+0000002): Translate: Setting display option exportNotes to true
(4)(+0000000): Translate: Setting configure option dataMode to xml/e4x
(4)(+0000001): Translate: Parsing code for Refer/BibIX
(4)(+0000001): Translate: Setting configure option dataMode to line
(4)(+0000000): Translate: Setting display option exportCharset to UTF-8
(4)(+0000001): Translate: Parsing code for RIS
(4)(+0000002): Translate: Setting configure option dataMode to line
(4)(+0000000): Translate: Setting display option exportNotes to true
(4)(+0000000): Translate: Setting display option exportCharset to UTF-8
(4)(+0000000): Translate: Parsing code for Unqualified Dublin Core RDF
(4)(+0000001): Translate: Setting configure option dataMode to rdf
(4)(+0000000): Translate: Parsing code for Wikipedia Citation Templates
(4)(+0000002): Translate: Setting display option exportCharset to UTF-8
(4)(+0000000): Translate: Parsing code for BibTeX
(4)(+0000009): Translate: Setting configure option dataMode to block
(4)(+0000000): Translate: Setting display option exportCharset to UTF-8
(5)(+0000005): SELECT key AS domainPath, value AS format FROM settings WHERE setting='quickCopySite' ORDER BY domainPath COLLATE NOCASE

noksagt · January 18, 2010

npj wrote: "In the next post I attach a debug log from trying prst-per; it also fails on
http://prola.aps.org/abstract/PRI/v7/i4/p193_1"

Both of these examples lack an actual abstract that would be contained in the div of class='aps-abstractbox'. So, I think the PROLA.js translator just needs to be modified to check to see that an abstract is actually present. See, e.g. the ACM or IngentaConnect translators as to how to do this.
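
A minimal sketch of the kind of check meant here, assuming only that the document object offers a DOM-style `evaluate()` method. The `fakeDoc` objects and the local `trimInternal` are hypothetical stand-ins so the snippet runs outside Zotero; they are not the real Zotero utilities.

```javascript
const ANY_TYPE = 0; // numeric value of XPathResult.ANY_TYPE

// Local stand-in for whitespace cleanup (not Zotero.Utilities.trimInternal).
function trimInternal(s) {
  return s.replace(/\s+/g, " ").trim();
}

// Returns the abstract text, or null if the page has no abstract box.
function getAbstract(doc) {
  const node = doc
    .evaluate('//div[contains(@class, "aps-abstractbox")]/p', doc, null, ANY_TYPE, null)
    .iterateNext();
  // iterateNext() returns null when nothing matches, so check before use.
  return node ? trimInternal(node.textContent) : null;
}

// Hypothetical stand-ins for a page with and without an abstract.
function fakeDoc(node) {
  return { evaluate: () => ({ iterateNext: () => node }) };
}

console.log(getAbstract(fakeDoc({ textContent: "  We report\n on ...  " })));
console.log(getAbstract(fakeDoc(null)));
```

With a guard like this, a page without an abstract yields `null` instead of the TypeError shown in the debug log.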

npj · January 19, 2010

I updated PROLA.js, code is in next comment. I have done very little testing, but I believe it works:
- through proxies (at least for my case, but see note 1)
- for all parts of aps.org (pra, prb, etc etc)
- even if the article has no online abstract
- on Table of Contents pages for issues of the journals
- on "Citing articles" tabs (only for prola articles, see note 2)

The code is in my next post.

Note 1: I disallow third-party cookies. This blocks cookies from my proxy, so I had to add an explicit exception for that domain; with that in place the translator was OK.

Note 2: This feature works only for the PROLA articles. Some other articles may show up in the list without title, but selecting them is unlikely to work. If someone sets up a DOI finder or similar that can scrape the whole page, this translator might get in the way. But for now, it's either this, or no multi-import at all, so I vote for keeping it. This approach did not work for the "References" tab.

npj · January 19, 2010

Updated PROLA.js, FYI. I can see the tabs are messed up here, I'll e-mail a copy to translators@zotero.

{
"translatorID":"2c310a37-a4dd-48d2-82c9-bd29c53c1c76",
"translatorType":4,
"label":"PROLA","creator":"Eugeniy Mikhailov and Michael Berkowitz",
"target":"https?://(?:www\\.)?(prola|prl|prb|rmp|pra|prc|prd|pre|prst-ab|prst-per|).aps.org/(toc|forward|searchabstract|abstract)/",
"minVersion":"1.0.0b3.r1",
"maxVersion":null,
"priority":100,
"inRepository":true,
"lastUpdated":"2009-12-26 23:15:00"
}

function detectWeb(doc, url) {
// toc indicates table of contents, forward is a "Citing articles" page
if (/\/toc\//.test(url) || (/\/forward\//.test(url))){
return "multiple";
} else {
return "journalArticle";
}
}

function doWeb(doc, url) {
var arts = new Array();
if (detectWeb(doc, url) == "multiple") {
var items = Zotero.Utilities.getItemArray(doc, doc, "(abstract|abstractsearch)");
items = Zotero.selectItems(items);
for (var i in items) {
arts.push(i);
}
} else {
arts = [url];
}

Zotero.Utilities.processDocuments(arts, function(newDoc) {
Zotero.debug(newDoc.title);
if (newDoc.evaluate('//div[contains(@class, "aps-abstractbox")]/p', newDoc, null, XPathResult.ANY_TYPE, null).iterateNext()) var abs = Zotero.Utilities.trimInternal(newDoc.evaluate('//div[contains(@class, "aps-abstractbox")]/p', newDoc, null, XPathResult.ANY_TYPE, null).iterateNext().textContent);
var urlRIS = newDoc.location.href;
// so far several more or less identical url possible
// one is with "abstract" other with "searchabstract"
urlRIS = urlRIS.replace(/(searchabstract|abstract)/,"export");
var post = "type=ris";
var snapurl = newDoc.location.href;
var pdfurl = snapurl.replace(/(searchabstract|abstract)/, "pdf");
Zotero.Utilities.HTTP.doPost(urlRIS, post, function(text) {
// load translator for RIS
var translator = Zotero.loadTranslator("import");
translator.setTranslator("32d59d2d-b65a-4da4-b0a3-bdd3cfb979e7");
translator.setString(text);
translator.setHandler("itemDone", function(obj, item) {
if (item.itemID) {
item.DOI = item.itemID;
}
item.attachments = [
{url:snapurl, title:"PROLA Snapshot", mimeType:"text/html"},
{url:pdfurl, title:"PROLA Full Text PDF", mimeType:"application/pdf"}
];
if (abs) item.abstractNote = abs;
item.complete();
});
translator.translate();
}, null, 'latin1');
}, function() {Zotero.done();});
Zotero.wait();
}
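
The URL rewriting used in the translator above can be tried on its own. The following is only a sketch of that one step; `deriveUrls` is a hypothetical helper name and does not appear in PROLA.js.

```javascript
// Derive the RIS export URL and the PDF URL from an abstract page URL,
// using the same replacement pattern as the translator above.
function deriveUrls(abstractUrl) {
  const re = /(searchabstract|abstract)/;
  return {
    ris: abstractUrl.replace(re, "export"),
    pdf: abstractUrl.replace(re, "pdf"),
  };
}

const urls = deriveUrls("http://prl.aps.org/abstract/PRL/v104/i2/e026801");
console.log(urls.ris);
console.log(urls.pdf);
```

Because the pattern is not global, `replace` only rewrites the first occurrence, so a proxy host name that happens to contain "abstract" would be rewritten instead of the path segment.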

npj · January 19, 2010

Oh, one more thing: the translator could be polished a little so that it doesn't try to attach a PDF when the user is not logged in to PROLA. At the moment it saves an HTML page as "PROLA Full Text PDF", luckily with the HTML MIME type, so I know not to bother my Acrobat plugin with it.

Any suggestions on how to fix this nicely?

PS: Just found out about the nice debug log submission tool. Sorry for the unnecessary flooding of the thread :-/

hagver · January 19, 2010

The problem persists, unfortunately:
It seems that for all of Physical Review (which is not exactly a little journal on the sidelines), import is broken: no abstract, no page number (recently called article number in Phys.Rev.), no pdf-import.
Similar for Reviews of Modern Physics.

This problem has been known and persisted for a long time now.
It means that Zotero is broken for the biggest journals in Physics.
Completely unacceptable.

PLEASE fix this bug.

Thank you !

noksagt · January 19, 2010

npj just posted a suggestion for the updated translator a few hours ago. His suggested changes make it work for Phys Rev for me. It has obviously not been pushed out to clients yet. Did you update your translator manually and find that it doesn't work? If so, please list URIs. If not, you are probably relying on the DOI translator, since the stock PROLA translator doesn't recognize the URI scheme for some of their journals (which is one issue that npj improved). In that case, please just have a bit more patience for the user-contributed improvement to be tested and deployed.

hagver · January 21, 2010

Fantastic ! It works !

A thousand times "Thank You" to npj !

Note: When writing my previous post, I had not recognized npj's
contribution as a complete, working new translator for Phys. Rev.
I also thought that PROLA was only for the old archived Phys. Rev.
Thanks to noksagt for pointing this out.

Now let's hope that the new translator covers all cases and gets deployed soon.

Thank you !

alexHH · January 22, 2010

Hi,

I'm not so familiar with Zotero. Can anybody give me instructions on how to manually update the PROLA translator to the version posted above?

Thanks.

dstillman · January 22, 2010

npj wrote: "Updated PROLA.js, FYI. I can see the tabs are messed up here, I'll e-mail a copy to translators@zotero."

Post it as a file to zotero-dev and then send a message to the list.

tkott · January 23, 2010

Thanks npj for doing the heavy lifting, this solved my problem as well: http://forums.zotero.org/discussion/10733/ff36-possible-bad-reading-of-doi/#Item_1

Tomek

levbishop · January 23, 2010

@AlexHH I too spent some time trying to figure out how to solve this problem using npj's method. I got as far as finding my Firefox profile, which on Windows lives under
%APPDATA%\Mozilla\Firefox\Profiles\
and I found that zotero.jar lives under this directory in:
extensions\zotero@chnm.gmu.edu\chrome
and that zotero.jar can be decompressed using, for example 7-zip (download from http://www.7-zip.org )
I tried to put npj's PROLA.js in various places within the resulting directory tree, rezipping into zotero.jar, and restarting firefox, but nothing seemed to work.

I couldn't find any documentation on the zotero website about where translators should be saved, nor could I find any of the translators which I must already have. Perhaps they are stored somewhere other than zotero.jar, but I have no idea where....

fbennett · January 23, 2010

Navigating from the directory where zotero.jar is located, you should find the translators in:
../../../zotero/translators/
which I guess on Windows is:
..\..\..\zotero\translators\
They're not zipped or anything, you can just drop the new version into that directory. On Windows, you may need to restart Firefox for the new translator version to take effect, I'm not sure.

(EDIT: This applies to Zotero 2.0 only.)

levbishop · January 23, 2010

@fbennett Thanks, but in the ..\..\..\zotero directory all I see is
pdfinfo-Win32.exe*
pdfinfo-Win32.exe.version*
pdftotext-Win32.exe*
pdftotext-Win32.exe.version*
storage/
zotero.sqlite*
zotero.sqlite.bak*

dstillman · January 23, 2010

Translators go in the 'translators' directory within the Zotero data directory.

This applies to Zotero 2.0 only. Translators are much harder to access in Zotero 1.0.