Chrome import of Informit database articles not capturing information properly

andy.j.campbell · August 15, 2013

I am currently using Informit as a primary database for my research; however, I am having problems importing citations directly from my browser.

When I try to save a PDF of an article, Zotero does not capture the authors' names, the correct article title, or other information such as page numbers.

Is there some way of doing this without having to import the citation into EndNote then exporting it to Zotero?

Gracile · August 15, 2013

There's currently no specific translator for this site. Zotero relies on the generic "Embedded Metadata" translator, but unfortunately the metadata embedded in their web pages is useless.

adamsmith · August 17, 2013

What's a sample URL?

andy.j.campbell · August 19, 2013

It's hard to paste an example URL because when you open it in your browser, it shows an error saying no database was selected.

Try going here and searching for anything instead:

http://search.informit.com.au/search;res=IELHEA

adamsmith · August 19, 2013

Cool, I'll have a look. That looks doable, but I'm not 100% sure how easily. It won't be very soon, though.

andy.j.campbell · August 20, 2013

That would be great, thanks! Especially since Informit is one of the major databases I need to use for my research.

adamsmith · September 12, 2013

A translator is now up. Your version of Zotero will automatically update within 24 hours, or you can update manually using the "Update Now" button in the "General" tab of the Zotero preferences.

It will work on search results and "complete record" pages. Let me know if you have any problems or suggestions.

andy.j.campbell · September 12, 2013

Thank you so much, Adam. This is extremely helpful. Importing the citation from the "complete record" page works well, but the PDF fails to download even though I'm logged in and can open the PDF when I click on it.

adamsmith · September 12, 2013

Yeah, I don't have access to that, so I couldn't do it. Do you get to the PDF in just one click from the complete record page?

If so, could you right-click (in Chrome) on the PDF link, select "Inspect Element" and take a screenshot of the panel that opens up, put it up somewhere (e.g. imgur.com) and link to it from here. If you have to go through several steps, what exactly are they?

andy.j.campbell · September 13, 2013

http://imgur.com/x5ym0Kd

http://imgur.com/WPXTY3c

It's only one click to the PDF from the 'complete record' page. I also took a screenshot of how the PDF looks when you open it, since it is contained in a frame.

adamsmith · September 13, 2013

OK, unfortunately I need the page source for the iframe page. You can get it by pressing Ctrl+U when you're on the page (Cmd+U on a Mac, I believe). Copy and paste the full page source into the window at gist.github.com, create a public gist at the bottom of the page, and post the URL here.

andy.j.campbell · September 13, 2013

Here you are:

https://gist.github.com/anonymous/6557712

adamsmith · September 14, 2013

thanks. Please help me test this:
Download the translator from https://gist.github.com/adam3smith/6559173/raw/46720e3d6fcaf7adc0bf16d77e6585bbab6d1d17/Informit+Australia.js
(right-click, then "Save link as")
and copy the file into the translator folder in your Zotero data directory http://www.zotero.org/support/zotero_data replacing the existing file of the same name.
Restart Chrome and Zotero and try this out.
If it does work, just let me know and I'll push this out.
If it doesn't work, create debug output while importing an item with a PDF from Informit in Chrome, as described in steps 1-3 here:
http://www.zotero.org/support/debug_output#zotero_connectors_chrome_and_safari
Instead of sending it to Zotero (steps 4-6), search the output for the line containing "PDF iframe URL:" and post it here. It's not certain that the line will be there; if you don't find it, that's important information, too.
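If the debug output is long, a text editor's search works fine. For anyone who'd rather script it, here is a minimal sketch, assuming the output has been saved as plain text; `findMarkerLines` is a hypothetical helper, not part of Zotero, and the sample lines are two lines quoted elsewhere in this thread.

```javascript
// Hypothetical helper (not part of Zotero): return every line of saved
// debug output that contains the marker printed by the test translator.
function findMarkerLines(debugOutput, marker = "PDF iframe URL:") {
  return debugOutput
    .split(/\r?\n/) // saved logs may use Windows or Unix line endings
    .filter((line) => line.includes(marker));
}

// Two lines quoted elsewhere in this thread, used as sample input.
const sample = [
  "(3)(+0000001): Connector: Method saveItems failed with status 0",
  "Translate: PDF iframe URL: T",
].join("\n");

const hits = findMarkerLines(sample);
console.log(hits.length === 0 ? "marker not found" : hits.join("\n"));
```

An empty result is itself useful, since it means the code path that logs the iframe URL never ran.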

andy.j.campbell · September 14, 2013

"PDF iframe URL:" is not in the output

adamsmith · September 15, 2013

OK, thanks for bearing with me - not that easy to do this quasi-blind.
Try again with this version:
https://gist.github.com/adam3smith/6559173/raw/721da29611737cf4ba75207ea15ae172cfb9bb67/Informit+Australia.js

I suspect it won't work, but regardless of whether it does or not, could you take the entire debug output and copy it to a public gist as above? There is nothing security-relevant in the debug output, but if you feel uncomfortable putting it up, you can also send it to me by e-mail; my address is at the bottom of this blog post: http://www.zotero.org/blog/summer-zotero-workshops/

andy.j.campbell · September 16, 2013

The button that usually shows when you can save a citation no longer appears in Informit. It's still present on PubMed.

adamsmith · September 16, 2013

Oh, I'm sorry, I didn't notice that the filename saves with a "+":
Delete all versions of "Informit+Australia.js" from the translator directory, then replace the current version of "Informit Australia.js" with the file downloaded from the link above (i.e. replace the + with a space)
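The rename is needed because the browser saves the raw gist link under its last path segment, `Informit+Australia.js`, while the file already in the translators directory is `Informit Australia.js`. A minimal sketch of that mapping follows; `translatorFileName` is a hypothetical helper, and treating "+" as a space is only right here because we know the real filename contains a space (in a URL path, "+" is normally a literal plus).

```javascript
// Hypothetical helper: map the raw gist link to the filename the
// translator must have in the Zotero translators directory.
function translatorFileName(rawUrl) {
  const lastSegment = new URL(rawUrl).pathname.split("/").pop();
  // decodeURIComponent() decodes %20 but leaves "+" untouched,
  // so turn "+" into a space first, then decode any %-escapes.
  return decodeURIComponent(lastSegment.replace(/\+/g, " "));
}

const raw =
  "https://gist.github.com/adam3smith/6559173/raw/721da29611737cf4ba75207ea15ae172cfb9bb67/Informit+Australia.js";
console.log(translatorFileName(raw)); // Informit Australia.js
```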

andy.j.campbell · September 16, 2013

I just did this, but the button still isn't there. I made sure to restart both Chrome and Zotero after making the change.

adamsmith · September 16, 2013

Odd. It definitely works for me. Try resetting translators from the Advanced tab of the Zotero preferences and restart; things should definitely work again. Then repeat replacing the translator with the one from the link above (replacing the + with a space again; when saving to the translator directory, you should get a message that you're replacing a file).

andy.j.campbell · September 16, 2013

The button reappeared but it didn't even attempt to save the PDF this time. Here is the debug: https://gist.github.com/anonymous/6587738

adamsmith · September 16, 2013

getting there...
try with this:
https://gist.github.com/adam3smith/6559173/raw/da5c103c435975770e86625b29aae9632f7bf79f/Informit+Australia.js

Same instructions as above. We'll likely need one more debug output before I get it to work (or decide it's impossible). If you have Firefox with Zotero on your computer, it might be worth trying with that.

aurimas · September 16, 2013

@adamsmith
/<iframe.+src=\"(.+\.pdf)\"/
should be safer as
/<iframe[^>]+src="([^"]+\.pdf)"/
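To see why the second pattern is safer, here is a small comparison on made-up iframe markup (not Informit's actual page source) where another quoted attribute ending in `.pdf` follows `src`. The greedy `.+` runs past the closing quote of `src`; `[^>]` and `[^"]` keep the match inside the tag and inside the quotes.

```javascript
// Made-up iframe markup: a second quoted attribute follows src.
const html =
  '<iframe id="pdf" src="/elibrary/a.pdf" data-alt="b.pdf"></iframe>';

// Original pattern: ".+" is greedy and can run past the closing quote of src.
const greedy = /<iframe.+src=\"(.+\.pdf)\"/;
// Suggested pattern: stay inside the tag ([^>]) and inside the quotes ([^"]).
const safer = /<iframe[^>]+src="([^"]+\.pdf)"/;

console.log(html.match(greedy)[1]); // /elibrary/a.pdf" data-alt="b.pdf
console.log(html.match(safer)[1]); // /elibrary/a.pdf
```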

aurimas · September 16, 2013

There's also this (from the debug log):

(3)(+0000001): Connector: Method saveItems failed with status 0

Not that it's related to this PDF issue ~~(well, actually it could be, because I don't think saving attachments directly to server works)~~, but it looks like saving to Zotero Standalone is failing.

@andy.j.campbell, did you perhaps have Zotero Standalone open and then close it right before clicking the URL bar icon? I'm just curious whether there are additional issues with Zotero on your system.

Edit: @andy.j.campbell, nvm, it just seems that you don't have Zotero Standalone open. You should have it open for these tests. (if you do, there's some problem that's blocking Zotero Chrome extension from communicating with it)

@Dan, is there any reason why saveItems should even be attempted with Zotero Standalone after getSelectedCollection fails with status 0 (or fails with some other status)? (see https://gist.github.com/anonymous/6587738)
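The behaviour the question implies could look roughly like this. This is a hypothetical sketch, not the Zotero connector's actual code: `callConnector(method, data)` is an assumed helper that resolves to `{ status }` for a request to Zotero Standalone, with status 0 meaning no response at all.

```javascript
// Hypothetical sketch, not the Zotero connector's actual code.
// callConnector(method, data) is an assumed helper resolving to { status };
// status 0 means Standalone never answered.
const isOk = (status) => status >= 200 && status < 300;

async function saveToStandalone(callConnector, items) {
  const probe = await callConnector("getSelectedCollection", {});
  if (!isOk(probe.status)) {
    // Standalone did not answer the first request, so saveItems would
    // almost certainly fail the same way; report the failure instead.
    return { saved: false, failedAt: "getSelectedCollection", status: probe.status };
  }
  const result = await callConnector("saveItems", { items });
  return isOk(result.status)
    ? { saved: true }
    : { saved: false, failedAt: "saveItems", status: result.status };
}

// Usage with a fake connector that is always unreachable (status 0).
const calls = [];
const offline = async (method) => {
  calls.push(method);
  return { status: 0 };
};
saveToStandalone(offline, []).then((r) => console.log(r, calls));
```

With the unreachable fake, only `getSelectedCollection` is attempted and the result records where it stopped.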

andy.j.campbell · September 17, 2013

I actually always have Standalone open since, when I save a citation, I cite it directly in Word soon after.

It didn't work, but here is the debug: https://gist.github.com/anonymous/6601788

Thank you all so much for your time and effort!

adamsmith · September 17, 2013

@aurimas - the version the first debug came from had compatibility set to "g" (Gecko, i.e. Firefox) only; that's why nothing useful is in there, no reason to worry about that.

@andy - hmm that should work.
see this line?

HTTP POST {"items":[{"itemType":"journalArticle","creators":[{"firstName":"Madeline","lastName":"Thompson","creatorType":"author"}],"notes":[],"tags":["Subject(s): Pharmacists","Homocysteine","Cardiovascular diseases in old age","Immunological tolerance"],"seeAlso":[],"attachments":[{"title":"informit Snapshot","mimeType":"text/html","url":"http://search.informit.com.au.dbgw.lis.curtin.edu.au/search;subject=Health;action=showCompleteRec;rs=1;rec=2"},{"url":"http://search.informit.com.au.dbgw.lis.curtin.edu.au/MTA1NzIyMzMuMzUxOTkw/elibrary//AUSPHA/2013_v032n04/AusPha2013V032N04_058.pdf","title":"informit Full Text PDF","mimeType":"application/pdf"}],"title":"An intriguing case of unexplained hypertension","publicationTitle":"Australian Pharmacist","volume":"32","issue":"4","page":"58-62","date":"Apr 2013","ISSN":"0728-4632","abstractNote":"Pharmacists routinely use investigative questioning in their pharmacy practice. Medication mystery scenarios are commonly occurring situations where detailed questioning uncovers a c... (2396 chars) to http://127.0.0.1:23119/connector/saveItems

that's the item that Zotero is saving.
Here's the pdf fulltext attachment:
{"url":"http://search.informit.com.au.dbgw.lis.curtin.edu.au/MTA1NzIyMzMuMzUxOTkw/elibrary//AUSPHA/2013_v032n04/AusPha2013V032N04_058.pdf"

If you try that URL now, it probably won't work, since it's tied to a session. But could you do this again, find that URL in the debug output (the weird MTA1N... part is likely going to be different), and just try to open it? What's the URL? Does it open the PDF?
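Picking the PDF URL out of that line by eye is fiddly. A minimal sketch, assuming you copy the `HTTP POST {...}` debug line as text: `pdfUrlsFromDebug` is a hypothetical helper, and it uses a regex rather than `JSON.parse` because the logged payload is cut off after the abstract, so the line is not valid JSON. The sample is a trimmed excerpt of the line above with the other item fields removed.

```javascript
// Hypothetical helper: pull PDF attachment URLs out of a copied
// "HTTP POST {...} to .../connector/saveItems" debug line.
// A regex is used because the logged payload is truncated (not valid JSON).
function pdfUrlsFromDebug(debugText) {
  return [...debugText.matchAll(/"url":"([^"]+\.pdf)"/g)].map((m) => m[1]);
}

// Trimmed excerpt of the debug line above (other item fields removed).
const line =
  'HTTP POST {"items":[{"attachments":[' +
  '{"title":"informit Snapshot","mimeType":"text/html","url":"http://search.informit.com.au.dbgw.lis.curtin.edu.au/search;subject=Health;action=showCompleteRec;rs=1;rec=2"},' +
  '{"url":"http://search.informit.com.au.dbgw.lis.curtin.edu.au/MTA1NzIyMzMuMzUxOTkw/elibrary//AUSPHA/2013_v032n04/AusPha2013V032N04_058.pdf","title":"informit Full Text PDF","mimeType":"application/pdf"}]';

console.log(pdfUrlsFromDebug(line));
```

The snapshot URL is skipped because it does not end in `.pdf"`.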

@aurimas - any other ideas? (Ignore the "Translate: PDF iframe URL: T" line; that's just a misspecified match for the debug output.)

aurimas · September 17, 2013

the version the first debug came from had compatibility set to g only, that's why nothing useful is in there, no reason to worry about that.

Right, but the embedded metadata translator still could not communicate with standalone.

any other ideas?

Debug output from standalone would be helpful. (The instructions are at the top of the page where Chrome Debug Log instructions were)

It's possible that the doGet request is actually being redirected to a different domain, so the URL for the PDF could be incorrect. @andy, What's the URL exactly when you're looking at the PDF? (Edit: Looks like this is probably not the case)

Edit: BTW, looking through some older posts, I came across this:

@adamsmith, I figured it out in case you're working on it. We weren't unescaping HTML special entities in the URL before using it. Will push it out in a minute.

It doesn't affect the current issue, but we'll probably want to do this anyway.
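For context, unescaping means turning references like `&amp;` in an attribute value back into the characters they stand for before the value is used as a URL. Below is a standalone sketch that handles only numeric references and five common named entities; `unescapeHtmlEntities` is a hypothetical name, not the function the translator or Zotero actually uses, and the sample URL is made up.

```javascript
// Standalone sketch: decode HTML character references in an attribute value
// before using it as a URL. Only a few named entities are handled.
const NAMED = { amp: "&", lt: "<", gt: ">", quot: '"', apos: "'" };

function unescapeHtmlEntities(text) {
  return text.replace(/&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g, (whole, ref) => {
    if (ref[0] === "#") {
      const hex = ref[1] === "x" || ref[1] === "X";
      const code = parseInt(ref.slice(hex ? 2 : 1), hex ? 16 : 10);
      return code <= 0x10ffff ? String.fromCodePoint(code) : whole;
    }
    // Unknown named entities are left exactly as they were.
    return Object.prototype.hasOwnProperty.call(NAMED, ref) ? NAMED[ref] : whole;
  });
}

// Made-up src value as it might appear in page source.
const src = "http://example.org/view.pdf?id=1&amp;token=a&#38;b";
console.log(unescapeHtmlEntities(src)); // http://example.org/view.pdf?id=1&token=a&b
```

Because the replacement is a single pass, `&amp;amp;` decodes once to `&amp;`, which matches how a browser reads the attribute.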

@adamsmith, I'm trying to find some original discussion about proxies and PDF downloads (failing) via connectors. I don't recall the reason for that (besides the PDFs being hosted on different subdomains). Do you know anything about this or recall a discussion about it?

adamsmith · September 17, 2013

The only thing I vaguely remember that wasn't about cross-domain restrictions was something odd superimposed over the PDF. I highly doubt that's the case here.

andy.j.campbell · September 18, 2013

I think I might have found the problem:

When I paste in the URL that Zotero is trying to download the PDF from, I get a page saying "Informit - Forbidden: Direct file requests are not allowed."

adamsmith · September 18, 2013

yeah, sorry, I don't think we can do this then. I've never seen that before.

aurimas · September 18, 2013

This probably relies on referrer header for the GET request, which we cannot currently set for doGet or fetching attachments. If these sort of restrictions become more common, we might have to implement the functionality. I think I've seen this before, but I can't recall which site.
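To illustrate what such a request would need, here is a hypothetical sketch that builds options for Node's `http.request()`, repeating the PDF request with a `Referer` header pointing at the "complete record" page. Two assumptions, both guesses from this thread: that Informit checks the Referer at all, and that the session cookie would also have to be sent (passed in here as an opaque string). The URLs are the session-bound ones from the debug output above, so the example will not actually fetch anything today.

```javascript
// Hypothetical sketch: options for Node's http.request() that repeat the
// PDF request with a Referer header pointing at the "complete record" page.
// Assumes (unverified) that the server checks Referer and needs the session cookie.
function pdfRequestOptions(pdfUrl, recordPageUrl, cookie) {
  const u = new URL(pdfUrl);
  const headers = { Referer: recordPageUrl };
  if (cookie) headers.Cookie = cookie; // opaque session cookie, if available
  return {
    protocol: u.protocol,
    hostname: u.hostname,
    port: u.port ? Number(u.port) : undefined, // default port when absent
    path: u.pathname + u.search,
    method: "GET",
    headers,
  };
}

const opts = pdfRequestOptions(
  "http://search.informit.com.au.dbgw.lis.curtin.edu.au/MTA1NzIyMzMuMzUxOTkw/elibrary//AUSPHA/2013_v032n04/AusPha2013V032N04_058.pdf",
  "http://search.informit.com.au.dbgw.lis.curtin.edu.au/search;subject=Health;action=showCompleteRec;rs=1;rec=2"
);
console.log(opts.headers);
// To actually send it: require("http").request(opts, (res) => { /* ... */ }).end();
```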