Standalone failing to import PDFs (ProQuest, NYT)

qenghis · October 2, 2012

Hello, my Zotero standalone app is no longer able to import PDFs from ProQuest or New York Times archives.

When I import a citation, I briefly see a PDF icon titled "Full Text PDF" under the new citation, but it disappears almost instantly. It's been happening on every attempt for the past few days, but here is one example: http://search.proquest.com.ezproxy.cul.columbia.edu/docview/173221843/13987F5ADAE3C46605A/1

I'm using Zotero Standalone 3.0.8 with Chrome 21.0.1180.90 on Mac OS X 10.5.8.

PS I also tried it with the Firefox plugin. It works correctly with FF.

adamsmith · October 2, 2012

Could you confirm that other proxied resources, say JSTOR, still import PDFs into your Standalone?

qenghis · October 2, 2012

JSTOR is working.

qenghis · October 17, 2012

Hi Adam, I'm just wondering if there's any status update on this.

Thanks,
Michael

adamsmith · October 17, 2012

No. I have this bookmarked, but as I believe I mentioned elsewhere, troubleshooting connector-only issues is a good deal harder, so it will take a while unless someone else takes this up.

qenghis · October 17, 2012

Ok, that's a little frustrating. Is there anything I can do to help? I have a software background.

adamsmith · October 17, 2012

PDF download/attachment through a proxy with Standalone is going to be more fragile until Zotero has full proxy support in the Standalone connectors, which I don't think is anywhere close. If you need this to work reliably for sites like EBSCO and ProQuest, the only recommendation I have is to use Firefox. For most other sites with simpler structures, the connectors work just fine.

If you want to look at this yourself you can try to compare the debug output for Standalone and for Zotero for Firefox:
http://www.zotero.org/support/debug_output

Some notes on debugging in Chrome directly are here:
https://groups.google.com/forum/?fromgroups=#!searchin/zotero-dev/debug$20connector/zotero-dev/pRw2jv7JIGs/TA9Lpb4oy-EJ

qenghis · October 18, 2012

Bummer. FF sucks. I try not to use it anymore.

I'll take a stab at debugging as soon as I get a chance.

qenghis · October 29, 2012

I took a look at the debugging logs. One obvious difference is these two lines in the stand-alone log:

(2)(+0000002): Downloaded PDF did not have MIME type 'application/pdf' in Attachments.importFromURL()

(3)(+0000000): Deleting item 5031

If I'm reading this right, it looks like the file gets downloaded, but the app doesn't recognize it and deletes it. That jibes with the behavior I'm seeing where the PDF attachment appears in Zotero for an instant and then goes away.
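
For anyone who wants to repeat the comparison, here is a minimal Python sketch of how two debug logs can be diffed. It assumes every line carries the "(level)(+elapsed): message" prefix seen in the two lines above. The function names are my own and nothing here is part of Zotero.

```python
import re

# Prefix shape taken from the quoted log lines: "(level)(+elapsed): message".
PREFIX = re.compile(r"^\(\d+\)\(\+\d+\): ")


def messages(log_text):
    """Return the set of log messages with the level/timing prefix removed."""
    result = set()
    for line in log_text.splitlines():
        line = line.strip()
        if line:
            # Lines without the prefix are kept unchanged.
            result.add(PREFIX.sub("", line))
    return result


def only_in_first(first_log, second_log):
    """Messages that appear in first_log but not in second_log, sorted."""
    return sorted(messages(first_log) - messages(second_log))
```

Calling only_in_first(standalone_text, firefox_text) with the contents of the two logs lists the messages unique to the stand-alone run. Messages that embed item IDs or other run-specific values will show up as differences even when the underlying step is the same, so the output still needs to be read by eye.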

If I'm on the wrong track here, could you please point me in the right direction?

I've also submitted the logs to the server...
Stand-alone log: D865221731
Firefox log: D327996334
Source URL: http://search.proquest.com.ezproxy.cul.columbia.edu/docview/173296630/13A0F8314BC595DAA9A/13

Thanks a lot,
Michael

adamsmith · October 29, 2012

You're on the right track, but my suspicion (and this has been the cause in every case where I have seen this debug output) is that the problem isn't Zotero failing to recognize the file or MIME type. Rather, it really isn't downloading a PDF but a regular webpage, most likely an "access not allowed" page or similar, because the proxy (and thus access to the full text) gets lost along the way when using Standalone.
You could adjust the translator by deleting the mimeType from the attachment setting and see what you get; maybe that'll tell you more.
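
Another way to test that suspicion outside Zotero is to save whatever the PDF link returns and look at its first bytes. The following is only a rough Python sketch, not Zotero's own check: it relies on the fact that PDF files begin with "%PDF-", and the HTML test is a simple heuristic. The function name is made up for illustration.

```python
def classify_download(data: bytes) -> str:
    """Guess whether downloaded bytes are a PDF, an HTML page, or something else."""
    # Only the start of the file matters; skip any leading whitespace.
    head = data[:1024].lstrip()
    if head.startswith(b"%PDF-"):
        return "pdf"
    # Heuristic: an error or login page will normally contain an <html> tag early on.
    lowered = head.lower()
    if lowered.startswith(b"<!doctype html") or b"<html" in lowered:
        return "html"
    return "unknown"
```

If the saved file comes back as "html", opening it in a browser should show what the proxy or ProQuest actually returned in place of the PDF.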

If I'm right, there is a fair chance that this just won't work with Standalone and proxy because of such authorization issues. As an alternative to using Firefox, you could also look into connecting via VPN:
http://library.columbia.edu/services/faq/eresources/vpn.html
instead of ezproxy - there's a very good chance that'd work.

qenghis · October 29, 2012

I commented out lines 115 and 411 (mimeType: 'application/pdf') in ProQuest.js and restarted Zotero but saw no differences in the behavior or in the log file. Is there something else I should do to delete the mimeType from the attachment setting?

I also set up a VPN connection to Columbia. Again, there were no differences.

Thanks.

qenghis · October 30, 2012

Now Firefox isn't working either. ProQuest pages show the webpage icon in the address bar, not newspaper or magazine icons, and when I save, the PDFs aren't downloaded.

I assume that I should create a new thread for this?

PS I love Zotero, but the instability is really starting to get to me.

adamsmith · October 30, 2012

I have PDF attachments consistently working for me on Firefox, e.g. for the "Senate No Place for Lazy Man" article: http://search.proquest.com/docview/173296630

You'll get the generic article logo on ProQuest if you're on the page view rather than the citation/abstract view, since Zotero uses information on the latter to detect the item type, but the actual import should have the right item type. AFAIK that has always been the case.

Did you try resetting and updating your translators in case your attempts to debug/fix this for Standalone changed anything?

qenghis · October 30, 2012

Ah yes, that was it, thank you. I reset the translators, and it's working fine. I apologize for my error and for my outburst.

Do please let me know if there is anything more that I can do to help debug the stand-alone.

Regards,
Michael

magpie212 · October 30, 2012

I have observed that saving embedded metadata to Zotero is not always available, and also that when it is available, it imports the data differently (and more accurately) than if I use the "Create New Item from Current Page" button. I would like to 1) understand this process better and 2) figure out the best way to get data imported as accurately as possible. I understand that I'll always have to edit some, but I'd sure like to keep it to a minimum. Thanks.

adamsmith · October 30, 2012

Please don't post in unrelated threads. We try to keep one thread to one topic, and this one is about PDF import from ProQuest.

The short answer is that it's always better to use the icon in the URL bar than the "Create New Item from Current Page" button, and that's also the reason for keeping the two functions physically separate.
You'll get embedded metadata when there is embedded metadata on the page that Zotero detects. For anything beyond that, please start a new discussion.