[SOLVED] PDFs Not importing, and I want to fix it

wolf29 · March 8, 2015

When I import from EBSCOhost through the NSU proxy, the PDF is not brought down. This is because the standard behavior when the link is clicked (on Chrome or Firefox, on Win 8.1) is to open it in the web file viewer. If you right-click on the full-text link and choose "save as", it wants to save pdfviewer.htm. If you click on the full-text link, you can see the PDF in the web browser. Once there, you can right-click the document window and save it under the default name ContentServer.pdf or whatever name you want. It looks like the default behavior of Zotero is to assume the full-text link is the real PDF. I have seen discussion threads that suggest that.
I want to write a little connector that could get Zotero's downloader to trace down the file chain until it finds something called *.pdf and attach the doc name or authors. I usually save the PDFs by the authors' names and date, but if it were internal to Zotero, they could be saved in any semi-unique name format. My question is where in the source is the code that fetches the file when you click the Zotero icon in the address bar of the browser.
I am not whining for somebody to fix it for me, just for a pointer to a logical place to make the attempt, in my spare time while I work on my comprehensive exams.

aurimas · March 8, 2015

What's the URL you're trying to import from? Are you using Zotero Standalone with Chrome/Safari/Firefox? This should not be different across different OSes, so it should be working in general, but importing PDFs over proxy via connectors is currently broken (in some cases).

wolf29 · March 8, 2015

http://eds.b.ebscohost.com.proxy1.ncu.edu/eds/pdfviewer/pdfviewer?sid=529579e2-83fa-46d7-a495-16a479f9a543%40sessionmgr198&vid=5&hid=119

http://eds.b.ebscohost.com.proxy1.ncu.edu/eds/detail/detail?sid=529579e2-83fa-46d7-a495-16a479f9a543%40sessionmgr198&vid=6&hid=119&bdata=JnNpdGU9ZWRzLWxpdmU%3d#db=bth&AN=83465896

I thought it was broken but only when the data-store on the web was doing squirrelly things like pushing pdfs with an .asp extension instead of a .pdf extension.

aurimas · March 8, 2015

Can you import the PDF via URL bar icon from this page: http://eds.b.ebscohost.com/eds/pdfviewer/pdfviewer?sid=529579e2-83fa-46d7-a495-16a479f9a543@sessionmgr198&vid=5&hid=119 (no proxy)? This works for me on Chrome + Zotero Standalone (can't check that URL over proxy, unfortunately). If that's not working (and you can see the PDF in the browser), please provide a Debug log from Zotero Standalone (or Zotero for Firefox, if that's what you're using) covering the import attempt. https://www.zotero.org/support/debug_output

adamsmith · March 8, 2015

@aurimas - I was pretty sure PDFs over Proxy have never worked from EBSCO via connectors. Is that not right?

aurimas · March 8, 2015

Yeah, I think you're right, but I can't really get EBSCO to work over proxy right now.

adamsmith · March 8, 2015

"If you click on the full-text link, you can see the pdf in the web browser. Once there, you can right-click the document window and save as default name ContentServer.pdf and save it as whatever you want. It looks like the default behavior of Zotero is to assume the full-text link is the real PDF. I have seen discussion threads that suggest that.
I want to write a little connector that could get Zotero's downloader to trace down the file chain until it finds something called a *.pdf and attach the doc name or authors"

Just to be clear, though: Zotero does not assume this and does, in fact, already try to grab the PDF from the embedded frame in EBSCO and similar webpages.
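To illustrate the idea of looking inside the viewer page for the embedded document, here is a minimal Python sketch. It is not Zotero's translator code, and the function name `find_embedded_sources` and the sample markup are made up for illustration. It only assumes that the viewer page points at the real file through an `<iframe>`, `<embed>`, or `<object>` tag.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class EmbeddedSourceFinder(HTMLParser):
    """Collects the targets of <iframe>, <embed>, and <object> tags."""

    # Attribute that carries the embedded resource's URL for each tag.
    URL_ATTRS = {"iframe": "src", "embed": "src", "object": "data"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.sources = []

    def handle_starttag(self, tag, attrs):
        wanted = self.URL_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative links against the viewer page's URL.
                self.sources.append(urljoin(self.base_url, value))


def find_embedded_sources(html, base_url):
    """Return absolute URLs of everything embedded in the given page."""
    finder = EmbeddedSourceFinder(base_url)
    finder.feed(html)
    return finder.sources


# Hypothetical viewer markup; a real page would be fetched first.
page = '<html><body><iframe src="/ContentServer.pdf?T=P"></iframe></body></html>'
print(find_embedded_sources(page, "http://example.org/eds/pdfviewer/pdfviewer"))
```

A real downloader would still have to request each candidate URL and check that the response is actually a PDF, since the file name alone is not reliable.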

My guess would be that you're trying this using Standalone--on several of the more complex sites, most notably ProQuest and EBSCO, you don't get PDFs attached when using Standalone and accessing resources through a proxy. If you're in that situation, either switch to the Firefox version, or see if your university allows you to connect from off-campus via VPN.

wolf29 · March 8, 2015

That is a useful hint.
I tried finding the resource at http://eds.a.ebscohost.com.proxy1.ncu.edu/eds/detail/detail?sid=9c8b1d95-26ae-4ea2-ab89-aaff3181f076%40sessionmgr4002&vid=0&hid=4202&bdata=JnNpdGU9ZWRzLWxpdmU%3d#db=bth&AN=95589398
and it failed to download the pdf.
I went back and removed .proxy1.ncu.edu from the URL and it downloaded the PDF.
I am using the Standalone version because I was having memory starvation problems using Firefox, and Chrome was better at releasing memory back to the system.
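For anyone who wants to script the workaround of removing the proxy part of the address, here is a small Python sketch. The function name `strip_proxy_suffix` is made up for this example, and it assumes the proxy simply appends its own domain to the original hostname, as in the URL above.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_proxy_suffix(url, proxy_suffix):
    """Return url with proxy_suffix removed from the end of its hostname.

    Any user:password part of the URL is dropped in the rewritten address.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    if not proxy_suffix or not host.endswith(proxy_suffix):
        return url  # Not a proxied address; leave it alone.
    bare_host = host[: -len(proxy_suffix)]
    # Keep an explicit port, if there is one.
    netloc = bare_host if parts.port is None else f"{bare_host}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


proxied = (
    "http://eds.a.ebscohost.com.proxy1.ncu.edu/eds/detail/detail"
    "?sid=9c8b1d95-26ae-4ea2-ab89-aaff3181f076%40sessionmgr4002"
    "&vid=0&hid=4202&bdata=JnNpdGU9ZWRzLWxpdmU%3d#db=bth&AN=95589398"
)
print(strip_proxy_suffix(proxied, ".proxy1.ncu.edu"))
```

The unproxied address will only open if you already have access some other way, for example from on campus or over a VPN.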

Thanks!