Downloading PDFs broken?

adamgolding · March 6, 2012

I was previously able to auto-download PDFs and no longer can, using either of:

1. Chrome to grab for Standalone
2. the Firefox plugin to grab for a separate database (not the Standalone db)

I have disabled all extraneous add-ons, and "automatically attach PDFs" is enabled in both versions of Zotero. I was previously able to auto-download the specific PDF I am testing with, and it has the [PDF] prefix in Google Scholar. Both of the proxies I use show up in the proxy redirection list, and proxy redirection is enabled. I am, however, now on the campus internet rather than at home...

adamgolding · March 6, 2012

Oh, and I am testing with this article:

http://scholar.google.ca/scholar?q=%22generating+music+through+genetic+algorithms%22&hl=en&btnG=Search&as_sdt=1%2C5&as_sdtp=on

adamsmith · March 6, 2012

Have you tried other sites? This might well be specific to Google Scholar: they do a weird redirection thing with PDFs. There has been talk of hard-coding that into Zotero, but I'm not sure if that's been done.
Try this (open access):
http://asp.eurasipjournals.com/content/2012/1/57/abstract

adamgolding · March 6, 2012

Ah! That one works. However, my entire goal here is to grab PDFs via Google Scholar, and I was previously able to grab that one *via* Google Scholar, so I'm still not sure what's going on.

adamsmith · March 6, 2012

Google often changes small things under the hood even when the look remains the same. I'm pretty sure that's all there is to the change. You can observe this if you right-click on the PDF link in Google Scholar and select Zotero---> save link to Zotero:
That won't work, though it will with typical pdf links.

dstillman · March 6, 2012

It's not Google—it's the other site. Zotero makes a quick HEAD request (which doesn't download anything) to check the file type, and then, if it's actually a PDF, downloads the file, but that site is blocking the download if the HEAD request happened within the last five seconds or so. A HEAD request isn't an actual download, so there's no reason for it to do that.

%  curl -I 'http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.17.1882&rep=rep1&type=pdf' ; \
curl -v 'http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.17.1882&rep=rep1&type=pdf'
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Disposition: attachment; filename="10.1.1.17.1882.pdf"
Content-Type: application/pdf
Content-Length: 192847
Date: Tue, 06 Mar 2012 19:16:25 GMT

* About to connect() to citeseerx.ist.psu.edu port 80 (#0)
*   Trying 130.203.133.150... connected
* Connected to citeseerx.ist.psu.edu (130.203.133.150) port 80 (#0)
> GET /viewdoc/download?doi=10.1.1.17.1882&rep=rep1&type=pdf HTTP/1.1
> User-Agent: curl/7.21.4 (universal-apple-darwin11.0) libcurl/7.21.4 OpenSSL/0.9.8r zlib/1.2.5
> Host: citeseerx.ist.psu.edu
> Accept: */*
> 
< HTTP/1.1 302 Moved Temporarily
< Server: Apache-Coyote/1.1
< Set-Cookie: JSESSIONID=341C676DB2F0E9A3C9C72007DB270445; Path=/
< Location: http://citeseerx.ist.psu.edu/messages/downloadsexceeded.html
< Content-Length: 0
< Date: Tue, 06 Mar 2012 19:16:25 GMT
< 
* Connection #0 to host citeseerx.ist.psu.edu left intact
* Closing connection #0
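
The sequence in the transcript above (a HEAD request to check the type, followed immediately by a GET that gets redirected) can be sketched in Python. This is only an illustration of the pattern described in this post, not Zotero's actual implementation. The names `http_fetch` and `fetch_pdf`, the retry count, and the six-second delay are all assumptions; the delay is loosely based on the "five seconds or so" mentioned above, and whether waiting actually helps with a given site is not guaranteed.

```python
import time
import urllib.request
from typing import Callable, Optional, Tuple

# (final URL, content type, body) as returned by a fetcher
Response = Tuple[str, str, bytes]


def http_fetch(url: str, method: str = "GET") -> Response:
    """Fetch a URL with the standard library (redirects are followed)."""
    req = urllib.request.Request(url, method=method)
    with urllib.request.urlopen(req, timeout=30) as resp:
        content_type = resp.headers.get_content_type()
        # A HEAD response carries headers only, so there is no body to read.
        body = b"" if method == "HEAD" else resp.read()
        return resp.geturl(), content_type, body


def fetch_pdf(
    url: str,
    fetch: Callable[[str, str], Response] = http_fetch,
    retries: int = 2,          # hypothetical value
    delay: float = 6.0,        # hypothetical value, a bit over five seconds
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bytes]:
    """Return the PDF bytes, or None if the URL does not yield a PDF."""
    # Step 1: HEAD request to check the file type without downloading.
    _, content_type, _ = fetch(url, "HEAD")
    if content_type != "application/pdf":
        return None

    # Step 2: GET the file. A throttling server may answer with a
    # redirect to an HTML page instead, so check the type again.
    for attempt in range(retries + 1):
        _, content_type, body = fetch(url, "GET")
        if content_type == "application/pdf":
            return body
        if attempt < retries:
            sleep(delay)  # wait before trying again
    return None
```

Because the fetcher and the sleep function are passed in as parameters, the retry logic can be exercised with stand-ins instead of a real server.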

dstillman · March 6, 2012

adamsmith wrote: "You can observe this if you right-click on the PDF link in Google Scholar and select Zotero---> save link to Zotero: That won't work, though it will with typical PDF links."

I think Google only does that for regular search results, and that's what we added a workaround for. (You can see the actual URL by clicking and holding on PDF links. On Google Scholar, at least for me, it's the real URL.)

This particular download is failing only because of the site behavior described in my previous post.
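
As an illustration of the kind of link rewriting discussed here, the following is a minimal sketch of recovering the real target from a redirect-style link. It is not the workaround Zotero added. The function name `unwrap_redirect`, the `/url` path, and the `q` and `url` parameter names are assumptions about what such wrapper links look like; none of them come from this thread.

```python
from urllib.parse import parse_qs, urlparse


def unwrap_redirect(link: str) -> str:
    """Return the wrapped target URL if link looks like a redirect wrapper.

    Assumed wrapper shape (hypothetical): http://www.google.<tld>/url?...
    with the real target in a 'q' or 'url' query parameter.
    Any other link is returned unchanged.
    """
    parsed = urlparse(link)
    if parsed.netloc.startswith("www.google.") and parsed.path == "/url":
        params = parse_qs(parsed.query)  # also percent-decodes the values
        for key in ("q", "url"):
            if key in params:
                return params[key][0]
    return link
```

A direct PDF link, like the ones reported above for Google Scholar, passes through this function untouched.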

adamsmith · March 6, 2012

Thanks, Dan, and sorry for the confusion. Indeed, PDFs are otherwise downloaded from Google Scholar. See, e.g., the first link here:
http://scholar.google.com/scholar?hl=en&q=%22Dancing+to+the+tune+of+chemokines%22&btnG=Search&as_sdt=0%2C6&as_ylo=&as_vis=0

(Note that Zotero only downloads the PDF when it's the main link, not the PDF links on the side. Since those often point to older working-paper versions of journal articles, I think that's the right thing to do.)

adamgolding · March 6, 2012

Hmm, well, I am still apparently getting fewer PDFs overall from large queries than I was in the past. When I was using Standalone, I would find it got slow sometimes and I would restart it. Now I realize that the little arrows appear gradually and that Zotero is still grabbing information even though it appears to be 'done'. Is there a way to monitor this process? More importantly, is there a way to resume it if it is interrupted, say by restarting Zotero?

dstillman · March 6, 2012

There's no progress meter for downloads currently, no.

adamgolding · March 6, 2012

Right, but is there some subtle way I can tell if downloads are complete? And how can they be resumed?

dstillman · March 6, 2012

No. Don't restart Standalone/Firefox while they're downloading.

And they can't be resumed, though you can of course add them later as you would add any PDF: by dragging the (direct) PDF link onto the parent, dragging the PDF favicon if viewing with a plugin, or dragging a file from the filesystem.