Issues with PDF Retrieval

I'm trying to import about 300 papers from two Google Scholar searches. All the papers I want are already in my Google Scholar library, so I can get them into Zotero easily by exporting the library, but the imported items have no PDFs or snapshots attached.

When I select everything in Zotero and try "Find Available PDFs," it tells me it can't find a PDF for any of the 300 articles. Trying to use the Zotero Connector to import them page-by-page from my Google Scholar library (or from the search) gets me rate-limited pretty quickly.

Any tips? Or is my best option really to manually look everything up by title and find PDF versions?
  • dstillman Zotero Team
    edited March 2, 2023
    For items with a DOI, Find Available PDFs will download open-access PDFs, and for items with a DOI or URL, it will download PDFs you have IP-based access to (on-campus or using a VPN). It won't download gated files if you need to use a web-based proxy to access them.

    There's no particular reason to expect to be able to download hundreds of files at once without being blocked by something, even if you're on-campus. Publishers tend not to like that kind of thing.
  • I understand that Zotero only has access to non-gated files, but it just seems unlikely that there's no open-access PDF for any of the 300 papers I have (unless one getting blocked causes the whole batch to get blocked for some reason).
  • I think the problem is likely the lack of DOIs (which typically aren't in the Google Scholar metadata), which means that the Unpaywall lookup for open-access papers fails.
    Use the DOI manager add-on to get DOIs for all/most of the items, then try again. https://github.com/bwiernik/zotero-shortdoi
  • Has anyone found a solution to this in the past few years? I similarly need to bulk-download many hundreds of papers for a study, to facilitate mass data extraction. I have access to my library via VPN/login and have set up the resolver for my university's system. Even if I do this in batches, I still get blocked as a web-crawling "bot" by the sources after running the first batch.
  • There is no solution: automatically downloading hundreds of PDFs is exactly the type of behavior publishers are trying to fend off, so your best bet is just to go fairly slowly (how quickly bot detection gets triggered depends both on the publisher and on the particular mix of publishers you're querying).
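
For anyone who wants to check outside Zotero which items have an open-access copy once DOIs are filled in, here is a minimal Python sketch of the kind of DOI-based Unpaywall lookup described above, paced slowly as the last reply suggests. It is not how Zotero itself is implemented. The endpoint (https://api.unpaywall.org/v2/) and the `best_oa_location` / `url_for_pdf` fields are taken from Unpaywall's public API as I understand it, not from this thread, and the function names, the example email address, and the one-second delay are all assumptions for illustration. It only reports open-access PDF links; it does not get around publisher blocking for gated files.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumption: Unpaywall's public v2 REST endpoint (not stated in the thread).
UNPAYWALL_API = "https://api.unpaywall.org/v2/"


def unpaywall_url(doi, email):
    """Build the lookup URL for one DOI; Unpaywall asks for a contact email."""
    path = urllib.parse.quote(doi.strip(), safe="/")
    query = urllib.parse.urlencode({"email": email})
    return UNPAYWALL_API + path + "?" + query


def pdf_url_from_record(record):
    """Return the open-access PDF URL from a parsed record, or None."""
    # Assumption: the record exposes "best_oa_location" with "url_for_pdf";
    # "best_oa_location" may be null when no open-access copy is known.
    location = record.get("best_oa_location") or {}
    return location.get("url_for_pdf")


def fetch_json(url):
    """Fetch a URL and parse the response body as JSON."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def find_oa_pdfs(dois, email, delay_seconds=1.0,
                 fetch=fetch_json, sleep=time.sleep):
    """Look up each DOI in turn, pausing between requests to stay slow."""
    results = {}
    for index, doi in enumerate(dois):
        if index > 0:
            sleep(delay_seconds)  # pace requests instead of bursting
        try:
            record = fetch(unpaywall_url(doi, email))
        except OSError:
            # urllib's HTTP and network errors are OSError subclasses,
            # e.g. when a DOI is unknown to the service.
            results[doi] = None
            continue
        results[doi] = pdf_url_from_record(record)
    return results
```

Calling `find_oa_pdfs(["10.1000/xyz123"], "you@example.org")` (both values are placeholders) returns a dict mapping each DOI to a PDF URL or `None`. Items that come back `None` are the ones that would still need institutional access or a manual search.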