Available for beta testing: improved PDF retrieval with Unpaywall integration
1) Are you running it from the same network each time?
2) Are you actually selecting all 1000 records at once, right-clicking, and choosing "Find Available PDFs", or are you trying smaller batches?
Recall that, when you use "Add Item by Identifier" or "Find Available PDF", Zotero will actually load the DOI/URL page before checking for OA sources. The latter can change over time as Unpaywall updates its data and we incorporate it (which happens no more than once a week), but the former can change based on whether you have access to a PDF from your current network and whether a site is blocking you. If you try to find PDFs for 1000 items at a time, there's a decent chance some sites will start blocking you for making automated downloads.
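For illustration, here's a minimal sketch of that two-stage lookup. This is not Zotero's actual implementation: Zotero checks its own weekly-updated copy of the Unpaywall data, whereas this sketch calls the public Unpaywall REST API directly (which requires an email parameter), and the page-scraping stage is only stubbed out.

```typescript
// Sketch only: not Zotero's real code. Stage 1 (loading the DOI/URL page
// and scraping it for a PDF link) is stubbed; stage 2 queries the public
// Unpaywall API instead of Zotero's weekly-updated copy of its data.
async function findAvailablePDF(doi: string, email: string): Promise<string | null> {
  // Stage 1: load the DOI page itself. Whether this yields a PDF depends
  // on your current network's access and on whether the site blocks you.
  const pageUrl = `https://doi.org/${doi}`;
  const fromPage = await scrapePageForPDF(pageUrl); // hypothetical helper
  if (fromPage) return fromPage;

  // Stage 2: fall back to open-access sources.
  const res = await fetch(`https://api.unpaywall.org/v2/${doi}?email=${encodeURIComponent(email)}`);
  if (!res.ok) return null;
  const data = await res.json();
  return data?.best_oa_location?.url_for_pdf ?? null;
}

// Placeholder for the page-scraping stage; a real version would fetch the
// page and look for a PDF link in it.
async function scrapePageForPDF(_url: string): Promise<string | null> {
  return null;
}
```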
We should probably put in some automatic per-site rate-limiting to keep those requests under control no matter how many items you select.
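For what it's worth, a per-site limiter along those lines could be as simple as one timestamp per hostname. This is just a sketch, and the 5-second minimum interval is an arbitrary placeholder, not a Zotero setting:

```typescript
const MIN_INTERVAL_MS = 5_000; // arbitrary placeholder interval
const nextAllowed = new Map<string, number>(); // hostname -> earliest next request time

async function rateLimitedFetch(url: string): Promise<Response> {
  const host = new URL(url).hostname;
  const now = Date.now();
  // Reserve the next slot for this host before waiting, so concurrent
  // callers queue up behind each other instead of firing together.
  const earliest = Math.max(nextAllowed.get(host) ?? 0, now);
  nextAllowed.set(host, earliest + MIN_INTERVAL_MS);
  if (earliest > now) {
    await new Promise((resolve) => setTimeout(resolve, earliest - now));
  }
  return fetch(url);
}
```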
@y.sapolovych: It looks like you have some items with DOIs that resolve directly to PDFs as well as direct PDF URLs in the URL field (which is generally incorrect, since that's not normally the URL you would cite). In the latest beta, those should be handled properly. Thanks for reporting.
https://harzing.com/resources/publish-or-perish/tutorial/using-pop/queries-preferences
https://harzing.com/resources/publish-or-perish/tutorial/google-scholar/slow-searches
"To achieve the required reduction in requests, Publish or Perish delays subsequent requests for a variable amount of time (up to 1 minute). The higher the recent request rate, the longer the delays."
And it really does work quite well. Especially in combination with the ability to specify the year, this allows us to download 1000 hits per year for any given search query, though unfortunately without also downloading the actual PDFs. The program is primarily intended for bibliometric analysis, but the developer has shown some interest in also including a way to download the URLs. We currently export the bibliographic information in *.bib format from PoP and then import it into Zotero. But if the fix that Dan just mentioned for Yevhen's issue works, that will make us EXTREMELY happy (corpus linguistics) campers :)
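To make the quoted PoP behavior concrete, here's a rough sketch of that kind of adaptive delay. The one-minute cap comes from the quote above, but the sliding window and per-request scaling are guesses, not PoP's actual values:

```typescript
const recentRequests: number[] = []; // timestamps of recent requests
const WINDOW_MS = 60_000;            // how far back "recent" reaches (a guess)
const MAX_DELAY_MS = 60_000;         // the 1-minute cap from the quote

async function adaptiveDelay(): Promise<void> {
  const now = Date.now();
  // Drop timestamps that have fallen out of the sliding window.
  while (recentRequests.length && now - recentRequests[0] > WINDOW_MS) {
    recentRequests.shift();
  }
  // The busier the recent past, the longer the wait (2 s per recent
  // request is an arbitrary scaling factor).
  const delay = Math.min(recentRequests.length * 2_000, MAX_DELAY_MS);
  if (delay > 0) {
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
  recentRequests.push(Date.now());
}
```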
We can adjust this further if we can identify throttling behaviors of specific sites, but since you can always retry retrieval, and Zotero will skip items that already have files, the main concern here is just not overloading servers and obeying backoff instructions, not making sure it always works on the first try.
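As an illustration of "obeying backoff instructions", a client might honor the Retry-After header on 429/503 responses before retrying. A minimal sketch, handling only the delta-seconds form of the header with a single retry:

```typescript
// Sketch: honor Retry-After on throttling responses, with one retry.
// The HTTP-date form of the header and multi-retry/jitter logic are
// left out for brevity.
async function fetchWithBackoff(url: string): Promise<Response> {
  const res = await fetch(url);
  if (res.status !== 429 && res.status !== 503) return res;
  const retryAfter = Number(res.headers.get('Retry-After') ?? '');
  const waitMs = Number.isFinite(retryAfter) && retryAfter > 0
    ? retryAfter * 1000
    : 10_000; // fallback wait if the header is missing or not in seconds
  await new Promise((resolve) => setTimeout(resolve, waitMs));
  return fetch(url);
}
```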