Available for beta testing: improved PDF retrieval with Unpaywall integration
The latest Zotero beta includes new functionality to help you find PDFs for items in your Zotero library.
While Zotero has always been able to save PDFs automatically as you save items from the web, that's only been possible when saving from the browser, and generally only when the PDF is available and accessible on the page you're saving from.
In the latest beta, if you save an item from a page where Zotero can't find or access a PDF, Zotero can now automatically search for an open-access PDF using data from Unpaywall and attach the PDF to your item. It can do the same when you create an item with "Add Item by Identifier", and a new "Find Available PDF" option in the item context menu lets you retrieve PDFs for existing items in your library. We run our own lookup service for these searches with no logging of the contents of requests.
When you use "Add Item by Identifier" or "Find Available PDF", Zotero will also load the page associated with the item's DOI or URL and try to find a PDF to download from there before looking for OA copies. This will work if you have direct or VPN-based institutional access to the PDF. (For web-based proxies, only open-access PDFs will be automatically retrieved using this new functionality. You can of course continue to save items with gated PDFs from the browser using the Zotero Connector.) Zotero won't currently check the DOI or URL page when saving from the browser, since loading it would result in additional requests and data leakage (to at least the DOI resolver) for many items that you save, and it would only be useful if 1) you weren't already on that page, 2) it wasn't already in Unpaywall, and 3) the PDF was OA or you had direct access.
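As a rough illustration of the lookup order described above, here is a minimal sketch. The trigger names and source labels are hypothetical and chosen only for this example; they are not Zotero's internal API.

```python
from typing import List

# Hypothetical labels for the places a PDF might be looked up.
DOI_OR_URL_PAGE = "doi_or_url_page"
UNPAYWALL = "unpaywall_oa"


def lookup_sources(trigger: str) -> List[str]:
    """Return the sources to try, in order, for a given trigger.

    trigger is one of "browser_save", "add_by_identifier",
    or "find_available_pdf" (names assumed for this sketch).
    """
    if trigger == "browser_save":
        # Only reached when no PDF could be found on the page being saved.
        # The DOI/URL page is not loaded here, to avoid extra requests.
        return [UNPAYWALL]
    if trigger in ("add_by_identifier", "find_available_pdf"):
        # Try the page behind the DOI or URL first, then open-access copies.
        return [DOI_OR_URL_PAGE, UNPAYWALL]
    raise ValueError(f"unknown trigger: {trigger}")
```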
For best results, you should try this out with the beta version of the Zotero Connector for Firefox available from the same page. (It will still basically work when saving from the current Zotero Connector for Chrome or Safari, but the save popup may not fully reflect what's happening.)
If there are other sources of PDFs you'd like Zotero to use, you can also set up custom PDF resolvers.
This discussion has been closed.
I assume this would also work without an existing DOI, i.e., based on author, title, and year only?
Also, if the request results in a PDF that has a DOI, is there a way this could be added to the metadata? (In the same way as it would be if you had dropped the PDF into Zotero.)
I'm not sure if this is a universal preference (I'd suspect it would be).
So, in short, having some kind of warning that this is an unofficial copy would be relevant. Better than nothing, but possibly confusing. Unaware users, students for example, might not know to watch out for this situation.
Having this as optional, either as a Zotero preference, or as a case-by-case "do you want to save this author draft version?" dialog box, could be a useful part of this feature.
I suspect that usage of this feature will correlate somewhat with field of study: for example, more in fields where arxiv.org is a normal place to get current papers, and less in fields where citing (published) page numbers is crucial.
@adamsmith: Unpaywall sources are ordered 'publishedVersion', 'acceptedVersion', 'submittedVersion', and we try them in that order. We're planning to show the version in the UI somewhere — we had been thinking in a new field in the right-hand pane, but maybe it'd be better to just name the attachment item something like "Full-Text PDF", "Accepted Version (PDF)", and "Submitted Version (PDF)". (Right now it names the title based on the parent metadata, the same as the filename, but that's sort of pointless.)
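The ordering described above could be sketched like this. The shape of each location (a dict with `version` and `url` keys) is an assumption made for illustration, not a statement about how Zotero stores the Unpaywall data.

```python
from typing import Dict, List

# Preference order stated in the post above.
VERSION_ORDER = ["publishedVersion", "acceptedVersion", "submittedVersion"]


def order_by_version(locations: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Sort candidate PDF locations so preferred versions are tried first."""

    def rank(location: Dict[str, str]) -> int:
        version = location.get("version")
        # Unknown or missing versions go last.
        if version in VERSION_ORDER:
            return VERSION_ORDER.index(version)
        return len(VERSION_ORDER)

    # sorted() is stable, so locations with the same version keep their order.
    return sorted(locations, key=rank)
```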
If so, could there be a setting to try this first, then go to unpaywall (or is that already the case)?
And I think Bjoern's question about URLs was also referring to this part, which presumably would/could work with just a URL.
I did not know how to get to the "Add Item by Identifier" button in Zotero.
1. look up "extensions.zotero.findPDFs.resolvers"
2. edit the file
3. paste the JSON code in "extensions.zotero.findPDFs.resolvers"
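For reference, a value for that preference might be built as in the sketch below. The field names (`name`, `method`, `url`, `mode`, `selector`, `attribute`, `automatic`) are assumptions based on Zotero's custom PDF resolver documentation and should be checked against it. The resolver itself and its URL are entirely hypothetical.

```python
import json

# Hypothetical resolver entry; field names are assumptions to verify
# against Zotero's custom PDF resolver documentation.
resolver = {
    "name": "Example Resolver",
    "method": "GET",
    "url": "https://resolver.example.org/lookup?doi={doi}",
    "mode": "html",
    "selector": "a.pdf-link",
    "attribute": "href",
    "automatic": False,
}

# The preference is assumed to hold a JSON array of resolver objects.
pref_value = json.dumps([resolver])
print(pref_value)
```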
Could you maybe take a step back and explain what you're trying to do _specifically_? Why do you want to add a PDF resolver and what site do you want to add?
The rename dialog for attachments could be improved, I think, by having separate fields for item title and filename, with the checkbox making the two fields the same.
@gutzhr: You can use Help → Debug Output Logging → View Output to see what it's doing. A couple things to note:
1) For both Unpaywall and custom resolvers, Zotero forces the PDF and page URLs to HTTPS, even if the URLs returned from the resolver are HTTP.
2) If the PDF download results in an HTML page (such as a login page or captcha), it will currently fail. I was planning to add an option for custom resolvers to pop up a browser window that lets you log in as necessary when that happens, but I'd like to find a way to avoid showing the Mozilla open/save dialog for the PDF, as would happen now if we did that.
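The two notes above can be illustrated with a short sketch. Both function names are hypothetical, and the PDF check shown here (looking for the `%PDF-` marker at the start of the data) is just one plausible way to tell a PDF from an HTML page; the thread does not say how Zotero actually checks.

```python
from urllib.parse import urlsplit, urlunsplit


def force_https(url: str) -> str:
    """Rewrite an http:// URL to https://, leaving other URLs unchanged."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the downloaded bytes start with the PDF file marker.

    An HTML login page or captcha would not start with this marker.
    """
    return data.startswith(b"%PDF-")
```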
@bwiernik: Yes, for sure. I just meant it was pointless having the attachment title also be the filename. (For translators we already show "Full Text PDF" or similar in most cases while renaming the file. This would just be doing the same thing for retrieved PDFs.) Issue created.
The messages during PDF retrieval are in English only, although I run Zotero in German. Can these strings be localized as well? I am happy to translate any missing strings once they are in Transifex.
- Searching for available PDFs...
- Checking 1 item
- Checking X items
- No PDF found
- 1 PDF added
- X PDFs added
Moreover, when selecting multiple items I see the wrong strings "Checking (null) items" and "(null) PDFs added".
a) don't already have a PDF and
b) have a URL or DOI
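The two conditions above can be expressed as a simple predicate. The `Item` class and its fields are hypothetical and exist only for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    """Minimal stand-in for a library item (fields assumed for this sketch)."""
    has_pdf: bool
    doi: Optional[str] = None
    url: Optional[str] = None


def meets_conditions(item: Item) -> bool:
    """True if the item has no PDF yet and has a URL or DOI."""
    return (not item.has_pdf) and bool(item.doi or item.url)
```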
UPDATE - I wasn't running the beta. Now I do see it.