Avoid attaching the arXiv PDF / prioritize other sources

edited April 17, 2023
Sometimes*, when I add a new item by identifier using the DOI of the published article, and the published PDF is open access, Zotero instead automatically downloads the arXiv version of the PDF.

For example, it just happened with 10.1029/2021MS002575.

Is there any way to prioritize the published version, or to avoid downloading the arXiv PDF unless an arXiv preprint is explicitly requested?


* When dealing with many items, this affects a large number of papers, and it creates a lot of work: removing each of these preprint PDFs, downloading the published versions (open or through Sci-Hub), attaching them to Zotero, and removing the temporary downloads.
  • edited April 17, 2023
    Zotero always prioritizes the published version as long as you have access to it, and that PDF is downloaded properly for me.

    If it's not working for you, we'd want to see a Debug ID from Zotero for a save attempt after reloading the page.
  • Oh, this is via Add Item by Identifier? That may not work. Let me check.
  • The debug ID: D2066139293
  • edited April 17, 2023
    OK, yeah, when using Add Item by Identifier, Zotero won't currently go and load the publication page and try to download the PDF from there — it will just look for an OA PDF, and our data source, Unpaywall, currently only has the submitted version.

    If you delete the PDF item and use Find Available PDF, it will follow the DOI and download the OA published version.

    I'm actually not sure if we intended that. We'll need to think about whether Add Item by Identifier should use the same process as Find Available PDF.
  • edited April 17, 2023
    "I'm actually not sure if we intended that. We'll need to think about whether Add Item by Identifier should use the same process as Find Available PDF."

    It would be great to prioritize this method over Unpaywall, or to give the user the option to choose the preferred method.
  • So I think there are two reasons we didn't support this:

    1) It would result in a load of the publisher page, potentially a number of subsequent requests, and potentially a first-party PDF download for every identifier pasted into Add Item by Identifier, even if there were dozens or hundreds of identifiers. Add Item by Identifier currently mostly uses APIs that are designed for automated access, but publisher pages aren't, and depending on the speed of access, it could result in people being blocked.

    We do support this for Find Available PDF, but given the risk of blocking, Find Available PDF runs more slowly and tries to space out requests to given domains. If we supported this for Add Item by Identifier, file downloading for multiple items would need to operate in the same slower way.

    2) As with Find Available PDF, this would only help for people with direct (on-campus or VPN-based) access to PDFs on publisher pages — not access via web-based proxies that the Zotero Connector can use — or for OA PDFs. And the latter of course should be in Unpaywall. Your OA example is only a problem because Unpaywall doesn't yet have a download link for the published version. (I agree that this example is confusing and counterintuitive, though.)
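For anyone who wants to check what Unpaywall currently offers for a given DOI, here is a minimal sketch (not Zotero's code) of the "prefer the published version, skip the preprint unless asked" behavior requested above. It assumes Unpaywall's public v2 REST API (`https://api.unpaywall.org/v2/<DOI>?email=...`) and the `oa_locations`, `version` and `url_for_pdf` fields of its records; the function names `fetch_unpaywall_record` and `pick_pdf_url` are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Unpaywall labels each OA copy with one of these "version" values.
# Lower rank = closer to the final published article.
VERSION_RANK = {"publishedVersion": 0, "acceptedVersion": 1, "submittedVersion": 2}


def fetch_unpaywall_record(doi: str, email: str) -> dict:
    """Fetch the Unpaywall v2 record for a DOI (needs network access)."""
    url = ("https://api.unpaywall.org/v2/" + urllib.parse.quote(doi)
           + "?email=" + urllib.parse.quote(email))
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def pick_pdf_url(record: dict, allow_submitted: bool = False) -> Optional[str]:
    """Return the PDF URL of the most final OA copy, or None.

    Copies without a direct PDF link or with an unknown version are ignored.
    Submitted versions (e.g. arXiv preprints) are skipped unless
    allow_submitted is True.
    """
    candidates = []
    for loc in record.get("oa_locations") or []:
        pdf = loc.get("url_for_pdf")
        rank = VERSION_RANK.get(loc.get("version"))
        if not pdf or rank is None:
            continue
        if rank == VERSION_RANK["submittedVersion"] and not allow_submitted:
            continue
        candidates.append((rank, pdf))
    if not candidates:
        return None
    # min() keeps the first entry among equally ranked copies.
    return min(candidates, key=lambda c: c[0])[1]
```

Usage: `pick_pdf_url(fetch_unpaywall_record("10.1029/2021MS002575", "you@example.org"))` returns `None` while Unpaywall only lists a submitted version for that DOI, so a caller could fall back to following the DOI (as Find Available PDF does) instead of attaching the preprint.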
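The first point above explains why downloading from publisher pages has to be slower: requests to the same domain need to be spaced out to avoid blocking. A minimal sketch of that idea follows, assuming a simple fixed minimum gap per host; the class name `DomainThrottle` and the 5-second default are hypothetical illustrations, not Zotero's actual implementation or settings.

```python
import time
from typing import Callable, Dict
from urllib.parse import urlsplit


class DomainThrottle:
    """Enforce a minimum gap between requests to the same host.

    Hypothetical sketch: the 5-second default is an assumption.
    The clock and sleep functions are injectable so the logic can be
    tested without real waiting.
    """

    def __init__(self, min_interval: float = 5.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Dict[str, float] = {}  # host -> time of its last request

    def wait(self, url: str) -> float:
        """Block until a request to url's host is allowed; return the delay."""
        host = urlsplit(url).hostname or ""
        now = self._clock()
        last = self._last.get(host)
        delay = 0.0
        if last is not None:
            delay = max(0.0, last + self.min_interval - now)
        if delay > 0:
            self._sleep(delay)
        # Record when this request actually goes out.
        self._last[host] = now + delay
        return delay
```

Calling `wait(url)` before each download leaves requests to different hosts undelayed but spreads requests to one publisher over time, so pasting dozens of DOIs from the same journal would make the whole batch noticeably slower.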