Unable to retrieve metadata for PDFs

  • same error here
  • Could you say more, j.cossio?
    Which Zotero version, what document?

    Retrieve metadata has completely changed since this was reported last year, so it's definitely not the same error under the hood.
  • I am using Zotero 5.0.44.
    It happens for multiple documents. Next time I find an example I'll update here.
  • @adamsmith It basically happens for every PDF. Any clue as to what could be going on? These are PDFs of recent articles, with OCR.
  • Could you produce a debug ID for dragging a PDF to Zotero and the failed (automatic) retrieve metadata?
  • I think I just did.
  • edited April 9, 2018
    By the way, I am behind a proxy. Could that be the source of the problem?

    That said, the Zotero connector works fine, and Retrieve Metadata worked fine until recent updates.
  • Proxy could be related, but if you can submit debug IDs, you should be able to retrieve metadata. Let's see what dstillman & adomasven see in the debug.
  • @j.cossio: You're getting a 403 error from your proxy server. As adamsmith says, it's odd that you're able to submit debug output (which is a POST to https://repo.zotero.org), but for some reason your POST requests to https://recognize.zotero.org aren't working, so you'll have to debug that.
  • @dstillman Requests to https://recognize.zotero.org are a feature of newer versions I guess? In earlier versions I could retrieve metadata without problems. Probably my proxy is blocking this domain. I'll check with the network admins.
  • Yes, that's new (though if your network access is based on some sort of whitelist, many things in Zotero are likely to be broken).
  • @dstillman Syncing with the zotero.org library fails. What is the server used in this case?
    Is there a list of all the domain names accessed by Zotero?
  • edited April 10, 2018
    No, there's no such list. Zotero is a web-connected tool, and various things won't work properly without at least the same access as your web browser.

    Zotero's own infrastructure is hosted on AWS. While a DNS-based access restriction could whitelist *.zotero.org, those IPs can change every minute, and any restriction that didn't take that into account would result in things breaking regularly.

    Other functions in Zotero require access to any site you save in the browser (e.g., to save files) and to various other services that can change at any time (e.g., to retrieve metadata of various kinds).
  • @dstillman Is there at least a domain name for syncing the library?
  • api.zotero.org and stream.zotero.org, but again, the associated IPs can change literally every 60 seconds.
  • I am having the same issue, under the same circumstances. My company tightly controls outside access and I just need the PDF metadata retrieval options to work.

    Do you have any info or documentation stating exactly what is required for metadata access to PDFs to work? Is there a way to do this locally? Did this change when pdfxchange was removed?

    I need information to submit to the network admins on how the software behaves and what it is required to access for this feature.
  • This has always been the case and has nothing to do with pdftools (which I think you're referring to; they also weren't removed, they're just automatically bundled now). It's possible this used to work for you previously because you were running the Firefox add-on. If that's the case, you could try to make the (accurate) argument that nothing has really changed. Zotero still has and requires exactly the same sort of access to the internet as a standalone app as it did as a browser extension.

    For your other questions:
    There's no way to do this locally, no. Zotero needs to query online databases -- both its own and others such as CrossRef/DataCite or Worldcat -- to be able to match papers to metadata. The metadata isn't in the paper itself; it's online.

    dstillman's answer from April 10 really is the best we can give you on this. In order for Zotero to be able to retrieve metadata it needs to be able to send and receive data over standard http and https ports. It uses multiple and potentially changing domains for that and there's simply no way to reliably list them all.
  • Same, sort of, here: laptop at work does not get metadata, but at home or with VPN there's no problem. How to solve this?
  • As Dan says above, this is basically about Zotero having to be able to access the internet. You'll need to talk to your work IT about how to do that -- might be a proxy configuration, might be some sort of firewall.
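
For anyone who needs something concrete to show their network admins, here is a minimal sketch of a reachability check for the hostnames named in this thread. It is not part of Zotero and does not reproduce what Zotero does internally: the function names (`probe`, `summarize`, `blocked`) are made up for illustration, the host list covers only the domains mentioned above (there is no complete list), and a plain HTTPS GET stands in for the POST requests Zotero actually sends, so a proxy may treat the two differently. Python's `urllib` normally picks up proxy settings from the environment, so running this on the affected machine should go through the same proxy.

```python
import urllib.error
import urllib.request

# Hostnames mentioned in this thread only; Zotero contacts other,
# changing services as well, so this is NOT a complete list.
HOSTS = [
    "repo.zotero.org",
    "recognize.zotero.org",
    "api.zotero.org",
    "stream.zotero.org",
]


def probe(host, timeout=10):
    """Send an HTTPS GET to the host; return the HTTP status code or an error string."""
    try:
        with urllib.request.urlopen(f"https://{host}/", timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # An HTTP-level answer (e.g. 403 or 404) came back from the server or a proxy.
        return err.code
    except OSError as err:
        # DNS failure, timeout, TLS error, or a proxy refusing the tunnel.
        return f"error: {err}"


def summarize(hosts, probe_fn=probe):
    """Return {host: result} for each host, using probe_fn for the actual request."""
    return {host: probe_fn(host) for host in hosts}


def blocked(results):
    """List hosts whose probe returned 403 or failed without any HTTP status."""
    return [h for h, r in results.items() if r == 403 or isinstance(r, str)]


if __name__ == "__main__":
    results = summarize(HOSTS)
    for host, result in results.items():
        print(f"{host}: {result}")
    print("Possibly blocked:", blocked(results))
```

Any status code other than 403 only shows that the host answered; it says nothing about whether Retrieve Metadata will succeed. A 403 or a tunnel error on one host while the others respond would match the situation described above, where submitting debug output worked but requests to recognize.zotero.org did not.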