Unable to retrieve metadata for PDFs
This discussion was created from comments split from: "Retrieve metadata for PDF" fails with "an unexpected error occurred".
Which Zotero version, what document?
Retrieve metadata has completely changed since this was reported last year, so it's definitely not the same error under the hood.
It happens with multiple documents. Next time I find an example, I'll update here.
https://www.zotero.org/support/debug_output
D2057122540
That said, the Zotero Connector works fine, and Retrieve Metadata worked fine until recent updates.
Is there a list of all the domain names accessed by Zotero?
Zotero's own infrastructure is hosted on AWS. While a DNS-based access restriction could whitelist *.zotero.org, those IPs can change every minute, and any restriction that didn't take that into account would result in things breaking regularly.
Other functions in Zotero require access to any site you save in the browser (e.g., to save files) and to various other services that can change at any time (e.g., to retrieve metadata of various kinds).
Do you have any info or documentation stating exactly what is required for PDF metadata retrieval to work? Is there a way to do this locally? Did this change when pdfxchange was removed?
I need information to submit to the network admins on how the software behaves and what it is required to access for this feature.
For your other questions:
There's no way to do this locally, no. To match papers to metadata, Zotero needs to query online databases, both its own and others such as CrossRef/DataCite or WorldCat. The metadata isn't in the paper itself; it's online.
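As a rough illustration of the kind of online lookup described above (a sketch only, not Zotero's actual code path, which goes through Zotero's own services first), here is how a DOI could be resolved against the public Crossref REST API at `https://api.crossref.org/works/<DOI>`. The function names and the `User-Agent` string are hypothetical; the point is simply that the data lives on a remote server and needs outbound HTTPS.

```python
import json
import urllib.parse
import urllib.request

CROSSREF_WORKS = "https://api.crossref.org/works/"

def crossref_url(doi: str) -> str:
    """Build the Crossref REST API URL for a single DOI (hypothetical helper)."""
    # Keep "/" unescaped so the DOI prefix and suffix stay in the path;
    # percent-encode anything else that is not URL-safe (e.g. spaces).
    return CROSSREF_WORKS + urllib.parse.quote(doi.strip(), safe="/")

def fetch_title(doi: str, timeout: float = 10.0) -> str:
    """Return the first title Crossref records for a DOI.

    Requires outbound HTTPS to api.crossref.org; raises on network errors.
    """
    req = urllib.request.Request(
        crossref_url(doi),
        headers={"User-Agent": "metadata-lookup-sketch/0.1"},  # hypothetical UA
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        data = json.load(resp)
    # Crossref wraps the record in "message"; "title" is a list of strings.
    titles = data["message"].get("title", [])
    return titles[0] if titles else ""
```

If `fetch_title` fails from inside the network but works from outside it, that points at the network restriction rather than at the PDF.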
dstillman's answer from April 10 really is the best we can give you on this. In order for Zotero to be able to retrieve metadata, it needs to be able to send and receive data over the standard HTTP and HTTPS ports. It uses multiple and potentially changing domains for that, and there's simply no way to reliably list them all.
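For network admins who want to check outbound access anyway, here is a minimal sketch that tests whether TCP connections on ports 80 and 443 succeed. It assumes a hypothetical, deliberately incomplete host list: as noted above, Zotero's domains can change, so passing this check does not prove Retrieve Metadata will work, and a proxy or TLS inspection could still interfere at the HTTP level. The hostname is resolved on every attempt, which mirrors why a fixed IP whitelist breaks.

```python
import socket

# Hypothetical starting list only: Zotero uses multiple, changing domains,
# so no list like this can be complete.
EXAMPLE_HOSTS = ["www.zotero.org", "api.crossref.org"]

def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    create_connection() resolves the hostname on every call, so the check
    follows DNS changes instead of relying on a fixed set of IP addresses.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, refusals, and timeouts
        return False

def report(hosts, ports=(80, 443)):
    """Map each (host, port) pair to whether a TCP connection succeeded."""
    return {(h, p): can_connect(h, p) for h in hosts for p in ports}
```

For example, running `report(EXAMPLE_HOSTS)` from an affected machine and from an unrestricted one, then comparing the results, shows which connections the network is dropping.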