What PDF metadata does Zotero search for?

brandelune · July 2, 2025

When a PDF is put into the library, Zotero tries to find its metadata.

I have three questions:
- what are the metadata made of?
- where are they stored (and where is Zotero looking for them)?
- how is it possible to add them to a PDF that one wants to publish?

poettli · July 2, 2025

https://www.zotero.org/support/retrieve_pdf_metadata

brandelune · July 2, 2025

Thank you for the pointer.

The relevant part was:

"The Retrieve Metadata feature uses a Zotero web service to find item metadata. The Zotero client sends the first few pages of text from the PDF to the web service, which uses a variety of extraction algorithms and known metadata from Crossref, paired with DOI and ISBN lookups, to build a parent item for the PDF. The Zotero lookup service doesn’t require a Zotero account and doesn’t log any data about the content or results of searches."

And seemingly, a correct answer would be:

- the metadata is made of information that is registered online by the publisher

- it is either stored at a "registration agency" (RA) and is identified by a DOI, or sometimes stored by the publisher and is identified by an ISBN

- Zotero uses a variety of extraction algorithms and known metadata from Crossref, paired with DOI and ISBN lookups to find it

- to register information at an RA, you need to be a member of one
check this page: https://www.doi.org/the-community/existing-registration-agencies

- and you need to submit content about your publication through their process
for example, see the registration process for Crossref: https://www.crossref.org/services/content-registration/
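
To make the "DOI lookup" step concrete, here is a minimal Python sketch of how a DOI might be picked out of the first pages of PDF text and turned into a Crossref query. The regular expression is a common heuristic and the function names are made up for illustration; this is not Zotero's actual extraction code, and 10.1000/xyz123 is only a placeholder DOI.

```python
import re
from urllib.parse import quote

# Heuristic for DOI-like strings: "10.", a 4-9 digit registrant code, "/", a suffix.
# Assumption for illustration only; not the pattern Zotero uses.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')


def find_doi(text):
    """Return the first DOI-like string found in text, or None."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    # Trailing punctuation usually belongs to the sentence, not to the DOI.
    return match.group(0).rstrip(".,;)")


def crossref_url(doi):
    """Build the Crossref REST API URL that returns the registered metadata."""
    return "https://api.crossref.org/works/" + quote(doi, safe="/")


page_text = "Published version: https://doi.org/10.1000/xyz123. All rights reserved."
doi = find_doi(page_text)
print(doi, crossref_url(doi))
```

Opening the resulting URL in a browser shows the JSON record that the publisher registered, which is the kind of metadata a lookup can draw on.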

Is that correct?

adamsmith · July 3, 2025

Mostly.

ISBN data comes from libraries, not publishers. And you don't have to be a member of an RA to register a DOI. You can get DOIs for documents by placing them in free repositories like Zenodo or Figshare.
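
If you want to check what metadata is attached to a DOI, whichever agency registered it, one option is DOI content negotiation: ask doi.org for the record in CSL JSON. The sketch below is only an illustration; the function names are hypothetical, and the fields you get back depend on what was registered.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Media type for CSL JSON, served through DOI content negotiation.
CSL_JSON = "application/vnd.citationstyles.csl+json"


def build_request(doi):
    """Prepare a request asking doi.org for metadata instead of the landing page."""
    url = "https://doi.org/" + quote(doi, safe="/")
    return Request(url, headers={"Accept": CSL_JSON})


def fetch_metadata(doi):
    """Fetch and decode the CSL JSON record for a DOI (needs network access)."""
    with urlopen(build_request(doi), timeout=30) as response:
        return json.load(response)
```

Calling `fetch_metadata` with the DOI of a document you deposited should return a dictionary with whatever title, author, and date information the repository registered for it.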

brandelune · July 3, 2025

Thank you very much for the corrections.

Now we have an answered FAQ I guess :-)