Extract metadata from PDF file itself

Priesemut · August 13, 2012

Dear Zotero community,

Is there a way to import the author, subject, date, and other metadata embedded in a PDF file into Zotero?
This is different from the PDF metadata retrieval that Zotero already offers.
If you have entered the author, subject, date, keywords, etc. into the PDF files yourself, that information exists only in the files; it is not available on Google Scholar or anywhere else on the web.
There is already a post on this issue in this forum without any replies, and I've found another one on the web, so this seems to be something many people are looking for.
Thanks in advance for your help.

Kind regards

dstillman · August 13, 2012

If there's a relevant existing thread, you should post there.

But there are actually many threads on this. Search for "XMP".

Priesemut · August 13, 2012

Sorry, it didn't occur to me to search for "XMP".
Well, the thread from 2010 says:

As in past threads: pdfinfo (which Zotero already uses) can, indeed, read this information. In principle, it is possible. Someone just has to write the code to do this.
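
To make the quoted point concrete, here is a rough sketch of reading those fields outside Zotero. It is not Zotero code. It assumes the pdfinfo command-line tool is installed and on the PATH, and that it prints one "Key: value" pair per line; the function names parse_pdfinfo and read_pdf_metadata are made up for this illustration.

```python
import shutil
import subprocess


def parse_pdfinfo(text):
    """Turn pdfinfo-style 'Key:   value' lines into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values that contain
        # colons (such as dates with times) stay intact.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def read_pdf_metadata(path):
    """Run pdfinfo on a file and parse its output.

    Assumption: a 'pdfinfo' binary is available on PATH.
    """
    if shutil.which("pdfinfo") is None:
        raise RuntimeError("pdfinfo not found on PATH")
    result = subprocess.run(
        ["pdfinfo", path], capture_output=True, text=True, check=True
    )
    return parse_pdfinfo(result.stdout)
```

Calling read_pdf_metadata("paper.pdf") (a placeholder file name) would return a dict from which keys such as "Author" or "Title" can be read, provided the file carries them. Mapping those values onto Zotero item fields is the part that would still need to be written.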

Is there someone working on the implementation of this feature?

Simon · August 13, 2012

I have an implementation of this using pdf.js in my pdfjs branch. However, that branch won't be merged to master until pdf.js is capable of extracting text from most PDFs.