Save metadata to PDF files

I think it would be really useful to have an option to save the author / title / keywords information directly to the PDF metadata. For some reason, most scientific papers don't use that information properly and it's not uncommon to find the DOI as the title of the document.

In the same way that most music organizers rely on metadata to sort and search files, I think adding an option to correct the metadata of PDF files would make managing PDFs much easier.

However, I would keep that feature optional, as many people wouldn't want their PDF files altered in any way.
  • Apart from the work of implementing this for a relatively limited payoff, I don't believe there is much of a standard for article metadata in XMP, is there?
  • Three of the default fields in PDF metadata are Title, Author and Keywords, which are also common features of scientific articles. These can easily be saved into the file (unless the PDF is secured). At the moment I do this (making sure the author and title are correct) with Adobe Acrobat before saving each file in Zotero, but it is time-consuming and could easily be automated.

    Right now there's an option to rename the pdf file according to the author and title. Why not a similar option to write the author and title metadata into the PDF file?
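
A minimal sketch of what such an option might do, assuming the third-party pypdf library (not part of Zotero) and plain Python values standing in for a Zotero item; `build_info_dict` and `write_info` are hypothetical names. The three fields map onto the `/Title`, `/Author` and `/Keywords` entries of the PDF document information dictionary:

```python
from typing import Dict, List


def build_info_dict(title: str, authors: List[str], keywords: List[str]) -> Dict[str, str]:
    """Map bibliographic fields onto standard PDF document-info keys."""
    info = {"/Title": title.strip()}
    if authors:
        # The Info dictionary has a single /Author string, so join the names.
        info["/Author"] = "; ".join(a.strip() for a in authors)
    if keywords:
        info["/Keywords"] = ", ".join(k.strip() for k in keywords)
    return info


def write_info(src: str, dst: str, info: Dict[str, str]) -> None:
    """Copy src to dst with updated metadata (assumes `pip install pypdf`)."""
    from pypdf import PdfReader, PdfWriter  # imported lazily: third-party dependency

    reader = PdfReader(src)
    if reader.is_encrypted:
        raise ValueError(f"{src} is encrypted; leaving it untouched")
    writer = PdfWriter()
    for page in reader.pages:
        writer.add_page(page)
    # Keep whatever metadata the file already had, then override our keys.
    if reader.metadata:
        writer.add_metadata(dict(reader.metadata))
    writer.add_metadata(info)
    with open(dst, "wb") as fh:
        writer.write(fh)
```

Copying page by page may drop extras such as bookmarks, so writing to a new file and keeping the original is the safer default.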
  • Right, but the question for implementing something is not "why not?" but "why?". There is no data exchange format for music, so it makes sense to tag music files. For bibliographic data we have formats like BibTeX or RIS that, while frustratingly imperfect, do a much better job of transferring data from one program to another than the rudimentary XMP tags you suggest.
    So I just don't see the point of implementing this. What am I missing? Why do you go through the trouble of doing this manually?
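
On the XMP question: for the three basic fields discussed here, XMP reuses Dublin Core properties (`dc:title` as a language alternative, `dc:creator` as an ordered list, `dc:subject` as an unordered bag). The sketch below builds such a packet with the standard library only; `build_xmp` is a hypothetical helper, embedding the packet in the PDF would still need a PDF library, and article-specific fields such as DOI or journal are not covered:

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespaces used by XMP for basic descriptive metadata.
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
for _prefix, _uri in NS.items():
    ET.register_namespace(_prefix, _uri)


def _q(prefix: str, name: str) -> str:
    """Qualified tag name in ElementTree's {uri}local form."""
    return f"{{{NS[prefix]}}}{name}"


def build_xmp(title: str, authors: List[str], keywords: List[str]) -> str:
    root = ET.Element(_q("x", "xmpmeta"))
    rdf = ET.SubElement(root, _q("rdf", "RDF"))
    desc = ET.SubElement(rdf, _q("rdf", "Description"), {_q("rdf", "about"): ""})

    def add_array(prop: str, kind: str, values: List[str], lang: bool = False) -> None:
        # dc:title is a language alternative (rdf:Alt), dc:creator an ordered
        # list (rdf:Seq) and dc:subject an unordered set (rdf:Bag).
        container = ET.SubElement(ET.SubElement(desc, _q("dc", prop)), _q("rdf", kind))
        for value in values:
            li = ET.SubElement(container, _q("rdf", "li"))
            if lang:
                li.set(XML_LANG, "x-default")
            li.text = value

    add_array("title", "Alt", [title], lang=True)
    add_array("creator", "Seq", authors)
    add_array("subject", "Bag", keywords)
    body = ET.tostring(root, encoding="unicode")
    # Standard XMP packet wrapper; the begin attribute holds a BOM character.
    return ('<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>'
            + body + '<?xpacket end="w"?>')
```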
  • edited February 14, 2014
    I personally combine my Zotero library (stored on a local WebDAV server) with a desktop search tool called Recoll. It is very useful if you want to search inside the full text of your articles.

    The problem is that Recoll depends, to a certain extent, on the metadata stored inside PDF files; otherwise the results are confusing, e.g. showing the DOI instead of the article's title.

    I am aware that Zotero's database is just for Zotero and it's not designed to be used by any other software, but it works pretty well nevertheless. I think that's one of the benefits of using free software: you have full control over what happens to your data.

    And I also think this shouldn't be hard to implement. I might do it myself if I don't find an "official" tool soon. In the end, it's just a matter of being able to organize my data properly.

    So the answer to "why?" would be: To provide better interoperability between tools that manage or use PDF documents.
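
A batch tool aimed at cases like the Recoll one could leave good titles alone and only touch those that are obviously wrong, such as a DOI sitting in the Title field. A heuristic sketch follows; the patterns and the name `title_needs_fixing` are assumptions of mine, not taken from any existing tool:

```python
import re
from typing import Optional

# Rough DOI shape ("10." + 4-9 digit registrant + "/" + suffix), optionally
# prefixed with "doi:" or a doi.org URL. A heuristic, not a full validator.
DOI_TITLE = re.compile(
    r"^(?:doi:\s*|https?://(?:dx\.)?doi\.org/)?10\.\d{4,9}/\S+$", re.IGNORECASE
)
# File extensions that often leak into titles from the authoring tool.
JUNK_SUFFIXES = (".pdf", ".doc", ".docx", ".dvi", ".tex")


def title_needs_fixing(title: Optional[str]) -> bool:
    """Return True if an existing PDF title looks missing or machine-generated."""
    if title is None or not title.strip():
        return True
    t = title.strip()
    if DOI_TITLE.match(t):
        return True
    return t.lower().endswith(JUNK_SUFFIXES)
```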
  • @wohox If you were to implement this yourself, it would likely be welcome by the community as a plug-in.
  • edited February 14, 2014
    I am aware that Zotero's database is just for Zotero and it's not designed to be used by any other software
    that's not quite true, of course. There are several tools that read Zotero's database directly, like Qnotero and ZotQuery.

    As you say, it's free software, so by all means see if you want to implement it (I'd guess Dan would accept a patch if it's well done), but between Qnotero for quick access and Zotero's built-in full-text search I don't really see much of a need.
  • I have another use for this: I use an e-book reader (Kobo Aura HD) for offline reading, and there is no way to move metadata between Zotero and the Kobo (I tried Calibre but found no path).
    The Kobo is a plain disaster for searching files: it relies only on its proprietary library and the PDF metadata. It cannot search or browse file names, otherwise my issue would be solved by Zotero/ZotFile's local file-renaming scheme. And the metadata in the original PDFs is generally gibberish. So it would be useful to be able to batch-rewrite just those three tags to something useful.
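
For a batch run like this, the files are easy to find when Zotero manages them: by default each stored attachment lives in its own folder under `storage/`, named after the attachment's 8-character key. A small sketch that walks that layout (`iter_stored_pdfs` is a hypothetical name; linked files kept elsewhere, e.g. moved out by ZotFile, are not covered):

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_stored_pdfs(storage: Path) -> Iterator[Tuple[str, Path]]:
    """Yield (attachment_key, pdf_path) for every PDF in Zotero's storage folder.

    Assumes the default layout storage/<ITEMKEY>/<file>; linked files kept
    elsewhere are not found this way.
    """
    for key_dir in sorted(p for p in storage.iterdir() if p.is_dir()):
        for f in sorted(key_dir.iterdir()):
            # Compare suffixes case-insensitively so "Paper.PDF" is included.
            if f.is_file() and f.suffix.lower() == ".pdf":
                yield key_dir.name, f
```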

  • edited December 2, 2015
    There are some free metadata editors out there that could be the basis for a plugin:

    JabRef does have XMP metadata support, by the way:

    Why? Reading on a mobile device (!!!), sharing PDFs, and reusing PDFs with other tools that offer different functionality (e.g. visualisation, personal data mining). Generally I use a multitude of tools on the same base for different tasks (vive free and open software).

    Functionality? Basic Author, Title and Keywords would be enough, but adding and removing (part of) the metadata, or batch-editing it, would be a bonus, so that I could, for example, add a copyright notice or use a variety of programs with the same file base.

    Thoughts: strangely, this feature has often been requested by users of a variety of bibliographic software but has not been implemented... Citavi has had this task on its to-do list since 2012 (Task #6518)... Correction: Citavi 5 seems to have it as a test feature in its beta, without batch options.

    Bias: I really would love it, as it would complement my workflow - that's all I can really say :)
  • I also notice that there is a "define PDF metadata" action in macOS's Automator. I've never done any scripting, but would it be difficult to receive three basic fields (Title, Author, Keywords) from Zotero and write them to the file?
    Batch processing would be really useful, so the service should be able to handle several references at once, but even one at a time would already be much better than copy-pasting...
  • Someone did something like that here:
    Linking DevonThink with BibDesk

    I believe this is the script:

    I myself use DevonThink + Zotero + ZotFile. Having the metadata in the PDF would be very nice.
  • edited February 29, 2016
    This has a command-line version for editing PDF metadata, so it shouldn't be hard to create a plugin; apparently Zotero wants to integrate that, though:
  • Still interested in this capability... have there been any developments or decision?
  • edited January 9, 2019
    Just keeping the thread alive...
    I'm currently using (featured in Gego's list above) to edit files one at a time, copy-pasting the basic fields from Zotero.
    It's a hassle, but it works so my Kobo reader can retrieve the needed documents.
  • Thanks, everyone, for the abundance of links. This is exactly what I've been looking for. I like the workaround from Matthias Nott to link DevonThink with BibDesk. Does anyone have news on whether (1) this is being worked on for inclusion in Zotero or (2) anyone has created a plugin/workaround?

    Also, I tried to think of ways to build something with Automator on the Mac (though I am new to Automator), but hit a wall when looking for how to get the metadata of an entry in Zotero. Any ideas?
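
One way to get an entry's metadata without the API is to read `zotero.sqlite` directly, as later posts do. The sketch below assumes the Zotero 5 schema (table and column names such as `items.key`, `itemData`, `itemDataValues`, `creators.firstName`/`lastName`, `itemTags`); check them against your own database, since the schema is internal and can change. `fetch_item` is a hypothetical name, and `key` must be the parent item's key, not the attachment's:

```python
import sqlite3
from typing import Dict, List, Optional, Union

# Assumed Zotero 5 schema: items -> itemData -> fields / itemDataValues for
# plain fields, itemCreators -> creators for authors, itemTags -> tags for tags.
FIELDS_SQL = """
SELECT f.fieldName, v.value
FROM items i
JOIN itemData d ON d.itemID = i.itemID
JOIN fields f ON f.fieldID = d.fieldID
JOIN itemDataValues v ON v.valueID = d.valueID
WHERE i.key = ? AND f.fieldName IN ('title', 'DOI')
"""
CREATORS_SQL = """
SELECT c.firstName, c.lastName
FROM items i
JOIN itemCreators ic ON ic.itemID = i.itemID
JOIN creators c ON c.creatorID = ic.creatorID
WHERE i.key = ?
ORDER BY ic.orderIndex
"""
TAGS_SQL = """
SELECT t.name
FROM items i
JOIN itemTags itg ON itg.itemID = i.itemID
JOIN tags t ON t.tagID = itg.tagID
WHERE i.key = ?
ORDER BY t.name
"""


def fetch_item(conn: sqlite3.Connection, key: str) -> Optional[Dict[str, Union[str, List[str], None]]]:
    """Collect title, DOI, author names and tags for the parent item `key`."""
    fields = dict(conn.execute(FIELDS_SQL, (key,)).fetchall())
    if not fields:
        return None  # unknown key, or an item without title/DOI
    authors = [
        # Single-field names are stored in lastName with an empty firstName.
        " ".join(part for part in (first, last) if part)
        for first, last in conn.execute(CREATORS_SQL, (key,))
    ]
    tags = [name for (name,) in conn.execute(TAGS_SQL, (key,))]
    return {"title": fields.get("title", ""), "DOI": fields.get("DOI"),
            "authors": authors, "keywords": tags}
```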

  • We spend a lot of time double-checking the information from publishers and fixing missing and wrong data (yes, that happens more often than many think). It would be nice to be able to make that information available to other software, such as external readers and indexers. I would love this feature as a plugin or built-in.
  • I have written a small Python script that reads the file names of your PDFs and, if they follow the ZotFile format 'author - year - title.pdf', sets the title as the Title in the PDF metadata.
    All you need to do is run the script with the path to your Zotero library as an argument. Please make a backup beforehand, as this will overwrite your PDFs.
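
The file-name parsing step described above might look roughly like this. It is a sketch, not the poster's script: the regex assumes exactly the `author - year - title.pdf` pattern (a four-digit year, ` - ` separators), `parse_zotfile_name` is a hypothetical name, and since renamed files may carry shortened titles, taking the title from Zotero itself is more reliable where possible:

```python
import re
from pathlib import Path
from typing import Dict, Optional

# "author - year - title", as described in the post; the separators and a
# four-digit year are assumptions about how the files were renamed.
ZOTFILE_NAME = re.compile(r"^(?P<author>.+?) - (?P<year>\d{4}) - (?P<title>.+)$")


def parse_zotfile_name(path: str) -> Optional[Dict[str, str]]:
    """Split 'author - year - title.pdf' into its parts, or return None."""
    match = ZOTFILE_NAME.match(Path(path).stem)
    if match is None:
        return None
    return {name: value.strip() for name, value in match.groupdict().items()}
```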
  • Hello,

    I also made a standalone script that:
    1. searches for an item in your local Zotero database (by DOI, or by author and year);
    2. updates the metadata of a PDF of your choice.

    I'm not a regular SQLite user, so the trickiest part was figuring out how to read the database while Zotero is using it (because it's locked):
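
A common way around the lock, sketched here as an assumption rather than as how that script does it, is to open the file through an SQLite URI with `mode=ro&immutable=1`: SQLite then treats the file as read-only and unchanging and takes no locks. The trade-off is that if Zotero writes while you read, results may be inconsistent; copying the file first is the more cautious alternative. `open_zotero_readonly` is a hypothetical name:

```python
import sqlite3
from pathlib import Path
from typing import Union


def open_zotero_readonly(db_path: Union[str, Path]) -> sqlite3.Connection:
    """Open zotero.sqlite read-only without taking locks (immutable=1).

    Any write through this connection fails; reads may be inconsistent if
    Zotero modifies the file at the same time.
    """
    # Path.as_uri() yields an absolute, percent-encoded file:// URI.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro&immutable=1"
    return sqlite3.connect(uri, uri=True)
```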
  • I also think that a tool to fix metadata would be greatly useful (either in Zotero, or standalone).
    People who store PDFs in systems that make use of the metadata need it, because articles are very often badly annotated. One example mentioned above: on an e-reader like the Kobo, the title displayed in the file browser is taken from the metadata, so it had better be correct.

    Regarding the statement that the database should be used by Zotero only: well, no. What is wrong with retrieving data from it in a read-only manner? The problem with Zotero's official API is that it appears to query the online database... For a task like retrieving a file's metadata, it makes no sense to require an internet connection. I also found it easier to understand the database schema and run SQL statements than to understand the API.