Save metadata to PDF files

I think it would be really useful to have an option to save the author / title / keywords information directly to the PDF metadata. For some reason, most scientific papers don't use that information properly and it's not uncommon to find the DOI as the title of the document.

In the same way that most music organizers rely on metadata to sort and search files, I think that an option to correct the metadata in PDF files would make managing PDFs much easier.

However, I would keep that feature optional, as many people wouldn't like to have their pdf files altered in any way.
  • Apart from the work of implementing this for a relatively limited pay-off: I don't believe there is much of a standard for article metadata in XMP, is there?
  • Three of the default fields in PDF metadata are Title, Author and Keywords, which are also common features of scientific articles. These can easily be saved into the file (unless the PDF is secured). At the moment I do that (making sure the author and title are correct) with Adobe Acrobat before saving each file in Zotero, but it is time-consuming and could easily be automated.

    Right now there's an option to rename the pdf file according to the author and title. Why not a similar option to write the author and title metadata into the PDF file?
  • Right, but the question for implementing something is not "why not?" but "why?". There is no data exchange format for music, so it makes sense to tag the music files. For bib data we have formats like BibTeX or RIS that, while frustratingly imperfect, do a much better job of transferring data from one program to another than the rudimentary XMP tags you suggest.
    So I just don't see the point of implementing this. What am I missing? Why do you go through the trouble of doing this manually?
  • edited February 14, 2014
    I personally combine my Zotero bibliography (stored in a local webdav server) with a desktop search tool called Recoll. It is very useful if you want to perform searches inside the content of your articles.

    The problem is that Recoll depends, to a certain extent, on the metadata stored inside PDF files; otherwise the results are confusing, e.g. showing the DOI instead of the article's title.

    I am aware that Zotero's database is just for Zotero and is not designed to be used by any other software, but it works pretty well nevertheless. I think it's one of the benefits of using free software that you have full control of what happens to your data.

    And I also think that this shouldn't be hard to implement. I might do it myself if I don't find any "official" tool soon. In the end, it's just a matter of being able to organize my data properly.

    So the answer to "why?" would be: To provide better interoperability between tools that manage or use PDF documents.
  • @wohox If you were to implement this yourself, it would likely be welcomed by the community as a plug-in.
  • edited February 14, 2014
    "I am aware that zotero's database is just for Zotero and it's not designed to be used by any other software"
    That's not quite true, of course. There are several tools that read Zotero's database directly, like Qnotero and ZotQuery.

    As you say, it's free software, so by all means see if you want to implement it (I'd guess Dan would accept a patch if it's well done), but between Qnotero for quick access and Zotero's built-in full-text search I don't really see much of a need.
  • I have another use for this: I use an ebook reader (Kobo Aura HD) for offline reading, and there is no way to move metadata between Zotero and the Kobo (I tried Calibre but found no path).
    The Kobo is just a plain disaster for searching files: it relies only on its proprietary library and the PDF metadata. It cannot search or browse filesystem names; otherwise my issue would be solved by the Zotero / ZotFile local file renaming scheme. And the metadata in the original PDFs is generally gibberish. So it would be useful to be able to batch-rewrite just those 3 tags to something useful.
    Best,

    Jan
  • edited December 2, 2015
    There are some free metadata editors out there, that could be the basis for a plugin:

    https://github.com/zaro/pdf-metadata-editor
    https://code.google.com/p/pdf-meta/
    http://sourceforge.net/projects/exiftool/

    JabRef does have XMP-metadata support btw:
    http://jabref.sourceforge.net/help/de/XMPHelp.php
    https://github.com/JabRef/jabref

    Why? Reading on a mobile device (!!!), sharing of pdfs, re-use of pdfs with other tools with different functionality (e.g. visualisation, personal data mining) - generally I use a multitude of tools on the same base for different tasks (vive free and open software).

    Functionality? Basic Author, Title, Keywords would be enough, but adding and removing (part of) the metadata or batch-editing the metadata would be a bonus, so I can for example add a copyright notice, use a variety of programs with the same file base, etc.

    Thoughts: This, strangely, is a feature that has often been requested by users of a variety of bibliography software, but has not been implemented... Citavi has had this task on its to-do list since 2012 (Task# 6518)... Correction: it seems Citavi 5 has it as a test feature in their beta, without batch options...

    Bias: I really would love it, as it would complement my workflow - that's all I can really say :)
  • I also notice that there is a "define PDF metadata" action in macOS's Automator. I have never done any scripting, but would it be difficult to take 3 basic fields (Title, Author, Keywords) from Zotero and write them to the file?
    Batch processing would be really useful, so the service should be able to handle several references at once, but even if it's one-at-a-time it would already be much better than copy-paste...
  • Someone did something like that to link DevonThink with BibDesk: https://www.youtube.com/watch?v=Dso3z0M6z7I

    I believe this is the script: http://www.organognosi.com/how-to-connect-a-pdf-file-inside-devonthink-with-its-record-in-bibdesk/#codesyntax_1

    I myself use DevonThink + Zotero + ZotFile. Having the metadata in the PDF would be very nice.
  • edited February 29, 2016
    This tool, https://code.google.com/archive/p/pdf-meta, has a command-line version for editing PDF metadata. It should not be hard, then, to create a plugin, but apparently Zotero wants to integrate that: https://github.com/jlegewie/zotfile/issues/137
  • Still interested in this capability... have there been any developments or decision?
  • edited 8 days ago
    Just keeping the thread alive...
    I'm currently using http://broken-by.me/pdf-metadata-editor/ (featured in Gego's list above), to edit one-at-a-time, copy-pasting the basic fields from Zotero.
    It's a hassle, but it works so my Kobo reader can retrieve the needed documents.
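For anyone who wants to script the batch rewrite of Title, Author and Keywords discussed in this thread, here is a rough sketch in Python built around ExifTool (one of the tools linked above). It only assembles and optionally runs `exiftool` command lines; it does not talk to Zotero. The record layout (`path`, `title`, `authors`, `keywords`) and the function names are hypothetical, chosen for illustration, and you would have to fill the records from your own Zotero export. The separators used to join authors and keywords are also just an assumption about what a reader device will display well.

```python
import shutil
import subprocess
from typing import Dict, List, Sequence


def build_exiftool_args(pdf_path: str, title: str,
                        authors: Sequence[str],
                        keywords: Sequence[str]) -> List[str]:
    """Build an ExifTool command that sets Title, Author and Keywords."""
    return [
        "exiftool",
        "-overwrite_original",                 # do not keep a "_original" backup copy
        f"-Title={title}",
        f"-Author={'; '.join(authors)}",       # assumed separator between authors
        f"-Keywords={', '.join(keywords)}",    # assumed separator between keywords
        pdf_path,
    ]


def tag_all(records: Sequence[Dict], dry_run: bool = True) -> List[List[str]]:
    """Build one command per record; run them only when dry_run is False.

    Each record is a hypothetical dict with the keys
    "path", "title", "authors" and "keywords".
    """
    commands = [
        build_exiftool_args(r["path"], r["title"], r["authors"], r["keywords"])
        for r in records
    ]
    if not dry_run:
        if shutil.which("exiftool") is None:
            raise RuntimeError("exiftool not found on PATH")
        for cmd in commands:
            # check=True raises CalledProcessError if ExifTool reports a failure
            subprocess.run(cmd, check=True)
    return commands
```

The dry run is the default on purpose, in line with the original request that altering PDF files should stay optional: calling `tag_all(records)` only returns the commands so you can inspect them, and nothing is written until you pass `dry_run=False`.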