PDF Metadata Lookup: Try to find out more information from filename

For quite some time I have used a hand-written collection of scripts to maintain my papers and ebooks in PDF format. Now I want to switch to Zotero. I designed my scripts so that they store a lot of metadata in the filename, including the DOI or ISBN corresponding to the PDF.

That means it should not be too complicated to get them into Zotero if PDF Metadata Lookup had a step that checks the filename for meaningful information, i.e. runs a regular expression against it to find a DOI or ISBN and, if this yields a result, uses that identifier to retrieve the metadata.
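To illustrate the kind of check I mean, here is a rough Python sketch, not a patch against Zotero's code. The function name `find_identifier`, the patterns, and especially the assumption that the "/" in a DOI was replaced by "_" in the filename (a filename cannot contain a slash) are all made up for this example; other naming schemes would need a different substitute.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Loose candidate pattern for ISBNs: digits with optional hyphens/spaces.
ISBN_CANDIDATE = re.compile(r"(?<![\dA-Za-z])\d[\d\- ]{8,15}[\dXx](?![\dA-Za-z])")


def _isbn10_ok(s: str) -> bool:
    """ISBN-10 check: weighted sum (weights 10..1) must be divisible by 11."""
    if not re.fullmatch(r"\d{9}[\dX]", s):
        return False
    total = sum((10 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(s))
    return total % 11 == 0


def _isbn13_ok(s: str) -> bool:
    """ISBN-13 check: alternating weights 1 and 3, sum divisible by 10."""
    if not re.fullmatch(r"97[89]\d{10}", s):
        return False
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(s))
    return total % 10 == 0


def find_identifier(filename: str, slash_substitute: str = "_") -> Optional[Tuple[str, str]]:
    """Return ("doi", value) or ("isbn", value) found in the filename, else None.

    Assumption: the DOI's "/" was stored as `slash_substitute` in the filename.
    """
    stem = Path(filename).stem

    # DOI: prefix "10.<registrant>", then the substituted slash, then the suffix.
    doi_re = re.compile(
        r"(?<![\d.])(10\.\d{4,9})" + re.escape(slash_substitute) + r"([-._;()/:A-Za-z0-9]+)"
    )
    m = doi_re.search(stem)
    if m:
        return ("doi", f"{m.group(1)}/{m.group(2)}")

    # ISBN: accept a candidate only if its checksum is valid, to limit false hits.
    for cand in ISBN_CANDIDATE.finditer(stem):
        digits = re.sub(r"[\- ]", "", cand.group(0)).upper()
        if len(digits) == 13 and _isbn13_ok(digits):
            return ("isbn", digits)
        if len(digits) == 10 and _isbn10_ok(digits):
            return ("isbn", digits)
    return None
```

For a file named `10.1000_xyz123.pdf` this returns `("doi", "10.1000/xyz123")`, and for `notes.pdf` it returns `None`. Checking the ISBN checksum is what keeps arbitrary digit runs in filenames from being treated as identifiers.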

I think this feature would be convenient for others as well, because one could automatically (partially) (re)build a database just from files with an ISBN/DOI in the filename. Maybe it could be written with support for user-defined regular expressions and corresponding actions, but that isn't necessary for my use case.

Thank you very much for your work so far!
  • We could do that, though I think DOI/ISBN in the filename will be pretty rare. Title and author would certainly be more common, but there'd have to be some sort of check to attempt to determine if there's actually useful info, since otherwise short/generic filenames could result in false matches from Google Scholar.
  • Yes, this is true. Maybe one could make it configurable. And if the »Rename PDF according to metadata« function were extended to include the DOI/ISBN, files with DOI/ISBN data wouldn't be that rare, and one would get a further partial backup of the database just by having files with useful, machine-readable filenames.
  • No, this needs to just work. If someone wants to submit a patch to Retrieve Metadata that incorporates filenames in various ways without increasing false positives, we'd likely accept it, but I don't think we'd make this configurable.
  • To date I have saved all my PDF journal articles with journal name, volume, number (sometimes), page numbers, and year in the filename. I would love to be able to use this as metadata and pull all of my PDFs into Zotero. The missing or insufficient embedded metadata in many, if not most, of my PDF articles currently limits my ability to import them efficiently.

    Any way to extract this information and then potentially look it up via online sources to verify and complete the Zotero citation details would be very much appreciated.
  • I've added an issue for this.
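
The check mentioned earlier in the thread, deciding whether a filename carries enough information to be worth sending to a title/author search without inviting false matches, could look roughly like the Python sketch below. It is only an illustration: the function name `looks_informative`, the list of generic words, and both thresholds are arbitrary assumptions, not anything Zotero does.

```python
import re
from pathlib import Path

# Hypothetical stop list of words that say nothing about a paper's content.
GENERIC_TOKENS = {
    "paper", "article", "document", "scan", "download", "fulltext",
    "untitled", "final", "draft", "copy", "new", "pdf", "main", "manuscript",
}


def looks_informative(filename: str, min_tokens: int = 3, min_chars: int = 12) -> bool:
    """Heuristic: True if the filename has enough non-generic words to search on.

    Both thresholds are guesses chosen for the example, not tuned values.
    """
    stem = Path(filename).stem
    # Split on anything that is not a letter; ignore fragments shorter than 3 letters.
    tokens = [t.lower() for t in re.split(r"[^A-Za-z]+", stem) if len(t) >= 3]
    meaningful = [t for t in tokens if t not in GENERIC_TOKENS]
    return len(meaningful) >= min_tokens and sum(len(t) for t in meaningful) >= min_chars
```

With these settings `paper.pdf` and `final_draft2.pdf` are rejected, while `Smith - Deep Learning for Protein Folding.pdf` passes. A real implementation would still have to verify whatever the search returns against the PDF's content.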