Idea for how to import PDF metadata

mcswell · August 7, 2009

There have been threads in the past where people wanted to know how to import the metadata from PDFs they've already collected into Z. The usual answer is that there is no automatic way, although Z can import metadata from JSTOR and Google Scholar for PDFs hosted there.

There is however a way that this might be (almost) automated. Most PDFs can be dumped to text. Often the first few lines of text are the title and author. If the article is on Google Scholar, a search using the first few lines of text will often find the paper. (I imagine this would work with JSTOR too, but my university doesn't have an account there.)

This is a bit of a nuisance to do--dump to text, search at Google Scholar, etc. However, I suspect it could be automated; one could create a small applet into which you would drag PDFs from your computer. The applet would extract a few lines of text, do the search, and offer the hits. Click on the appropriate hit (often the first one), and the applet would add the metadata to Z. If the right hit doesn't appear, one could use manual methods (like copy-paste).

I can imagine the applet being given a directory on the local hard drive and sequentially processing all the PDFs in it, with the user deciding in each case whether to accept the hit, or put that PDF aside and try some other method, maybe later.
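
To make the idea a bit more concrete, here is a rough Python sketch of the "first few lines -> search" step. It assumes the PDF has already been dumped to text by some external tool, and it only builds a Google Scholar search URL rather than fetching or parsing any hits. The function names (guess_title_query, scholar_search_url) and the three-line default are made up for illustration; this is not how Z or any existing tool does it.

```python
from urllib.parse import urlencode


def guess_title_query(pdf_text, max_lines=3):
    """Join the first few non-empty lines of dumped PDF text into one search string.

    Assumption: the title and author usually sit in these lines.
    """
    # Collapse runs of whitespace inside each line and drop blank lines.
    lines = [" ".join(line.split()) for line in pdf_text.splitlines()]
    lines = [line for line in lines if line]
    return " ".join(lines[:max_lines])


def scholar_search_url(query):
    """Build a Google Scholar search URL for the query (no request is made)."""
    return "https://scholar.google.com/scholar?" + urlencode({"q": query})


if __name__ == "__main__":
    # Hypothetical text, as it might come out of a PDF-to-text dump.
    sample = "\n  A Study of   Widgets \nJane Doe\n\nAbstract\nWe study widgets."
    query = guess_title_query(sample, max_lines=2)
    print(query)
    print(scholar_search_url(query))
```

For the directory case, one could loop over the dumped text files, open each URL in a browser, and let the user accept or skip the hit.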

adamsmith · August 7, 2009

Which version of Zotero are you using?
Retrieve Metadata from .pdf does pretty much exactly that. It's been part of Zotero since 1.5b.

mcswell · August 7, 2009

Thanks, I just discovered that! I had been using 1.0-something, or was it 1.1; anyway, it didn't have this feature. I just now upgraded to 2.0, and voila!