Trouble importing Embedded PDF Metadata

seeingtheforest · February 15, 2020

Hi,

I have a lot of PDF documents that have keywords embedded in the metadata, but they are not being imported as tags into Zotero. Is there anything I can do to make this work?

Alternatively, the tags are also embedded within the filenames - is there a way to search Zotero by a substring in the filenames in order to select groups and manually tag them?

A final alternative is that I have all of the metadata in an Excel spreadsheet - is there a way to import the data and associate it with entries in the Zotero library (based on filename/path)?

Thanks!

dstillman · February 15, 2020

No, Zotero doesn't use metadata embedded in PDFs, since it's often low quality (or was the last time we looked). Are these keywords you added yourself?

If you temporarily disable file renaming from the General pane of the preferences and add the files, you could then write some code, either locally with the JavaScript API or remotely using the web API and a library like pyzotero, to look for keywords in attachment filenames and assign them as tags on the parent item. But assuming you're unfamiliar with Zotero scripting it would take a bit of work.
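
A minimal sketch of the web API route with pyzotero, assuming the original filenames were kept on import and that you already have a list of the keywords to look for. The function names, the substring-matching rule, and the credentials in the usage comment are illustrative assumptions, not something Zotero prescribes.

```python
def keywords_in_filename(filename, keywords):
    """Return the keywords that occur (case-insensitively) in the filename."""
    lowered = filename.lower()
    return [kw for kw in keywords if kw.lower() in lowered]


def tag_parents_from_filenames(zot, keywords):
    """Tag each parent item with keywords found in its attachment filenames.

    `zot` is assumed to be a pyzotero `zotero.Zotero` client.
    """
    # Fetch every attachment item in the library (all result pages).
    attachments = zot.everything(zot.items(itemType="attachment"))
    for att in attachments:
        data = att["data"]
        parent_key = data.get("parentItem")  # absent for standalone attachments
        found = keywords_in_filename(data.get("filename") or "", keywords)
        if parent_key and found:
            parent = zot.item(parent_key)
            zot.add_tags(parent, *found)  # writes the tags back to the library


# Usage (placeholder library ID and API key):
# from pyzotero import zotero
# zot = zotero.Zotero("1234567", "user", "YOUR_API_KEY")
# tag_parents_from_filenames(zot, ["ecology", "wildfire"])
```

Plain substring matching will also hit keywords that appear inside longer words, so a stricter rule (for example, splitting the filename on a known delimiter) may suit some naming schemes better. Trying it on a small test library first would be prudent.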

Pulling them out of a spreadsheet (or, rather, a CSV file) would also be possible, but associating each row with its entry in Zotero would probably be a bit more complicated.
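
One way the row-to-entry association might be sketched: build a lookup keyed on the bare filename, then consult it for each attachment's filename. The column names `filename` and `keywords` and the semicolon separator are assumptions about how the spreadsheet was exported. This only works if the attachment filenames in Zotero still match the ones in the CSV.

```python
import csv
import io
from pathlib import PureWindowsPath


def normalize(path):
    """Reduce a path (Windows or POSIX style) to a lowercase bare filename."""
    return PureWindowsPath(path).name.lower()


def load_keyword_map(csv_text):
    """Map normalized filename -> list of keywords from CSV text.

    Assumes columns named 'filename' and 'keywords', with keywords
    separated by semicolons (both are hypothetical conventions).
    """
    keyword_map = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        tags = [t.strip() for t in row["keywords"].split(";") if t.strip()]
        keyword_map[normalize(row["filename"])] = tags
    return keyword_map


def tags_for_attachment(filename, keyword_map):
    """Look up the keywords recorded for an attachment's filename."""
    return keyword_map.get(normalize(filename), [])
```

Keying on the bare filename sidesteps differences in directory layout between the spreadsheet and Zotero's storage, at the cost of ambiguity if two files in different folders share a name.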

(Once the keywords were extracted, you could then have Zotero rename the files based on the parent metadata if you wanted.)

seeingtheforest · February 17, 2020

Thanks. I'm going to try to do something with Calibre and Excel to create a formatted BibTeX file to import into Zotero. Some others seem to have had success with various parts of this process, so I should be able to piece something together. I'll report back if I do.
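
For the BibTeX route, a rough sketch of formatting one entry per spreadsheet row. It assumes that Zotero's BibTeX import turns the `keywords` field into tags, which is worth confirming on a single test entry first. The `@misc` type, the function name, and the field choice are illustrative, and attaching the PDF itself is not covered here.

```python
def bibtex_entry(cite_key, title, keywords):
    """Format one @misc entry; keywords are joined with commas.

    No escaping is done, so titles containing braces would need extra care.
    """
    fields = [
        ("title", title),
        ("keywords", ", ".join(keywords)),
    ]
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields)
    return f"@misc{{{cite_key},\n{body}\n}}\n"


# Example: one entry written to stdout.
print(bibtex_entry("smith2019", "Fire Ecology", ["ecology", "wildfire"]))
```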