help: batch import files while parsing filename for metadata

zoon · May 26, 2010

I have ~2000 local PDF files that I want added as items in a Zotero collection. The file names consistently contain metadata to be imported (name, year, title, DOI, journal and so on), and most files have a standalone TXT file (same file name apart from the extension) with the abstract and other details.

Any suggestions on how to best move forward?

My hunch is that I have to write a script that loops over all the files, extracts metadata from the file names and generates some type of database file that Zotero can then import. I'd appreciate informed suggestions on what format to use, and relevant links.
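For what it's worth, a loop like that could look roughly like this in Python. It's a minimal sketch that assumes a hypothetical naming scheme of "Author - Year - Title - Journal.pdf". FILENAME_RE, parse_stem and collect are illustrative names, not part of any tool. The abstract is read from the .txt file with the same stem when one exists.

```python
import re
from pathlib import Path

# Hypothetical naming scheme, e.g. "Smith - 2008 - Some title - J Biol.pdf".
# Adjust the pattern to whatever convention your files actually follow.
FILENAME_RE = re.compile(
    r"^(?P<author>.+?) - (?P<year>\d{4}) - (?P<title>.+) - (?P<journal>[^-]+)$"
)

def parse_stem(stem):
    """Return a metadata dict for a file name stem, or None if it doesn't match."""
    m = FILENAME_RE.match(stem)
    return m.groupdict() if m else None

def collect(folder):
    """Parse every *.pdf in folder and pick up its sidecar .txt if present.

    Note: glob is case-sensitive on most Linux filesystems, so "*.PDF" files
    would need a second pattern there.
    """
    records, skipped = [], []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        meta = parse_stem(pdf.stem)
        if meta is None:
            skipped.append(pdf.name)  # report non-matching names instead of guessing
            continue
        txt = pdf.with_suffix(".txt")
        # Assumes UTF-8 sidecars; undecodable bytes are replaced, not fatal.
        meta["abstract"] = (
            txt.read_text(encoding="utf-8", errors="replace").strip()
            if txt.exists() else ""
        )
        meta["pdf"] = str(pdf.resolve())  # absolute path for the attachment
        records.append(meta)
    return records, skipped
```

Returning the non-matching names separately makes it easy to spot naming inconsistencies across ~2000 files before anything goes into Zotero.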

Or are there already tools out there for doing just that? For comparison, what I'm after is similar to what tools like http://www.mp3tag.de/en/ do for converting between MP3 file names and the metadata stored in the files.
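The mp3tag-style approach, where you describe the file name layout with placeholders, can be approximated in a few lines of Python that turn a template into a regular expression. This is only a sketch. The %field% syntax is borrowed loosely from mp3tag, and template_to_regex and parse_filename are made-up helper names.

```python
import re

def template_to_regex(template):
    """Compile a pattern such as "%author% - %year% - %title%" into a regex
    with one named group per %field% placeholder."""
    # re.split with a capturing group alternates literal text and field names:
    # "%a% - %b%" -> ["", "a", " - ", "b", ""]
    parts = re.split(r"%(\w+)%", template)
    pattern = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pattern += re.escape(part)        # literal separator text
        else:
            pattern += f"(?P<{part}>.+?)"     # lazy match for each field
    return re.compile(f"^{pattern}$")

def parse_filename(template, stem):
    """Return {field: value} for a file name stem, or None if it doesn't fit."""
    m = template_to_regex(template).match(stem)
    return m.groupdict() if m else None
```

Every field matches lazily, so the last field absorbs the remainder of the name. A separator that also appears inside an earlier field (say " - " inside an author string) will still split in the wrong place, so it's worth checking the output on a sample first.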

kieren · May 26, 2010

It's not going to be too hard. Documentation is here: http://www.zotero.org/support/dev/interacting_with_zotero_from_within_firefox and here: http://www.zotero.org/support/dev/api_user_docs . If you end up doing this, please consider contributing your solution or just your code to the wiki so that the API documentation improves.

What I'd probably do is get the file metadata into JSON format using Perl, and then write some JavaScript (probably using the POW extension to make things a bit easier for me; it's Firefox 3.5 only at the moment though :-[ ) to get the JSON plus the file attachment into Zotero.
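If you go the JSON route, the intermediate file could be produced like this (Python rather than Perl, purely for illustration). It assumes each record is already a dict with keys such as author, year, title, journal, doi, abstract and pdf, which are names chosen here for the example. The output keys loosely follow Zotero's journalArticle field names (itemType, title, creators, date, publicationTitle, DOI, abstractNote), but "attachment" is an invented key for your own import script to read, and the field list is worth double-checking against your Zotero version.

```python
import json

def to_zotero_like(meta):
    """Map a parsed-file-name dict onto Zotero-style item field names.

    "attachment" is a hypothetical key for the importing script, not a
    Zotero field. The whole author string goes into lastName here; split
    it if your file names separate first and last names.
    """
    return {
        "itemType": "journalArticle",
        "title": meta.get("title", ""),
        "creators": [{"creatorType": "author", "lastName": meta.get("author", "")}],
        "date": meta.get("year", ""),
        "publicationTitle": meta.get("journal", ""),
        "DOI": meta.get("doi", ""),
        "abstractNote": meta.get("abstract", ""),
        "attachment": meta.get("pdf", ""),
    }

def write_json(records, path):
    """Write all records as one JSON array (UTF-8, non-ASCII kept as-is)."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump([to_zotero_like(r) for r in records], fh,
                  ensure_ascii=False, indent=2)
```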

The alternative would be to write RIS with an L1 entry giving the file:// path to the attachment, but to be honest I think using the API will be easier if you're comfortable with programming.
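For the RIS alternative, a minimal Python sketch of a writer might look like the following. It assumes the same kind of per-file dict (author, year, title, journal, doi, abstract and pdf are illustrative key names). The tags used (TY, AU, PY, TI, T2, DO, AB, L1, ER) are common RIS tags, but how Zotero maps each one, and whether it prefers a file:// URL or a plain path in L1, is best confirmed with a test import of a handful of records before running it on the whole set.

```python
from pathlib import Path

def ris_record(meta):
    """Build one RIS record for a journal article from a metadata dict.

    Tag choices are assumptions based on common RIS usage; verify them
    with a small test import first.
    """
    def line(tag, value):
        # RIS lines are "TAG  - value". Collapsing whitespace keeps
        # multi-line abstracts from the .txt files on a single line.
        return f"{tag}  - {' '.join(str(value).split())}"

    lines = [line("TY", "JOUR")]
    for tag, key in (("AU", "author"), ("PY", "year"), ("TI", "title"),
                     ("T2", "journal"), ("DO", "doi"), ("AB", "abstract")):
        if meta.get(key):
            lines.append(line(tag, meta[key]))
    if meta.get("pdf"):
        # as_uri() needs an absolute path and percent-encodes spaces.
        lines.append(line("L1", Path(meta["pdf"]).resolve().as_uri()))
    lines.append("ER  - ")
    return "\n".join(lines)

def write_ris(records, path):
    """Write all records to one .ris file, separated by blank lines."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n\n".join(ris_record(r) for r in records) + "\n")
```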