Script for extracting the annotations in each PDF to a separate text file with the same name

promente · October 5, 2014

ZotFile's ability to extract annotations is great, but I also like working with my linked PDFs directly on my hard drive. I wanted to see which files on my drive already have annotations, and to get a quick summary of those annotations without having to open each file and without necessarily using Zotero.

So I wrote this little Python script (https://gist.github.com/stevepowell99/e1e389a57ea9a2bcb988), which extracts the annotations from each PDF in a folder into a separate text file. Each new text file has the same modification date and the same base name as its PDF, so it should be listed right next to its big sister in your file browser. This means you can tell at a glance whether you have already marked up a PDF, and you can read all the notes (together with the approximate page numbers) without opening it. There are some tips on using it here: http://socialdatablog.com/extract-pdf-annotations.html.
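For anyone curious about the bookkeeping side of this (one text file per PDF, same base name, same modification date), here is a minimal sketch. It is not the code from the gist. The function name `write_annotation_sidecars`, the `extract` callback and the `(page, text)` tuple shape are all made up for illustration, and the actual reading of annotations is left to whatever PDF library you plug in. Skipping PDFs that have no annotations is also an assumption.

```python
import os
from pathlib import Path
from typing import Callable, Iterable, List, Tuple

# (page_number, note_text) pairs; this shape is an assumption for illustration.
Annotation = Tuple[int, str]


def write_annotation_sidecars(
    folder: str,
    extract: Callable[[Path], Iterable[Annotation]],
) -> List[Path]:
    """Write <name>.txt next to each <name>.pdf that has annotations.

    `extract` is a hypothetical caller-supplied function that returns the
    annotations of one PDF (for example, a wrapper around a PDF library).
    """
    written = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        notes = list(extract(pdf))
        if not notes:
            continue  # assumption: no annotations means no text file
        txt = pdf.with_suffix(".txt")  # same base name, different extension
        lines = [f"p. {page}: {text}" for page, text in notes]
        txt.write_text("\n".join(lines) + "\n", encoding="utf-8")
        # Copy the PDF's access/modification times onto the text file so the
        # two sort together when the file browser is ordered by date.
        stat = pdf.stat()
        os.utime(txt, (stat.st_atime, stat.st_mtime))
        written.append(txt)
    return written
```

You would call it as `write_annotation_sidecars("/path/to/pdfs", my_extractor)`, where `my_extractor` is your own function that takes a `Path` and returns the notes for that file. Note that `glob("*.pdf")` is case-sensitive on Linux, so files ending in `.PDF` would be missed by this sketch.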

This makes the most sense now that ZotFile 4.0 lets us store our PDFs in a folder structure that mirrors our Zotero collections (http://zotfile.org).

Hope this is useful for someone.

adamsmith · October 5, 2014

Cool. I think people have been asking for something like this for use in QDA software and the like, too.

promente · October 5, 2014

I would be interested in feedback from anyone trying it on Windows or macOS, as I only have a Linux box.