Importing files with structured filenames: "title @author $year.pdf"

Hello community. I have a large collection of PDFs, some of which I scanned myself (I'm a historian, and we often use old articles). All of them are named in the format "title @author $year.pdf" for easy lookup through Spotlight/Alfred. Is it possible to take advantage of this naming system and import them into Zotero as partly formatted items (with title, year and the author's last name), with the original PDFs as attachments? Presumably something similar to what does, but with filenames instead of a folder structure.
  • edited May 4, 2021
    The easiest (or rather, fastest) way, I think, would be to have a fairly simple Python (or similar) script generate a RIS file from these filenames and then import that. If I'm not mistaken, the PDF file path should go in an L1 line.
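    As a rough illustration of the parsing half of such a script, here is a minimal Python sketch that splits a filename of the form "title @author $year.pdf" into its three parts. The regular expression and the parse_filename name are my own assumptions, not part of any existing tool; it assumes each filename contains exactly one "@" and one "$" and ends in ".pdf".

```python
import os
import re

# Hypothetical pattern for "title @author $year.pdf": the title runs up to "@",
# the author up to "$", and the year is whatever remains before ".pdf".
# Assumes exactly one "@" and one "$" per filename.
FILENAME_RE = re.compile(r'^(?P<title>.+?)\s*@(?P<author>.+?)\s*\$(?P<year>.+?)\.pdf$',
                         re.IGNORECASE)


def parse_filename(path):
    """Return (title, author, year) parsed from a PDF path, or None if it doesn't match."""
    m = FILENAME_RE.match(os.path.basename(path))
    if m is None:
        return None
    return m.group('title').strip(), m.group('author').strip(), m.group('year').strip()


if __name__ == '__main__':
    print(parse_filename('/Users/yourname/PDFs/The Great Fire @Smith $1923.pdf'))
```

Files that don't follow the convention come back as None, so a script built on this can simply skip them instead of producing half-empty records.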
  • Yup, that's a good idea. The end result should look something like this:

    TY  - JOUR
    L1  - /Users/yourname/myfolder/title @author $year.pdf
    AU  - author
    TI  - title
    PY  - year
    ER  - 
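    A record like the one above could be produced by a small Python function such as the hypothetical ris_record below. It is only a sketch: it writes each line as a two-character RIS tag, two spaces, a hyphen and a space, always uses the JOUR item type, and puts the PDF path on the L1 line as suggested in this thread.

```python
def ris_record(title, author, year, pdf_path, item_type='JOUR'):
    """Return one RIS record (as a string) for a single PDF.

    Each line is a two-character tag, two spaces, a hyphen and a space, then the value.
    item_type defaults to JOUR (journal article); that is an assumption, not a rule.
    """
    fields = [
        ('TY', item_type),  # TY must be the first line of a record
        ('AU', author),
        ('TI', title),
        ('PY', year),
        ('L1', pdf_path),   # path of the PDF to attach, per the suggestion in this thread
        ('ER', ''),         # ER ends the record
    ]
    return ''.join('{}  - {}\n'.format(tag, value) for tag, value in fields)


if __name__ == '__main__':
    print(ris_record('The Great Fire', 'Smith', '1923',
                     '/Users/yourname/myfolder/The Great Fire @Smith $1923.pdf'))
```

Several records can simply be concatenated (ideally with a blank line between them) into one .ris file.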
  • Thank you, now I only have to learn to code! :)
  • If you're on a Mac, the Python script will just work; if you're on Windows, I can take a stab at converting it to VBScript.
  • Thank you! I've never executed a Python script – is this the way to do it?
  • It depends a bit on which operating system you use; if I know that, I can give you detailed instructions.
  • (If it's a one-time thing and you're on Windows, I can transform the output of dir /a-d /b /s for you, but that's really only for a one-off.)
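    For the Windows route, dir /a-d /b /s prints the full path of every file (but not the folders) under the current folder, one per line. A minimal Python sketch, assuming that output has been captured as text, could pick out the PDF paths like this (pdf_paths_from_listing is a hypothetical name):

```python
def pdf_paths_from_listing(listing):
    """Return the PDF paths found in the text output of dir /a-d /b /s.

    With /b and /s, dir prints one full path per line; /a-d leaves out folders.
    Blank lines and files that don't end in .pdf are skipped.
    """
    paths = []
    for line in listing.splitlines():
        path = line.strip()
        if path.lower().endswith('.pdf'):
            paths.append(path)
    return paths


if __name__ == '__main__':
    sample = 'C:\\Papers\\Old Maps @Novak $1931.pdf\r\nC:\\Papers\\notes.txt\r\n'
    print(pdf_paths_from_listing(sample))
```

The listing could be saved with dir /a-d /b /s > files.txt; since the redirected file is written in the console's code page, non-ASCII characters such as ® may not survive the round trip unchanged.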
  • Thank you! I'm on macOS Big Sur. But you can also point me somewhere else; I don’t want to take up your time. I think I can adapt the script; I'm just not sure how to run it.
  • edited May 6, 2021

    No worries.

    1. Download the script from (changed from before) and save it somewhere that's easy for you to remember. Your Downloads folder should be fine for now.
    2. The script has to be run from a terminal command line. I find it easiest to open a command line at a particular folder (which will matter later) by adding a shortcut to the Finder Services menu: (the bit below "Adding a Terminal Shortcut to the Services Menu")

    Once that is in place, let's say that your PDFs live at Documents/My Academic stuff/very important and Documents/My Academic stuff/frivolous:

    1. Go to Documents/My Academic stuff in Finder and use "New Terminal at Folder". A Terminal window will pop up. Without the Finder service, you can also press Cmd-Space, type "terminal" to open Terminal, and then type cd ~/'Documents/My Academic stuff', which achieves the same. (Keep the ~ outside the quotes: the shell does not expand ~ inside quotes.)
    2. Type python ~/Downloads/ 'very important' frivolous
    3. This will create Documents/My Academic stuff/very important/very important.ris and Documents/My Academic stuff/frivolous/frivolous.ris which you can import.

    You can add paths to folders of PDFs as you please, or run it one folder at a time. The outcome will be the same.
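    The linked script isn't reproduced here, but a minimal sketch of a script that behaves as described (one .ris file per folder, named after the folder and saved inside it) might look like the following. It assumes Python 3, the "title @author $year.pdf" convention, and the JOUR item type for every file; folder_to_ris is a hypothetical name.

```python
import os
import re
import sys

# Assumed naming convention: "title @author $year.pdf" (exactly one "@" and one "$").
FILENAME_RE = re.compile(r'^(.+?)\s*@(.+?)\s*\$(.+?)\.pdf$', re.IGNORECASE)


def folder_to_ris(folder):
    """Write <folder>/<folder name>.ris with one record per matching PDF; return its path."""
    folder = os.path.abspath(folder)
    name = os.path.basename(os.path.normpath(folder))
    records = []
    for filename in sorted(os.listdir(folder)):
        m = FILENAME_RE.match(filename)
        if m is None:
            continue  # not a PDF, or not named according to the convention
        title, author, year = (part.strip() for part in m.groups())
        records.append(
            'TY  - JOUR\n'
            'AU  - {}\n'
            'TI  - {}\n'
            'PY  - {}\n'
            'L1  - {}\n'
            'ER  - \n'.format(author, title, year, os.path.join(folder, filename))
        )
    ris_path = os.path.join(folder, name + '.ris')
    with open(ris_path, 'w', encoding='utf-8') as out:
        out.write('\n'.join(records))  # blank line between records
    return ris_path


if __name__ == '__main__':
    # Each argument is a folder of PDFs, e.g.: 'very important' frivolous
    for folder in sys.argv[1:]:
        print(folder_to_ris(folder))
```

Saved as, say, make_ris.py (a hypothetical name), it would be run from the My Academic stuff folder as python3 make_ris.py 'very important' frivolous. The L1 paths are absolute so that the import does not depend on where Zotero resolves relative paths from.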

  • Note that unlike my folder importer linked to earlier, this will not cause Zotero to do metadata lookup. What's in the RIS is what you get.
  • edited May 6, 2021
    Thank you, it works! Except that... I actually have "®", not "$", in my filenames. So I replaced it accordingly in your regex, and now I get this:

    SyntaxError: Non-ASCII character '\xc2' in file /Users/jakub/Downloads/ on line 11, but no encoding declared; see for details

    How does one declare the encoding? If it's complicated, I can batch-change all ® to $.
  • Could you paste your line 11 here? I suspect you missed something in the code.

    (You can declare UTF-8 encoding by putting either of these lines
    # -*- coding: utf-8 -*-
    # coding: utf-8

    on the first or second line of the script, but I don't think that should be necessary.)
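    To make the placement concrete, here is a small sketch of a file that starts with the declaration and then uses "®" in a pattern. It assumes the "title @author ®year.pdf" variant mentioned above, and the u'' prefixes are there so it behaves the same under Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# PEP 263 encoding declaration: it must be on the first or second line of the file.
# Python 2 requires it once the source contains non-ASCII text such as "®";
# Python 3 already reads source files as UTF-8, so there it is redundant but harmless.
import re

# "®" appears literally in this pattern; without the declaration, Python 2 rejects
# the file with a "Non-ASCII character" SyntaxError like the one quoted above.
PATTERN = re.compile(u'^(.+?)@(.+?)®(.+?)\\.pdf$')

m = PATTERN.match(u'Old Maps @Novak ®1931.pdf')
print(m.groups())
```

In a script that starts with a #! line, the declaration goes on the second line, which is where the thread puts it.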
  • m = re.match(r'^(.+?)@(.+?)\®(.+?).pdf$', os.path.basename(pdf))
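  • m = re.match(r'^(.+?)@(.+?)®(.+?)\.pdf$', os.path.basename(pdf))

    (The \ before the $ sign is an "escape" character, needed because $ is a special character in regular expressions; it's not needed for ®. The dot before pdf is also special (a bare . matches any character), so I've escaped it as \. here. If it doesn't run after this, add the # coding: utf-8 line.)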
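    A quick way to see what the backslash does is to try the patterns directly. This Python 3 sketch (with made-up filenames) shows that "\$" matches a literal dollar sign, that "®" needs no escaping, and that an unescaped "$" becomes an end-of-string anchor, so the pattern can never match.

```python
import re

# "$" is a regex metacharacter (end of string), so a literal "$" must be written "\$".
# "®" has no special meaning and can appear as-is. The "." before "pdf" is escaped
# too; unescaped it would match any character.
DOLLAR_RE = re.compile(r'^(.+?)@(.+?)\$(.+?)\.pdf$')
REGISTERED_RE = re.compile(r'^(.+?)@(.+?)®(.+?)\.pdf$')

print(DOLLAR_RE.match('Old Maps @Novak $1931.pdf').group(3))      # 1931
print(REGISTERED_RE.match('Old Maps @Novak ®1931.pdf').group(3))  # 1931

# Without the backslash, "$" anchors the end of the string, so nothing can follow it.
print(re.match(r'^(.+?)@(.+?)$(.+?)\.pdf$', 'Old Maps @Novak $1931.pdf'))  # None
```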
  • I got the same error, but once I added the encoding declaration on the second line, it worked. Thank you both! And thanks for helping me break through the psychological barrier of “oh no, coding is hard to start”. Now I want to experiment more. :)
  • edited May 6, 2021