Searching note content outside Zotero

Searching in Zotero reveals which items contain the search terms, but not where in those items the matches are, so I have to click through each matched item's attachment and search again inside it. In contrast, if I search a text document with grep, it returns the matching line and, optionally, any number of lines before and after it. This is great for quickly finding relevant matches, and it would be very useful if I could search through Zotero’s notes and PDF attachments in the same way.

The PDF attachments are easy enough: I store them in a dedicated directory and could just run pdftotext on them, just like Zotero does. However, it seems that the notes are only stored in the SQLite database, not in the Zotero storage directory like web snapshots and other files. This means I need to run an SQL query to dump all the note content to files (preferably with the parent item as the file name) and then convert them from HTML to Markdown using pandoc. I am not familiar with Zotero's database structure (or that much with SQL either, for that matter). Would such a query be straightforward? Do you have any advice on where to start?
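As a starting point for the query asked about above, here is a minimal sketch using the sqlite3 command-line tool. It is not confirmed anywhere in this thread: it assumes the database has a table `itemNotes` with columns `itemID`, `parentItemID` and `note`, and a table `items` with columns `itemID` and `key`. The function name `dump_notes` and the output naming scheme are made up for illustration. Zotero keeps its database locked while it is running, so the sketch is meant to be run on a copy of zotero.sqlite.

```shell
# Sketch: write each note to its own HTML file, named <parent item key>_<note itemID>.html.
# ASSUMED schema (check your own database): itemNotes(itemID, parentItemID, note), items(itemID, key).
dump_notes() {
    db="$1"        # path to a COPY of zotero.sqlite
    out_dir="$2"   # directory that receives one .html file per note
    mkdir -p "$out_dir"
    # One row per note: the note's itemID, then the key of its parent item
    # (or the note's own key if it is a standalone note without a parent).
    sqlite3 -separator ' ' "$db" "
        SELECT n.itemID, COALESCE(p.key, i.key)
        FROM itemNotes n
        JOIN items i ON i.itemID = n.itemID
        LEFT JOIN items p ON p.itemID = n.parentItemID;" |
    while read -r note_id key; do
        # Fetch the note body in a separate query so that newlines inside
        # the HTML cannot break the row parsing above.
        sqlite3 "$db" "SELECT note FROM itemNotes WHERE itemID = $note_id;" \
            < /dev/null > "$out_dir/${key}_${note_id}.html"
    done
}
```

Usage would look like `dump_notes zotero-copy.sqlite notes_html`, after which each file can be converted with `pandoc -f html -t markdown notes_html/FILE.html -o FILE.md`. Naming files after the parent's title rather than its key would need further joins that are not attempted here.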

  • After posting this I found out that the brilliant Better BibTeX extension has an option to include notes when exporting to .bib files. It can also automatically keep these files up to date, which is quite convenient. The only downside is that it exports a collection as a single file, so tools like grep can’t indicate which item contains the matching string simply by displaying the file name, but there is probably some BibTeX or BibLaTeX processing tool that could help me with that.

    I am still interested in a reply to my original question, but do you think this second approach would be smoother (and are there any additional solutions that I have overlooked)?
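One way to work around the single-file limitation without a dedicated BibTeX tool is to let awk remember the citation key of the entry it is currently inside and print it in front of every match. This is only a sketch: the function name `bib_grep` is made up, and it assumes every entry starts on a line of the form `@type{citekey,`.

```shell
# Print every line matching PATTERN in a .bib file, prefixed with the
# citation key of the entry that contains it.
bib_grep() {
    pattern="$1"    # awk regular expression to search for
    bib_file="$2"   # exported .bib file
    awk -v pat="$pattern" '
        /^@/ {                          # entry header, e.g. "@article{smith2020,"
            key = $0
            sub(/^@[^{]*[{]/, "", key)  # drop the leading "@type{"
            sub(/,.*$/, "", key)        # drop the trailing comma
        }
        $0 ~ pat { print key ": " $0 }
    ' "$bib_file"
}
```

For example, `bib_grep 'mitochondria' library.bib` (with `library.bib` standing in for the exported file) lists each matching line together with the key of its item. Because the pattern is passed with `-v`, awk processes backslash escapes in it, so a literal backslash has to be doubled.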

  • Something like this works well for parsing out the relevant fields from the BibLaTeX files created by Better BibTeX:


    #!/usr/bin/env bash
    # Pass the exported .bib file as the first argument.
    # Split it into one file per entry (xx00, xx01, ...)
    csplit -kq "$1" "/^@/-1" "{*}"
    rm -f xx00 # An empty file that is created when splitting

    mkdir -p notes abstracts
    for split_file in xx*; do
        # Create new file name from the attachment path in the "file" field,
        # dropping the directory and the extension
        file_name=$(grep -m1 "file =" "$split_file")
        file_name="${file_name##*/}"
        file_name="${file_name%.*}"
        # Extract and save the relevant sections
        rg "annotation = \{.*?\},\n" "$split_file" \
            --multiline --multiline-dotall --no-line-number > "notes/${file_name}.txt"
        rg "abstract = \{.*?\},\n" "$split_file" \
            --multiline --multiline-dotall --no-line-number > "abstracts/${file_name}.txt"
        rm "$split_file"
    done


    Afterwards all files in the created dirs can be searched with grep/rg.
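A small sketch of what that search could look like with plain grep. The helper name `search_notes` is made up, and the directory names are the ones created by the script above.

```shell
# Search one or more directories recursively and show some context around each match.
search_notes() {
    term="$1"
    shift
    # -r: recurse into directories, -n: show line numbers,
    # -i: ignore case, -C 2: two lines of context before and after each match
    grep -r -n -i -C 2 -- "$term" "$@"
}
```

Calling `search_notes 'mitochondria' notes abstracts` prints each match prefixed with the name of the file it was found in, which identifies the item now that every entry lives in its own file.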
  • If you just want the notes, BBT also has a "Collected notes" exporter.
  • edited February 1, 2020
    All export translators can only export a single file. It's just how Zotero works. I mean *technically* it would be possible to hand-craft a zip file that contains multiple files inside a translator, but that would be madness.

    I'll probably try it someday.
  • Thanks @emilianoeheyns ! I originally thought I would only use this for notes, but realized it might be useful for grabbing abstracts as well, so it is quite convenient that these are included in the .bib file.

    I actually tried Collected notes briefly before, but it generates an error for me (pasted here, https://pastebin.com/JQYM3D4d, expires in a week). I don't need to use it myself but might

    Also, THANK YOU for making and maintaining Better BibTeX!! I have just started out with it, but it has already been fantastically useful and enabled some of the features I considered switching to JabRef for.
  • Thanks, fixed. I'll roll it out in a new release when I have feedback on one more open issue.

    And you're welcome :)

    If you want abstracts + notes, you might also want to look at https://github.com/retorquere/zotero-report-customizer
  • Thanks for the link!