Searching note content outside Zotero
Searching in Zotero reveals which items contain the search terms, but not where in these items the matches are. This means that I have to click through each matched item's attachment and search again inside it. In contrast, if I search a text document using grep, it returns the line with the match and, optionally, any number of lines before and after it. This is great for quickly finding relevant matches, and it would be very useful if I could search through Zotero’s notes and PDF attachments in this way.
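For comparison, here is the kind of grep output described above. This is a minimal sketch that assumes GNU or BSD grep (`-r`, `-H` and `-C` are common extensions rather than strict POSIX) and a folder of plain-text exports. The function `search_notes` and the default folder name `notes` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: search every file under a directory for a term and
# print the file name (-H), line number (-n) and two lines of context (-C 2)
# around each match.
search_notes() {
    local term="$1" dir="${2:-notes}"
    grep -rHn -C 2 -- "$term" "$dir"
}
```

Matching lines are printed as `path:12:text` and context lines as `path-11-text`, so the file name alone identifies which exported item a hit came from.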
The PDF attachments are easy enough: I store them in a dedicated directory and could just run pdftotext on them, as Zotero does. However, it seems that the notes are only stored in the SQLite database, not in the Zotero storage directory like web snapshots and other files. This means I need to run an SQL query to dump all the note content to files (preferably with the parent item as the file name) and then convert it from HTML to Markdown using pandoc. I am not familiar with Zotero's database structure (or with SQL, for that matter), so would such a query be straightforward? Do you have any advice on where to start?
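One way such a query could look is sketched below. This is not an official Zotero tool. It assumes the schema of recent Zotero versions: notes sit in the `itemNotes` table (`itemID`, `parentItemID`, and `note` holding the HTML), the parent's title is reachable through `itemData`, `itemDataValues` and `fields`, and trashed items are listed in `deletedItems`. Please check these names against your own database first (for example with `sqlite3 zotero.sqlite .tables`). The function names `dump_notes` and `notes_to_markdown` are hypothetical, and the sketch needs the `sqlite3` command-line shell and pandoc. Because Zotero locks the live database while it is running, the sketch queries a temporary copy.

```shell
#!/usr/bin/env bash
# Sketch (assumed Zotero schema): dump each note to its own HTML file named
# after the parent item's title plus the note's itemID, then convert the
# files to Markdown with pandoc.

dump_notes() {
    local db="$1" out="${2:-notes}" copy id title name query
    mkdir -p "$out"
    # Zotero locks the live database while running, so query a copy.
    copy=$(mktemp) || return 1
    cp "$db" "$copy" || return 1

    # One row per note that is not in the trash:
    # note itemID <TAB> parent title ('standalone' if none is found).
    query="
        SELECT n.itemID, COALESCE(v.value, 'standalone')
        FROM itemNotes AS n
        LEFT JOIN itemData AS d
               ON d.itemID = n.parentItemID
              AND d.fieldID = (SELECT fieldID FROM fields WHERE fieldName = 'title')
        LEFT JOIN itemDataValues AS v ON v.valueID = d.valueID
        WHERE n.itemID NOT IN (SELECT itemID FROM deletedItems);"

    while IFS=$'\t' read -r id title; do
        # Replace characters that are awkward in file names with '_'.
        name=$(printf '%s' "$title" | tr -c 'A-Za-z0-9._-' '_' | cut -c1-80)
        # Write the note's HTML body; stdin comes from /dev/null so sqlite3
        # cannot consume the rows the while loop is still reading.
        sqlite3 -list -noheader "$copy" \
            "SELECT note FROM itemNotes WHERE itemID = $id;" \
            < /dev/null > "$out/${name}_${id}.html"
    done < <(sqlite3 -list -noheader -separator $'\t' "$copy" "$query")

    rm -f "$copy"
}

# Convert every dumped .html note in a directory to a .md file next to it.
notes_to_markdown() {
    local dir="${1:-notes}" f
    for f in "$dir"/*.html; do
        [ -e "$f" ] || continue   # the glob matched nothing
        pandoc -f html -t markdown -o "${f%.html}.md" "$f"
    done
}
```

Assuming the default data directory, usage would be `dump_notes ~/Zotero/zotero.sqlite notes && notes_to_markdown notes`. Appending the note's itemID to the title keeps several notes on the same item from overwriting each other. Notes whose parent title cannot be resolved end up as `standalone_<itemID>.html`.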
After posting this, I found out that the brilliant Better BibTeX extension has an option to include notes when exporting to .bib files. It can also automatically keep these files up to date, which is quite convenient. The only downside is that it exports a collection as a single file, so tools like grep can’t indicate which item has the matching string simply by displaying the file name, but there is probably some BibTeX or BibLaTeX processing tool that could help me with that.
I am still interested in a reply to my original question, but do you think this second approach would be smoother (and are there any additional solutions that I have overlooked)?
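#!/usr/bin/env bash
# Split a Better BibTeX export (.bib) into one file per entry, then save each
# entry's notes (annotation field) and abstract as separate text files.
bib="${1:?usage: $0 export.bib}"
csplit -kq "$bib" "/^@/-1" "{*}"
rm -f xx00 # An empty file that is created when splitting
mkdir -p notes abstracts
for split_file in xx*; do
    # Name the output after the attached file (the last one if there are several)
    file_name=$(grep -m1 "file =" "$split_file")
    file_name="${file_name##*/}"
    file_name="${file_name%.*}"
    # Fall back to the citation key when the entry has no attachment,
    # so entries without files do not all overwrite notes/.txt
    if [ -z "$file_name" ]; then
        file_name=$(sed -n 's/^@[^{]*{\([^,]*\),.*/\1/p' "$split_file" | head -n1)
    fi
    # Extract and save the relevant sections
    rg "annotation = \{.*?\},\n" "$split_file" \
        --multiline --multiline-dotall --no-line-number > "notes/${file_name}.txt"
    rg "abstract = \{.*?\},\n" "$split_file" \
        --multiline --multiline-dotall --no-line-number > "abstracts/${file_name}.txt"
    rm "$split_file"
done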
Afterwards all files in the created dirs can be searched with grep/rg.
I'll probably try it someday.
I actually tried Collected notes briefly before, but it generates an error for me (pasted here: https://pastebin.com/JQYM3D4d, expires in a week). I don't need to use it myself but might
Also, THANK YOU for making and maintaining Better BibTeX!! I have just started out with it, but it has already been fantastically useful and has enabled some of the features I was considering switching to JabRef for.
And you're welcome :)
If you want abstracts + notes, you might also want to look at https://github.com/retorquere/zotero-report-customizer