Parsing HTML tags in Notes

jmgmbl · June 28, 2020

Hi,

A few years ago I switched from RefWorks to Zotero, and when I exported my RefWorks library as a BibTeX file, the text formatting in the "Notes" field was rendered with HTML tags. When I imported the library into Zotero, the notes fields were populated with the strings exactly as they were in the BibTeX file, i.e. with unparsed HTML tags. For example, one of my notes appears as:

"Review essay of: Laura Gowing, <i>Common Bodies</i>; Valeria Finucci, <i>The Manly Masquerade</i>; Richard Helgerson, <i>Adulterous Alliances</i>; Jeffrey Merrick and Michael Sibalis, <i>Homosexuality in French History and Culture</i>; Katherine O'Donnell and Michael O'Rourke, <i>Love, Sex, Intimacy, and Friendship between Men</i>, <i>1550-1800</i>; Betteridge, <i>Sodomy in Early Modern Europe</i>; Bray, <i>The Friend</i>; Valerie Traub, <i>The Renaissance of Lesbianism in Early Modern England; </i>Walter Stephens, <i>Demon Lovers</i>; Thomas Laqueur, <i>Solitary Sex.</i><br><br>"

In RefWorks, of course, those tags were parsed and the text was formatted. Is there some way to systematically get Zotero to go through and parse all those tags and format the text in the notes? There's really never anything fancier than bold, italics, and breaks—all of which you can do natively in the notes editor in Zotero.

I've been going through piecemeal and cleaning notes up as I happen upon them over the past couple of years, but it's time-consuming, and it also feels a little silly, since I'm manually translating machine-readable language into presumably the exact same machine-readable language, only through a user interface.

Thanks for any help anyone is able to provide!

dstillman · June 29, 2020

Close Zotero and make a backup of zotero.sqlite in your Zotero data directory. Then reopen Zotero, temporarily disable auto-sync in the Sync pane of the preferences (if you're using syncing), and run this from Tools → Developer → Run JavaScript:

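// Find every note whose stored text contains an escaped <i> tag
var s = new Zotero.Search();
s.addCondition('note', 'contains', '&lt;i&gt;');
var ids = await s.search();
for (let id of ids) {
    let item = Zotero.Items.get(id);
    let note = item.getNote();
    // Unescape the entities so the tags are treated as formatting
    item.setNote(Zotero.Utilities.unescapeHTML(note));
    // Save each note in its own transaction
    await item.saveTx();
}
return ids.length + " item(s) updated";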

This will convert all notes that contain a double-encoded <i> (stored in the note as '&lt;i&gt;'). If other notes contain different tags, you could change the search string to check for '&lt;b&gt;' for <b>, or similar.

If you're happy with the results, you can re-enable auto-sync.
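For anyone unsure what "double-encoded" means here, the following is a rough standalone sketch of the idea, not Zotero's actual implementation. The helper name unescapeOnce and the sample string are made up for illustration, and it only handles a handful of entities. It runs in plain Node or a browser console, with no Zotero APIs involved, and shows how an escaped '&lt;i&gt;' turns back into a real <i> tag after one round of unescaping:

```javascript
// Hypothetical helper for illustration only; this is NOT Zotero.Utilities.unescapeHTML.
// It decodes a small, fixed set of HTML entities exactly once.
const ENTITIES = {
    '&lt;': '<',
    '&gt;': '>',
    '&quot;': '"',
    '&#39;': "'",
    '&amp;': '&'
};

function unescapeOnce(text) {
    // A single regex pass, so '&amp;lt;' becomes '&lt;' rather than '<'.
    return text.replace(/&(?:lt|gt|quot|#39|amp);/g, (entity) => ENTITIES[entity]);
}

// Made-up sample of how an imported note might be stored: tags escaped as entities.
const stored = 'Laura Gowing, &lt;i&gt;Common Bodies&lt;/i&gt;&lt;br&gt;';
const fixed = unescapeOnce(stored);
console.log(fixed); // prints: Laura Gowing, <i>Common Bodies</i><br>
```

The single pass is deliberate: decoding repeatedly until nothing changes would also unescape text that was meant to stay escaped.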

jmgmbl · June 29, 2020

This worked perfectly, thank you so much! I wish I'd asked years ago.