ZotFile annotation extraction bug

ZotFile does not extract the "anno.content" portion of PDF annotations for "Underline" comments, only the "anno.markup" portion. In other words, it does not extract the personal text I typed into the comment box, only the underlined (markup) text itself. It works just fine with "Highlight" comments, though: both the content and markup components of the annotation come into Zotero as expected. I tried multiple PDFs, each containing several Text ("bubble"), Highlight and Underline comments, and only "Underline" has this problem. Unfortunately, I previously chose to use underline in all of my refs, so it's not like I can just switch... I'm using the latest versions of Zotero and ZotFile on a Windows machine.

I played around with pdfAnnotations.js for a while trying to find a solution, but I usually program in LabVIEW, so this text stuff is way out of my league. Best I can figure, the value of "anno.content" for "Underline" is getting dropped somewhere upstream of pdfAnnotations.js in a way that it is not for "Highlight". Otherwise, I'd just assume it was an unsupported feature.

I hope that explanation is concise and to the point. If you need more context, please see my separate post from 4 days ago.
  • edited February 14, 2020
    Figured it out for ya. Sort of.

    In the ZotFile .xpi package (which, I've since learned, is just a zip file with a different extension, so you can rename it to .zip to look inside) there is a file:

    zotfile-5.0.16-fx/chrome/content/zotfile/pdfextract/pdfjs/src/core/annotation.js

    For those of you who, like me, don't do JavaScript: it can be opened in a text editor. (By the way, if you ever need to work with text files in the tens of GB, like merged FASTA files or other DNA sequencing files, EmEditor is fantastic.)

    On line 81 there is a conditional statement that basically says "if the annotation's subtype is Highlight, then get its contents" (the contents being your comment).

    Unfortunately, there is no matching branch for Underline comments, who knows why. I guess in that sense it's more of an unsupported feature than a bug.

    But if you change the word 'Highlight' on line 81 to 'Underline', save the file, rezip all of the original files exactly as they were, change the .zip extension back to .xpi, and reinstall the add-on from Zotero (open the Add-ons Manager, click the gear wheel, select Install Add-on from File, and choose the .xpi you just made; no compiling necessary)...

    Then voilà! ZotFile will now extract the comments attached to underlined text INSTEAD of highlighted text. So it is still half-broken, since the edit only swaps which type works, but if you're like me and only need the underlined comments, you can get on with writing your freaking dissertation.

    I apologize for not creating a branch and submitting a proper fix upstream, but I am in way over my head when it comes to implementing a good fix that extracts comments from both Highlights and Underlines.
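    For anyone who does want to attempt the proper fix: the idea is to test for either subtype instead of swapping one for the other. Below is a minimal sketch of that idea in plain JavaScript. I haven't reproduced the actual code on line 81, so the annotation object shape (`subtype`, `contents`) and the `getComment` helper are hypothetical stand-ins, not pdf.js's real internals. In the real file, the equivalent change would presumably turn the check for 'Highlight' into one that also accepts 'Underline'.

```javascript
// Hypothetical sketch, NOT the actual pdf.js/ZotFile code.
// Annotation subtypes whose typed comment ("contents") should be extracted.
// The original condition only matched "Highlight"; swapping it for
// "Underline" just moves the problem, so accept both.
const SUBTYPES_WITH_COMMENTS = new Set(["Highlight", "Underline"]);

// Return the user's typed comment for an annotation, or "" when the
// subtype is not one we extract comments from or no comment was typed.
function getComment(annotation) {
  if (SUBTYPES_WITH_COMMENTS.has(annotation.subtype)) {
    return annotation.contents || "";
  }
  return "";
}

// Example annotations as plain objects (field names are assumptions).
const annotations = [
  { subtype: "Highlight", contents: "My highlight comment" },
  { subtype: "Underline", contents: "My underline comment" },
  { subtype: "Underline" }, // underline with no typed comment
];

for (const a of annotations) {
  console.log(`${a.subtype}: "${getComment(a)}"`);
}
```

    Keeping the accepted subtypes in one set means a later addition is a one-word change; a single `subtype === 'Highlight' || subtype === 'Underline'` condition would behave the same for these two types.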

    Sorry to be long-winded, but the time it takes to read this post pales in comparison to the 2-3 days I just blew learning that I didn't need to find my GitHub password, download and install MS Visual Studio, Git for Windows, Make for Windows or Cygwin64, or fire up an Ubuntu VM to fix my problem...

    Hope that helps someone else!

    Thanks, ZotFile (seriously)! I finally forced myself to learn a little bit about traditional coding. Perfect addition to my resume, now I'll get paid a lot more.
  • One more thing to add if you want to modify the way annotations are imported.

    Only three PDF annotation types are supported: (sticky) notes, Highlight and Underline. Each Highlight and Underline in your PDF has two components (markup and comment), as described above. By default, the imported note alternates between a paragraph of markup and a paragraph of your personal comment; sticky notes are the exception, since they consist only of your comment.

    I've decided that all I want to see are my personal notes, none of the original underlined or highlighted text. It's just easier for me to focus.

    If that's what you want too:

    Zotero -> Edit -> Preferences -> Advanced -> General tab -> Config Editor

    Find the preference "extensions.zotfile.pdfExtraction.formatAnnotationUnderline", right-click it, choose Modify, and delete its entire value. You can do the same for "formatAnnotationHighlight" if you want. Then, when the code reaches this preference, it simply won't print the markup component into your Zotero note.

    But don't delete the value of "formatAnnotationNote", because that preference formats your personal comment for ALL THREE supported annotation types. In other words, "formatAnnotationNote" applies to your comments on Highlights and Underlines as well as to (sticky) notes.
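    To see why emptying a template drops the markup but keeps the comment, here is a toy model of the formatting step described above. It is not ZotFile's code: the `prefs` object, the `fill` and `renderAnnotation` helpers, the template values, and the `%(content)` placeholder syntax are all assumptions for illustration. Highlights and Underlines go through their own template for the marked-up text and through the Note template for the comment; sticky notes use only the Note template.

```javascript
// Toy model of ZotFile's annotation templates (hypothetical; the real
// preference values and placeholder syntax may differ).
const prefs = {
  formatAnnotationHighlight: '<p>"%(content)"</p>',
  formatAnnotationUnderline: "", // emptied in the Config Editor
  formatAnnotationNote: "<p><i>%(content)</i></p>",
};

// Substitute every %(content) placeholder with the given text.
function fill(template, text) {
  return template.split("%(content)").join(text);
}

// Render one annotation as note HTML.
// anno.type is "Note", "Highlight" or "Underline";
// anno.markup is the marked-up PDF text, anno.comment is what you typed.
function renderAnnotation(anno) {
  const parts = [];
  if (anno.type !== "Note") {
    const markupTemplate = prefs["formatAnnotation" + anno.type];
    // An empty template means the markup paragraph is skipped entirely.
    if (markupTemplate) parts.push(fill(markupTemplate, anno.markup));
  }
  // The Note template formats the typed comment for all three types.
  if (anno.comment) parts.push(fill(prefs.formatAnnotationNote, anno.comment));
  return parts.join("\n");
}

console.log(renderAnnotation({ type: "Underline", markup: "Underlined text", comment: "My thought" }));
console.log(renderAnnotation({ type: "Highlight", markup: "Highlighted text", comment: "Another thought" }));
console.log(renderAnnotation({ type: "Note", comment: "Sticky note" }));
```

    The same model shows why "formatAnnotationNote" has to stay: if it were empty, the comment paragraph would disappear for every annotation type, not just sticky notes.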

  • On a related note, I'm finding that poppler extraction treats what I think are highlights as underlines, so all of the extracted text is underlined in the note.

    Process: I annotate a PDF with Adobe Reader (Mac), Preview.app and, on my iPad, PDF Viewer. In each case I highlight text with the highlight tool, then use ZotFile to extract the annotations. In every case poppler thinks everything is underlined (and pdf.js does not).

    I should note that I have poppler installed via Homebrew, and pdf.js is the option checked in the ZotFile preferences. I get two notes, one from each extraction method. (I also find that neither is perfect, so combining the two sets of results often gets me a better final product.)

    Any thoughts?
  • @John_muccigrosso Poppler extraction in ZotFile has not been updated in many years and isn't really supported anymore. I recommend using pdf.js instead.
  • Thanks. As I said, I find the two tools complementary. I can just modify the annotation markup so that there are no underlines in the output.

    And maybe I'll see about getting ZotFile's settings changed to reflect the actual state of support you mention.