ZotFile: Extracting more features of annotations

@Joscha - I was wondering whether it's possible to extract, e.g., the 'comment' text (which can be attached to highlights), or perhaps even the comments themselves, or the colour of the highlight. This obviously depends on PDF.js or Poppler, and I saw a comment on GitHub (for PDF.js, ~2016) that was presumably from you.

Could you let me know what's possible?
  • Comment bubbles and the color of highlights are extracted by ZotFile by default.
  • Thanks for that - yes, indeed, comments are extracted! It's a shame, though, that the comments are dissociated from the highlights. For example, a comment attached to a highlight (and a reply to it) appears as separate items:
    "working definitions of the myriad terms used in ICT in TVET are given as Annex A." (Herd and Richardson 2015:26) [highlight]

    comment on highlight (note on p.26)

    reply to comment on highlight (note on p.26)

    Separate comment (note on p.26)
    Do you think that might be fixable?

    I can't see the colours being extracted. Is this with PDF.js or Poppler?
  • Just to check - when you say 'colour of highlights', did you mean that highlights of any colour are extracted, or that the colour itself is extracted alongside the highlight?
  • Search for “color” at http://zotfile.com/ for details.
  • edited December 17, 2018
    Ah - this is super-cool, thank you so much!

    Just to share what I found: go to about:config within Zotero and modify these hidden prefs, e.g. as follows:
    extensions.zotfile.pdfExtraction.formatAnnotationHighlight => <p>"%(content)" (%(cite)) (%(label); p.%(page); %(color_category)/%(color); %(uri))</p>

    extensions.zotfile.pdfExtraction.formatAnnotationNote => <p><i>%(content)</i> (<a href="%(uri)">note on p.%(page)</a>) (%(label); p.%(page); %(uri))</p><br>

    extensions.zotfile.pdfExtraction.formatAnnotationUnderline => <p>"<u>%(content)</u>" (%(label); p.%(page); %(uri))</p>
    Edit: the formats above don't use the colour, so, for completeness:
    extensions.zotfile.pdfExtraction.formatAnnotationHighlight => <p><span style="background-color:%(color);">"%(content)"</span> (%(cite)) (%(label); p.%(page); %(color_category)/%(color); %(uri))</p>
    ZotFile can also generate multiple notes based on colour. To enable this, set
    extensions.zotfile.pdfExtraction.colorNotes => true

    Thanks again!
  • @Joscha, @bwiernik - do you happen to know whether the way %(cite) is rendered can be configured? E.g., could it be set to a scannable cite (i.e., by introducing %(scannablecite))? Or can it be changed some other way?
  • It's hardcoded in ZotFile. You could of course modify ZotFile itself if you're so inclined, but then you're looking at actual coding work rather than just changing a setting.
  • edited December 17, 2018
    I've added 'group' and 'key' to ZotFile and created a pull request here: https://github.com/jlegewie/zotfile/pull/384. The changes allow you to use %(group) and %(key) to include the Zotero item's group/key in the note text produced by ZotFile.

    Purpose: if you paste the note (e.g. into another document), the Zotero group/key lets you locate the item in your Zotero library again.

    Moreover, this means you can add 'scannable cites' to the notes, as follows: set 'extensions.zotfile.pdfExtraction.formatAnnotationHighlight' to
    <p>"%(content)" {|(%(cite))|||zg:%(group):%(key)}</p>
    (for a group library).
  • Also see the discussion here: https://forums.zotero.org/discussion/75020/scannable-cite-rtf-odf-scan-for-zotero-how-are-the-citation-texts-produced
    Basically, using the translator discussed there together with the amended version of ZotFile, it's now possible to produce 'quotation + scannable cite' (via ZotFile) in addition to plain 'scannable cites' (directly from Zotero items, as always). Likewise, it's possible to produce other scannable-cite-like formats (such as 'Bjoern's citation strings'), both as 'quotation + scannable-cite-like cite' (via ZotFile) and as a plain 'scannable-cite-like citation' (via the 'Bjoerns Citation String.js' translator, which is essentially the same as the Scannable Cite translator).
  • Just to add: while ZotFile extracts 'stand-alone notes' and 'notes attached to highlights' in a PDF in the same way, you can distinguish them if they have different colours. I've added 'hex_color' here: https://github.com/bjohas/zotfile (PR: https://github.com/jlegewie/zotfile/pull/384). If and when the PR is accepted, these options can be used in ZotFile.
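The formatAnnotation* prefs shared in this thread are templates: each %(name) placeholder is replaced by a field of the extracted annotation. The following is a minimal Python sketch of that substitution, assuming plain text replacement. It is not ZotFile's actual (JavaScript) code, and `render_annotation` and the sample values (colour, URI, item key) are hypothetical.

```python
import re

# Matches placeholders of the form %(name), e.g. %(content) or %(color_category).
PLACEHOLDER = re.compile(r"%\((\w+)\)")


def render_annotation(template: str, fields: dict[str, str]) -> str:
    """Replace each %(name) in `template` with fields[name] (hypothetical helper).

    Assumption: unknown placeholders are left untouched rather than raising.
    """
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        return fields.get(name, match.group(0))

    return PLACEHOLDER.sub(substitute, template)


# The colour-aware highlight format from the thread.
HIGHLIGHT_FORMAT = (
    '<p><span style="background-color:%(color);">"%(content)"</span> '
    "(%(cite)) (%(label); p.%(page); %(color_category)/%(color); %(uri))</p>"
)

# Hypothetical annotation values, for illustration only.
example = {
    "content": "working definitions of the myriad terms used in ICT in TVET are given as Annex A.",
    "cite": "Herd and Richardson 2015:26",
    "label": "26",
    "page": "26",
    "color": "#ffff00",
    "color_category": "Yellow",
    "uri": "zotero://open-pdf/library/items/ABCD1234?page=26",
}

if __name__ == "__main__":
    print(render_annotation(HIGHLIGHT_FORMAT, example))
```

Leaving unknown placeholders in place is a choice made for this sketch: a misspelt placeholder in a pref value then shows up verbatim in the output instead of silently disappearing.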
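Setting colorNotes to true makes ZotFile generate multiple notes based on colour, and colour is also what lets you tell stand-alone notes apart from notes attached to highlights if they were given different colours. Below is a minimal Python sketch of grouping annotations by colour, assuming one note per colour and that each annotation carries a hex colour. The `Annotation` type and `notes_by_colour` are hypothetical and not taken from ZotFile.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Annotation:
    """Hypothetical stand-in for one extracted PDF annotation."""
    kind: str     # e.g. "highlight" or "note"
    content: str
    page: int
    color: str    # hex colour, e.g. "#ffff00"


def notes_by_colour(annotations: list[Annotation]) -> dict[str, list[str]]:
    """Group rendered annotation lines into one list (one 'note') per colour.

    Dicts keep insertion order, so colours appear in order of first use and
    annotations keep their document order within each colour group.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for ann in annotations:
        groups[ann.color].append(f'{ann.kind} (p.{ann.page}): "{ann.content}"')
    return dict(groups)


if __name__ == "__main__":
    sample = [
        Annotation("highlight", "key definition", 26, "#ffff00"),
        Annotation("note", "comment on highlight", 26, "#ff0000"),
        Annotation("highlight", "another passage", 27, "#ffff00"),
    ]
    for colour, lines in notes_by_colour(sample).items():
        print(colour)
        for line in lines:
            print("  " + line)
```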
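The scannable-cite highlight format discussed in this thread, `{|(%(cite))|||zg:%(group):%(key)}`, packs five pipe-separated fields into braces. Assuming the usual Scannable Cite field order (prefix | citation | locator | suffix | item URI), here is a minimal Python sketch of assembling such a marker for a group library. `scannable_cite` is a hypothetical helper, and the group ID and item key in the example are made up.

```python
def scannable_cite(
    cite: str,
    group: str,
    key: str,
    prefix: str = "",
    locator: str = "",
    suffix: str = "",
) -> str:
    """Build a {prefix|citation|locator|suffix|zg:group:key} marker.

    Assumption: 'zg:' addresses a Zotero group library, as in the thread.
    Doubled braces in the f-string produce literal '{' and '}'.
    """
    return f"{{{prefix}|{cite}|{locator}|{suffix}|zg:{group}:{key}}}"


if __name__ == "__main__":
    # Hypothetical group ID and item key.
    print(scannable_cite("(Herd and Richardson 2015:26)", "12345", "ABCD1234"))
```

With empty prefix, locator and suffix this yields the same shape as the ZotFile format string above, with %(cite), %(group) and %(key) filled in.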