Embedding annotations in EPUB file itself

This discussion was created from comments split from: Announcing the Zotero 7 Beta.
  • This is awesome progress! I just learned last week that Zotero 7 has been out for months and has implemented EPUB capabilities. Tested the development version on all my Apple devices, and it works really really well.

    I am curious about one thing though. Is it possible to embed EPUB annotations in the file itself, similar to how PDF annotations work? I have noticed that no other reader seems to be able to do this, so is this a fundamental limitation of the format itself?
  • Yes, epub doesn't have annotations as part of the format, so you wouldn't be able to save to the file.
  • > I just learned last week that Zotero 7 has been out for months

    The beta, that is. While the Zotero 7 beta is fully usable, it's not a formal release yet, and things are still being added and fixed. There's a feature missing from BBT pending such an addition to Zotero.
  • > Yes, epub doesn't have annotations as part of the format, so you wouldn't be able to save to the file.

    Thank you for clarifying. Would it be possible to export the EPUB in such a way that if re-imported back to Zotero the annotations show up in file? (Of course I don't expect this to work with other readers).

    Apple Books does something similar. If you re-import a deleted book, the old annotations show up in the document. Although I understand that's not exactly the same thing as what I have in mind since they seem to be storing the annotations in their database and then "re-applying" it. I hope this makes sense.
  • > Would it be possible to export the EPUB in such a way that if re-imported back to Zotero the annotations show up in file?

    For what purpose?
  • @dstillman

    Imagine this: we have a book that we sometimes read on our office laptop, sometimes on a tablet, and sometimes on a computer at home. After finishing the book, we want to gather all the notes (for example, the highlights) we made while reading in these different situations and keep the highlights associated with their original positions in the book. How can we do this conveniently?


    I have been searching for a solution that achieves the following goals:

    1. Associate the highlighted portions of the book with their original positions.
    2. Seamlessly synchronize reading progress, including the highlights, across devices.
    3. Export and import the highlights while maintaining their association with original positions in the book.


    With PDFs, highlights can be exported in .xfdf format and .xfdf files can be imported as highlights, which allows for appending or consolidating notes and highlights. That is a great feature.


    I used to read EPUB books with Calibre, an application like Zotero, but it doesn't have an Android app. Later, I started using Zotero to read EPUB books. Recently, Zotero's web version introduced an online reading feature that allows for highlighting and note-taking, which is very convenient.


    However, I encountered an issue with Zotero. When reading PDFs, the highlights are stored in the Zotero database, and I can also choose to export the PDF with the highlights. **But this functionality doesn't exist for EPUB books.** So, if an EPUB book is in a Zotero shared group, I can't move it with the highlights to My Library. However, if the book is in PDF format, I can export it from the Zotero shared library and import it into My Library, consolidating highlights from multiple identical PDF files into one PDF.

    It seems that Zotero's design principle is to minimize changes to the content of PDF and EPUB files. But considering that EPUB is essentially a compressed archive format, I think it would be possible to let users choose to export the highlights along with the EPUB file, perhaps in a proprietary Zotero format stored inside the EPUB archive. When the file is re-imported into Zotero, these highlights could be imported too. In this regard, it might be worth considering Calibre's design approach, and even maintaining compatibility with it to make switching between the two programs easier.

    In summary, I hope for a workflow that allows users to consolidate highlights from multiple identical EPUBs into one EPUB file, just as I do with PDF files.
  • > Imagine this, we have a book that we sometimes read on our office laptop, sometimes on a tablet, and sometimes on a computer at home. But after finishing the book, we want to gather all the notes (for example, the highlight ) we made while reading in different situations and keep the highlights associated with their original positions in the book. How can we do this conveniently?

    Use the Zotero app on your work computer, tablet, and home computer? That's the whole point of Zotero syncing, and one of the main reasons the annotations are stored separately from the file to begin with.

    > So, if an epub book is in a zotero shared group, I can't move it with the highlights to My Library.

    Of course you can. You drag it to your personal library, just as you do for a PDF.

    > When reading PDFs, the highlights are stored in the zotero database, and I can also choose to export the PDF with the highlights. **But this functionality doesn't exist for epub books.**

    Because PDF has a format for embedded annotations shared between different readers and (as far as I know) EPUB doesn't. If there were some standard for exporting annotations with EPUB files, we would obviously support it as an option during export. But if it's only a question of importing back into Zotero, someone would have to actually make the case for why that would be useful. You could say "I want to send an EPUB with annotations to a Zotero-using friend without using a group library", but none of what you described depends on embedded annotations for PDFs and it won't for EPUBs either.
  • I just tried it: when I dragged an item containing an EPUB file from the Zotero shared library to my personal library, the highlights came along with the EPUB file. This is a great feature, which wasn't available in earlier versions of Zotero.


    BUT, if I want to send an EPUB with annotations to a Zotero-using friend without using a group library, what should I do?

    Perhaps embedded annotations for epub is a better solution.

    I am not sure whether the following is a standard for exporting annotations with EPUB files:
    https://en.wikipedia.org/wiki/EPUB#:~:text=The OPF file, traditionally named content.opf, houses the EPUB book's metadata, file manifest, and linear reading order.

    But I recommend you try using the software Calibre, which stores highlights and other notes from EPUB files in a central library (I guess). Additionally, it also saves a copy of these highlights in the "metadata.opf" file located in the same directory as the EPUB file.

    Perhaps Zotero can consider implementing a similar approach. Or you and the developers of Calibre could work together to develop a universal and extensible EPUB annotation standard.

    Many thanks.
  • It could be cool to have a tool that facilitates collaboration on and sharing of EPUBs with annotations. But I don't want to sync an entire book every time I add an annotation. By approaching this in a modular way, I only have to sync the tiny annotation. Perhaps a developer could create a plug-in that allows you to "pack up" your EPUB before sharing it with another--sending the document with annotations--and to unpack it when they have added their annotations and returned it.
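
A rough sketch of what such "pack up" and "unpack" steps could look like in Python, assuming the annotations are already available as a JSON-serializable list. The sidecar path `META-INF/zotero-annotations.json` and the function names `pack` and `unpack` are made up for illustration; neither Zotero nor any EPUB standard defines them, and other readers would most likely just ignore the extra file.

```python
import io
import json
import zipfile

# Hypothetical sidecar location; not defined by Zotero or by any EPUB standard.
SIDECAR = "META-INF/zotero-annotations.json"


def pack(epub_bytes: bytes, annotations: list) -> bytes:
    """Return a copy of the EPUB with the annotations stored as a JSON sidecar."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as src, \
            zipfile.ZipFile(out, "w") as dst:
        for info in src.infolist():
            if info.filename != SIDECAR:
                # Passing the ZipInfo keeps each entry's order and compression,
                # so "mimetype" stays first and uncompressed.
                dst.writestr(info, src.read(info.filename))
        dst.writestr(SIDECAR, json.dumps(annotations),
                     compress_type=zipfile.ZIP_DEFLATED)
    return out.getvalue()


def unpack(epub_bytes: bytes) -> list:
    """Read the sidecar back; an EPUB without one yields an empty list."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as src:
        if SIDECAR not in src.namelist():
            return []
        return json.loads(src.read(SIDECAR))
```

Only the small JSON list would need to travel through sync; the packed file is produced on demand when the book is shared.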
  • > For what purpose?

    I believe it would greatly improve the data portability of ePub files. In an ideal world, we would have:

    1. Zotero ePub format: A hypothetical ePub format with support for embedded annotations. However, I appreciate the fact that the lack of an existing standard makes things difficult for you.

    2. Zotero ePub reader: A lightweight, stand-alone clone of the Zotero app's ePub viewer.

    Requesting a stand-alone ePub reader seems like too big of an ask for the development team, but creating a standalone format leaves open the possibility for some enterprising member of the community to eventually create it. It seems to me that that would be a reasonably straightforward task, but it is a completely uninformed opinion since I don't know much about programming or Zotero internals.

    The advantage of this system should be easy to see. We would be able to share ePub files with others without using group libraries or requiring others to install the full Zotero application. And in case I decide to stop using Zotero, I can still access these in-line annotations just using the stand-alone Zotero ePub reader.

    It would also be beneficial to my existing workflow of annotating textbooks. Once I am done annotating them, I'd rather not have them cluttering my Zotero library and would prefer to export them out (especially since some of them can have sizes in the hundreds of MBs), and re-import them back when needed.
  • > But I recommend you try using the software Calibre, which stores highlights and other notes from EPUB files in a central library (I guess). Additionally, it also saves a copy of these highlights in the "metadata.opf" file located in the same directory as the EPUB file.

    Interesting - I wasn't aware that it stored annotations in metadata.opf alongside the EPUB.

    I'm seeing a couple issues:
    • Calibre's viewer doesn't seem to read annotations from a metadata.opf file placed alongside an EPUB that isn't in my library, and it doesn't import the annotations in metadata.opf when I add the EPUB to my library
    • This is 100% Calibre's own thing - the annotations are stored as JSON under a meta tag named calibre:annotation
    But this is still the closest thing I've seen to a portable EPUB annotation storage format that we could use.
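
For anyone who wants to inspect those entries, here is a minimal Python sketch. It assumes each annotation is a `<meta name="calibre:annotation">` element in the OPF namespace whose JSON payload sits in the `content` attribute; that layout and the function name `read_calibre_annotations` are assumptions for illustration, not a documented Calibre interface.

```python
import json
import xml.etree.ElementTree as ET

OPF_NS = "{http://www.idpf.org/2007/opf}"


def read_calibre_annotations(opf_text: str) -> list:
    """Collect the JSON payloads of <meta name="calibre:annotation"> tags."""
    root = ET.fromstring(opf_text)
    found = []
    for meta in root.iter(OPF_NS + "meta"):
        if meta.get("name") == "calibre:annotation":
            # Assumption: the JSON lives in the "content" attribute.
            found.append(json.loads(meta.get("content", "null")))
    return found
```

The returned objects are whatever Calibre wrote, so any mapping to another annotation model would still have to be worked out field by field.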
  • There was work up until ~2015 for standardizing annotation support in ePUBs, similar to PDFs: https://idpf.org/epub/oa/. But as far as I know, that eventually fell apart and the few apps that support embedded ePUB annotations either directly modify the ePUB text colors or use custom properties that are hidden from other apps (and I think most just store them outside of the file). You're siloed into whatever app you read/write annotations with, but I think Zotero is probably the best silo to be in. I'd LOVE to have a universal annotation file format that supports all of PDF/HTML/ePUB including surviving new versions of and multiple filetypes of content, but getting there and getting others onboard would be very difficult.
  • > I'd LOVE to have a universal annotation file format that supports all of PDF/HTML/ePUB including surviving new versions of and multiple filetypes of content

    That's the vision of the open web annotation standard. Hypothesis, e.g., works on all three formats. Portability is also an issue there -- annotations are always outside of the file -- but it defines a standardized format for that so they should move seamlessly between applications.
    I don't know if Zotero evaluated that and, if so, why they decided against it, but wanting lossless conversion to the PDF annotation standard (which is an ISO standard and much more widely used for PDFs, of course) might be one reason.
  • edited January 3, 2024
    @adamsmith Thanks for pointing that out, I've seen that mentioned but need to read it if the vision is that similar and see how it stores cross-platform annotations. I've used Hypothes.is's ePUB annotation demos which are usable at least in https://github.com/elias-sundqvist/obsidian-annotator, but found few other EPUB-annotating apps so I'm thrilled by Zotero's new HTML/EPUB annotation support. Standardizing (if a suitable standard exists) would be even better as I'd love to read/write annotations in KOReader and I'm sure others would benefit from cross-app support as well.
  • Our annotations are already stored using the Web Annotation Data Model schema internally. (We only support a couple selector types right now, but we could support more if necessary for interoperability with other tools.) WADM makes it clear how to store the annotations themselves, but it gets pretty vague when it comes to defining the container format. We haven't implemented a way to export EPUB annotations because we haven't yet decided on a container format with wide support among EPUB readers that we could export to.

    AFAIK Hypothesis has never made its EPUB annotation system available through its browser extension - I could be wrong about that, but all I've seen are the demos. Its annotations are also not actually compliant with the WADM standard; e.g. here's one:


    {
      // ...
      "target": [
        {
          "source": "http://hypothesis.evidentpoint.com/readium-demo/?epub=epub_content%2Falice&goto=%7B%22idref%22%3A%22chapter_001%22%2C%22elementCfi%22%3A%22%2F4%2F2%2F2%5Bpgepubid00004%5D%2F2%5BI_DOWN_THE_RABBIT-HOLE%5D%22%7D&",
          "selector": [
            {
              "type": "RangeSelector",
              "endOffset": 155,
              "startOffset": 137,
              "endContainer": "/section[1]/p[3]",
              "startContainer": "/section[1]/p[3]"
            },
            {
              "end": 827,
              "type": "TextPositionSelector",
              "start": 809
            },
            {
              "type": "TextQuoteSelector",
              "exact": "Oh dear! Oh dear! ",
              "prefix": "ar the Rabbit\n\t\tsay to itself, \"",
              "suffix": "I shall be too late!\" But when t"
            }
          ]
        }
      ]
    }
    • The EPUB CFI - the most important piece of information for an EPUB annotation - is embedded within the query string of the URL (`&goto={"idref":"chapter_001","elementCfi":"/4/2/2[pgepubid00004]/2[I_DOWN_THE_RABBIT-HOLE]"}&`). It should be in a WADM FragmentSelector. Also, the URL 404s, and URLs from more recent annotations return a 502 ("It looks like the dev server has not been started yet").
    • The RangeSelector is invalid: it should have a startSelector and an endSelector (both WADM selectors of their own), but instead it's essentially the fields of a DOM Range object dumped to JSON.
    • All the data that I elided before "target" is Hypothesis-specific metadata (users, permissions, tags).
    So we could probably allow importing annotations exported from Hypothesis, but that might be of questionable value considering that they haven't made their EPUB annotator directly available to end users. I do not believe that we could easily create data that Hypothesis would be able to import (and we definitely don't want to use its export format as our main export format).

    I'll take a look at what obsidian-annotator does and whether we might be able to interoperate with it, or at least make it easy for an Obsidian-Zotero integration plugin to do that.
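
To make the first two problems with the Hypothesis annotation above concrete, here is a small Python sketch of what an importer might do: pull the CFI fragment out of the `goto` query parameter, and reshape the DOM-Range-style selector into a `startSelector`/`endSelector` pair. The function names are hypothetical, and expressing the offsets through `refinedBy` with a `TextPositionSelector` is one possible reading of WADM chosen for this example, not something Hypothesis or Zotero is known to do.

```python
import json
from urllib.parse import parse_qs, urlsplit


def cfi_from_source(source_url: str) -> dict:
    """Pull the idref/elementCfi pair out of the "goto" query parameter."""
    query = parse_qs(urlsplit(source_url).query)
    return json.loads(query["goto"][0])


def to_wadm_range(sel: dict) -> dict:
    """Reshape a DOM-Range-style selector into startSelector/endSelector."""
    def point(container: str, offset: int) -> dict:
        # Assumption: the character offset is expressed via refinedBy.
        return {
            "type": "XPathSelector",
            "value": container,
            "refinedBy": {"type": "TextPositionSelector",
                          "start": offset, "end": offset},
        }
    return {
        "type": "RangeSelector",
        "startSelector": point(sel["startContainer"], sel["startOffset"]),
        "endSelector": point(sel["endContainer"], sel["endOffset"]),
    }
```

Note that the extracted `elementCfi` is only the part of the path inside the content document; building a full `epubcfi(...)` value would still require looking up the spine position of `idref` in the EPUB's package file.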
  • edited January 3, 2024
    Thanks, I think you're correct that Hypothesis hasn't made their EPUB annotator available outside the demos. obsidian-annotator is outdated and rather buggy, and I mainly added it because it does use Hypothesis for both HTML and EPUB annotation (it may have started with the demo code) and stores it directly in JSON embedded within Markdown. So an integration with that specifically may not be worth it, but being able to import and export annotations in general would be nice. In particular, I'm very interested in having KOReader (which can already read and annotate HTML/EPUB/PDF) open a Zotero attachment and load/render its synced Zotero annotations, then be able to add/modify annotations and propagate those back to Zotero (as then you can use Zotero items on all types of e-readers too, Android or not). But conflict resolution might be tricky here, as usual. Great to know that you use WADM.
  • It looks like KOReader lets you export to a variety of formats, including HTML, Markdown, and a custom JSON schema. All of them are lossy and there's no way a reader could support importing them. It also supports syncing with Memos, Joplin, and Readwise. Readwise is a one-way street as well - sort of like a Facebook news feed for your highlights. Memos seems to be similar. I haven't used Joplin so I'm not sure what the deal is there.

    But since they already support syncing with a few online services, and Zotero's EPUB (and snapshot) annotations are exposed with WADM selectors through the API, there's no reason someone couldn't write an integration on the KOReader side.
  • Yes, KOReader exports do not support WADM either - I don't know if anything does except for possibly Hypothesis and need to research it more. I don't think the exports are worth worrying about.

    So yes, a new KOReader integration with any WADM source, including Zotero, would be ideal (if Zotero also supports some sort of generic WADM import/export from any other app supporting WADM; I'm not sure if this is feasible or sensible). A direct KOReader <-> Zotero integration via Zotero's SQL table for annotations could be an alternative, less universal option, though I don't believe writing to the SQL table is safe in Zotero.
  • EPUB annotations sync and are readable/writable with the Zotero API, so that would be the best option for an integration.
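
As a starting point for such an integration, here is a minimal standard-library Python sketch that builds a request for the child items of an attachment and keeps only the annotations. The URL shape and the `Zotero-API-Key` / `Zotero-API-Version` headers follow the Zotero Web API v3; the function names and the placeholder IDs in the usage note are illustrative, and paging and error handling are left out.

```python
import urllib.request

API_ROOT = "https://api.zotero.org"


def children_request(user_id: str, attachment_key: str, api_key: str):
    """Build (but do not send) a request for an attachment's child items."""
    url = f"{API_ROOT}/users/{user_id}/items/{attachment_key}/children"
    return urllib.request.Request(url, headers={
        "Zotero-API-Version": "3",
        "Zotero-API-Key": api_key,
    })


def annotations_only(items: list) -> list:
    """Keep the data of children whose itemType is "annotation"."""
    return [item["data"] for item in items
            if item.get("data", {}).get("itemType") == "annotation"]
```

Sending it would look like `json.load(urllib.request.urlopen(children_request("<userID>", "<itemKey>", "<apiKey>")))` (with `import json`), after which `annotations_only(...)` yields dictionaries containing fields such as `annotationText` and `annotationPosition`.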
  • > Our annotations are already stored using the Web Annotation Data Model schema internally. [etc.]

    Thanks Abe for that & all the details around it: I should have checked that before posting.
  • edited January 22, 2024
    Chiming in to say that integrating KOReader and Zotero EPUB annotations would be very nice. EPUB annotations have plagued me for years, and I'm excited to hear there is progress being made here. Even better than this would be tool-agnostic, portable EPUB annotations. I can dream.
  • It seems there is currently no way to simply import EPUB annotations made by other applications into Zotero—please correct me if I'm wrong.

    Zotero for Android seems to require at least Android 6 (https://forums.zotero.org/discussion/comment/451631/#Comment_451631). The fairly recent e-reader I have (Tolino Page 2) only has version 4.

    I would be interested in helping to make a Zotero plugin for importing EPUB annotations made by KOReader. I imagine something that mimics the "File → Import Annotations..." feature currently present for PDFs, using the `metadata.epub.lua` metadata file in KOReader's sidecar folder.

    I would appreciate some advice:

    1. is such a plugin generally feasible? (I skimmed https://www.zotero.org/support/dev/client_coding/javascript_api#file_io already)
    2. is there enough data stored in KOReader `metadata.epub.lua` files to generate WADM-compliant annotations?
    3. is there an existing plugin which would be a good reference point to start from?
    4. any other tools/libraries that would help development? (e.g. Lua data handling, WADM schema testing, etc)
  • edited September 17, 2024
    @D0ug I've thought a lot about this and want to create something similar (more of an active syncing between the annotations in Zotero's database and KOReader's sidecar files when all stored together in the hashdocsettings/ subfolder, but the end result is similar).

    One challenge may be translating the coordinate point system between Zotero and KOReader - last I tested, I think one had 0,0 be in top-left corner and the other in bottom-left, and the ratio was different. But that can be solved.
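
A minimal Python sketch of that kind of translation for a page-based rectangle, assuming the only differences are the vertical origin and a uniform scale factor. Which application uses which origin, and the actual ratio, are not established here; `flip_rect`, `page_height`, and `scale` are illustrative names.

```python
def flip_rect(rect, page_height, scale=1.0):
    """Convert [x1, y1, x2, y2] between top-left and bottom-left origins.

    `rect` and `page_height` are in source units; the result is in target
    units, where one source unit equals `scale` target units.
    """
    x1, y1, x2, y2 = (value * scale for value in rect)
    height = page_height * scale
    # Flipping the axis swaps which edge is the lower one, so reorder the y values.
    return [x1, height - y2, x2, height - y1]
```

With `scale=1.0` the function is its own inverse, so the same call works in both directions; with a different ratio, the return trip needs `1 / scale` and the page height expressed in the other unit.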

    Zotero documentation says somewhere that you shouldn't edit the database directly, since that might cause corruption. My plan has been to sync with the Zotero Web API, which I think can read/write annotations. This of course requires having an account, but it seemed simpler to me as an initial step, and it then syncs Zotero annotations between devices too.

    This might be helpful: https://github.com/stelzch/zotero.koplugin/issues/13

    You might want to use the upcoming release of KOReader which finally adds color annotation support that is most similar to Zotero - this will change the annotation schema in metadata.epub.lua (and metadata.pdf.lua, etc.).

    As for WADM compliance in KOReader: I don't know enough about the standard, though I think WADM is the way to go. KOReader stores a fair amount of metadata per annotation.

    I'm happy to DM with you if you want to discuss more (https://www.reddit.com/user/ryanwwest/), or new discussion here - I don't have time to develop right now but know a fair bit about KOReader and this specific issue.
  • I am struggling to import my EPUB annotations made in KOReader into Zotero, via the API and Python, so that they are visible in the Zotero EPUB reader. I see that this question was raised above, so perhaps it makes sense to continue with it here.

    I've already figured out how to transfer my PDF annotations this way (the trickiest part was translating the positions, but it finally works). I will share my script once I figure out the solution for EPUBs.

    Now I want to achieve the same with EPUBs. The tricky part is again the positions. In KOReader, the annotation position is encoded like this:

    ```
    {'chapter': 'preface',
    'page': '/body/DocFragment[9]/body/p[25]/span/text()[2].189',
    'pos0': '/body/DocFragment[9]/body/p[25]/span/text()[2].189',
    'pos1': '/body/DocFragment[9]/body/p[25]/span/text()[2].491',
    'text': 'One poem by itself was certainly not responsible for an entire intellectual, moral, and social transformation—no single work was, let alone one that for centuries could not without danger be spoken about freely in public. But this particular ancient book, suddenly returning to view, made a difference.'}
    ```

    When I manually do the same annotation in Zotero reader and extract it via the API (as children of my attachment), its position is expressed like this (I select only a few relevant attributes):

    { 'annotationText': 'One poem by itself was certainly not responsible for an entire intellectual, moral, and social transformation—no single work was, let alone one that for centuries could not without danger be spoken about freely in public. But this particular ancient book, suddenly returning to view, made a difference.',
    'annotationComment': '',
    'annotationColor': '#ffd400',
    'annotationSortIndex': '00008|00019912',
    'annotationPosition': '{"type":"FragmentSelector","conformsTo":"http://www.idpf.org/epub/linking/cfi/epub-cfi.html","value":"epubcfi(/6/18!/4/68/2,/3:189,/3:491)"}'}

    I see the logic of CFIs as documented here: https://idpf.org/epub/linking/cfi/epub-cfi.html. I just wasn't able to figure out how to get to these from my KOReader positions. I have also experimented with loading the EPUB file itself, mapping its table of contents onto individual chapters, etc. But I still don't see how to get it into the shape I see in Zotero, since I don't understand how to interpret "18!" as a chapter reference in my EPUB.

    Any hints?
  • edited October 23, 2024
    @vojtech.kase: Unfortunately, I don't think it's going to be possible to convert KOReader positions to CFIs without analyzing the EPUB. (As you inferred.)

    I'll use your annotation as an example.

    KOReader positions are apparently XPointers, which are an extended version of XPaths, so, for example, "p[25]" refers to the 25th <p> element (1-indexed). DocFragments are <itemref>s inside the EPUB's <spine>.

    CFIs are rooted in the EPUB's content.opf file. They're also 1-indexed, but they alternate between text nodes and element nodes, so /1 is the first text node child, /2 is the first element child, /3 is the second text node child, and so on. That's the case even if there are multiple consecutive elements or text nodes; for example, if there's a single element child followed by a single text node child, /1 would reference the (nonexistent) text node before the element, /2 would reference the element, and /3 would reference the text node.

    The ! indicates step indirection, i.e., "follow that link."

    So, putting that all together:
    • /6/18 selects the third element child of <package> in content.opf (<spine>), then the 9th element child of that (one of the <itemref>s). In other words, DocFragment[9].
    • ! means "follow that link": find the child of <manifest> whose id attribute equals the idref attribute of the <itemref>, get its href, load it. Everything past here is relative to the root of that document (its <html> tag).
    • /4/68/2 selects the second element child of <html> (<body>), the 34th element child of that (a <p> - different from KOReader's index because "p[25]" selects the 25th <p> child, not the 25th child overall, and the 25th <p> child is the 34th child overall), then the first element child of that (a <span>).
    • ,/3:189,/3:491 creates a range within the second text child ("text()[2]") from the 189th character to the 491st character.
    Hopefully that helps you see the concordances between KOReader's positions and CFIs, at least a little bit.
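
The walkthrough above can be reproduced with a short Python sketch that splits a range CFI of this particular shape into its parts and labels each step. It is not a general CFI parser: it assumes the `epubcfi(parent,start,end)` form with exactly one `!` and ignores bracketed ID assertions; `parse_cfi_range` and `describe_step` are names made up for this example.

```python
import re


def describe_step(n: int) -> str:
    """Even steps count element children; odd steps count text positions."""
    if n % 2 == 0:
        return f"element child #{n // 2}"
    return f"text position #{(n + 1) // 2}"


def parse_cfi_range(cfi: str) -> dict:
    """Split epubcfi(parent,start,end) into spine, document and range parts."""
    parent, start, end = cfi[len("epubcfi("):-1].split(",")
    spine, document = parent.split("!")

    def steps(path: str) -> list:
        return [int(n) for n in re.findall(r"/(\d+)", path)]

    def point(part: str) -> tuple:
        path, offset = part.rsplit(":", 1)
        return steps(path), int(offset)

    return {"spine": steps(spine), "document": steps(document),
            "start": point(start), "end": point(end)}
```

For `epubcfi(/6/18!/4/68/2,/3:189,/3:491)` this gives spine steps `[6, 18]` (element children #3 and #9), document steps `[4, 68, 2]` (element children #2, #34 and #1), and a range from offset 189 to 491 within step 3 (text position #2). The `annotationPosition` returned by the Zotero API is itself a JSON string, so the CFI is available as `json.loads(position)["value"]`.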

    Calibre uses CFIs as well (uh, kind of), so there's definitely demand for a converter, but as far as I know there isn't yet one available. A converter would basically need to pull in some library for resolving XPointers, then use a CFI implementation (like Zotero's fork of EPUB.js) to create CFIs from them. KOReader is by far the most frequently requested EPUB integration, and it might be relatively straightforward to implement that conversion in Zotero. I'll look into it.

    (Apologies for the gigantic infodump!)
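
For anyone who wants to experiment before a real converter exists, here is a Python sketch of the two index translations described above: from an XPath-style "nth child with this tag" to an even CFI element step, and from "nth text node" to an odd CFI text step. It uses `xml.etree.ElementTree` on the raw XHTML and makes several assumptions: a step without `[n]` means the first such child, every KOReader position ends in a `.offset`, tags are compared including any XML namespace, and KOReader's internal DOM numbers nodes the same way as the raw file (which may not hold). All function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

STEP = re.compile(r"([A-Za-z_][\w.-]*(?:\(\))?)(?:\[(\d+)\])?")


def parse_koreader_pos(pos: str):
    """Split a KOReader position into (name, index) steps and a char offset."""
    path, _, offset = pos.rpartition(".")
    steps = []
    for part in path.strip("/").split("/"):
        name, index = STEP.fullmatch(part).groups()
        # Assumption: a step without [n] refers to the first such child.
        steps.append((name, int(index) if index else 1))
    return steps, int(offset)


def element_step(parent, tag: str, nth: int) -> int:
    """CFI step for the nth child named `tag` (XPath-style, 1-indexed)."""
    seen = 0
    for position, child in enumerate(parent, start=1):
        if child.tag == tag:
            seen += 1
            if seen == nth:
                return 2 * position  # element steps are even
    raise LookupError(f"no {tag}[{nth}] under <{parent.tag}>")


def text_step(parent, nth: int) -> int:
    """CFI step for the nth non-empty text node directly under `parent`."""
    # Slot 0 is the text before the first child; slot k follows child k.
    slots = [parent.text] + [child.tail for child in parent]
    seen = 0
    for slot, text in enumerate(slots):
        if text:
            seen += 1
            if seen == nth:
                return 2 * slot + 1  # text steps are odd
    raise LookupError(f"no text()[{nth}] under <{parent.tag}>")
```

Walking the parsed steps from the content document's `<body>` and calling `element_step` or `text_step` at each level would yield the part of the CFI after the `!`; the part before it comes from the position of the matching `<itemref>` in the spine.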