Embedding annotations in EPUB file itself

dstillman · December 31, 2023

This discussion was created from comments split from: Announcing the Zotero 7 Beta.

ks8997 · December 26, 2023

This is awesome progress! I just learned last week that Zotero 7 has been out for months and has implemented EPUB capabilities. Tested the development version on all my Apple devices, and it works really really well.

I am curious about one thing though. Is it possible to embed EPUB annotations in the file itself, similar to how PDF annotations work? I have noticed that no other reader seems to be able to do this, so is this a fundamental limitation of the format itself?

adamsmith · December 26, 2023

Yes, epub doesn't have annotations as part of the format, so you wouldn't be able to save to the file.

emilianoeheyns · December 26, 2023

> I just learned last week that Zotero 7 has been out for months

the beta. While 7beta is fully usable, it's not a formal release yet, and things are still being added/fixed. There's a feature missing from BBT pending such an addition to Zotero.

ks8997 · December 26, 2023

> Yes, epub doesn't have annotations as part of the format, so you wouldn't be able to save to the file.

Thank you for clarifying. Would it be possible to export the EPUB in such a way that if re-imported back to Zotero the annotations show up in file? (Of course I don't expect this to work with other readers).

Apple Books does something similar. If you re-import a deleted book, the old annotations show up in the document. Although I understand that's not exactly the same thing as what I have in mind since they seem to be storing the annotations in their database and then "re-applying" it. I hope this makes sense.

dstillman · December 31, 2023

> Would it be possible to export the EPUB in such a way that if re-imported back to Zotero the annotations show up in file?

For what purpose?

QiDiamond · December 31, 2023

@dstillman

Imagine this: we have a book that we sometimes read on our office laptop, sometimes on a tablet, and sometimes on a computer at home. But after finishing the book, we want to gather all the notes (for example, the highlights) we made while reading in different situations and keep the highlights associated with their original positions in the book. How can we do this conveniently?

I have been searching for a solution that achieves the following goals:

1. Associate the highlighted portions of the book with their original positions.
2. Seamlessly synchronize reading progress, including the highlights, across devices.
3. Export and import the highlights while maintaining their association with original positions in the book.

With PDF files, highlights can be exported in .xfdf format, and .xfdf files can be imported as highlights, which allows appending or consolidating notes and highlights. That is a great feature.

I used to read epub books with Calibre, an application like zotero, but it doesn't have an Android app. Later, I started using zotero to read epub books. Recently, zotero's web version introduced an online reading feature that allows for highlighting and note-taking, which is very convenient.

However, I encountered an issue with zotero. When reading PDFs, the highlights are stored in the zotero database, and I can also choose to export the PDF with the highlights. **But this functionality doesn't exist for epub books.** So, if an epub book is in a zotero shared group, I can't move it with the highlights to My Library. However, if the book is in PDF format, I can export it from the zotero shared library and import it into My Library, consolidating highlights from multiple identical PDF files into one PDF.

It seems that zotero's design principle is to minimize changes to the content of PDF and epub files. But considering that epub is essentially a compressed format, I think it would be possible to allow users to choose to export the highlights along with the epub files, maybe in a proprietary zotero format compressed into the epub file. When re-imported into zotero, it can import these highlights. In this regard, I think it might be worth considering the design approach of Calibre, and even maintaining compatibility to facilitate switching between the two software.

In summary, I hope for a workflow that allows users to consolidate highlights from multiple identical epubs into one epub file, just like I do in PDF files.

dstillman · December 31, 2023

> Imagine this: we have a book that we sometimes read on our office laptop, sometimes on a tablet, and sometimes on a computer at home. But after finishing the book, we want to gather all the notes (for example, the highlights) we made while reading in different situations and keep the highlights associated with their original positions in the book. How can we do this conveniently?

Use the Zotero app on your work computer, tablet, and home computer? That's the whole point of Zotero syncing, and one of the main reasons the annotations are stored separately from the file to begin with.

> So, if an epub book is in a zotero shared group, I can't move it with the highlights to My Library.

Of course you can. You drag it to your personal library, just as you do for a PDF.

> When reading PDFs, the highlights are stored in the zotero database, and I can also choose to export the PDF with the highlights. **But this functionality doesn't exist for epub books.**

Because PDF has a format for embedded annotations shared between different readers and (as far as I know) EPUB doesn't. If there were some standard for exporting annotations with EPUB files, we would obviously support it as an option during export. But if it's only a question of importing back into Zotero, someone would have to actually make the case for why that would be useful. You could say "I want to send an EPUB with annotations to a Zotero-using friend without using a group library", but none of what you described depends on embedded annotations for PDFs and it won't for EPUBs either.

QiDiamond · December 31, 2023

I just gave it a try: when I dragged an item containing an epub file from the Zotero shared library to my personal library, the highlights came along with the epub file. This feature is great, and it wasn't available in earlier versions of Zotero.

BUT, if I want to send an EPUB with annotations to a Zotero-using friend without using a group library, what should I do?

Perhaps embedding annotations in the epub would be a better solution.

I am not sure whether the following is a standard for exporting annotations with EPUB files:
https://en.wikipedia.org/wiki/EPUB#:~:text=The OPF file, traditionally named content.opf, houses the EPUB book's metadata, file manifest, and linear reading order.

But I recommend you try using the software Calibre, which stores highlights and other notes from EPUB files in a central library (I guess). Additionally, it also saves a copy of these highlights in the "metadata.opf" file located in the same directory as the EPUB file.

Perhaps Zotero can consider implementing a similar approach. Or you and the developers of Calibre could work together to develop a universal and extensible ePub annotation standard.

many thanks.

DonnaCoxBaker · December 31, 2023

It could be cool to have a tool that facilitates collaboration on and sharing of EPUBs with annotations. But I don't want to sync an entire book every time I add an annotation. By approaching this in a modular way, I only have to sync the tiny annotation. Perhaps a developer could create a plug-in that allows you to "pack up" your EPUB before sharing it with another--sending the document with annotations--and to unpack it when they have added their annotations and returned it.

ks8997 · January 2, 2024

> For what purpose?

I believe it would greatly improve the data portability of ePub files. In an ideal world, we would have:

1. Zotero ePub format: A hypothetical ePub format with support for embedded annotations. However, I appreciate the fact that the lack of an existing standard makes things difficult for you.

2. Zotero ePub reader: A lightweight, stand-alone clone of the Zotero app's ePub viewer.

Requesting a stand-alone ePub reader seems like too big of an ask for the development team, but creating a standalone format leaves open the possibility for some enterprising member of the community to eventually create it. It seems to me that that would be a reasonably straightforward task, but it is a completely uninformed opinion since I don't know much about programming or Zotero internals.

The advantage of this system should be easy to see. We would be able to share ePub files with others without using group libraries or requiring others to install the full Zotero application. And in case I decide to stop using Zotero, I can still access these in-line annotations just using the stand-alone Zotero ePub reader.

It would also be beneficial to my existing workflow of annotating textbooks. Once I am done annotating them, I'd rather not have them cluttering my Zotero library and would prefer to export them out (especially since some of them can have sizes in the hundreds of MBs), and re-import them back when needed.

AbeJellinek · January 2, 2024

> But I recommend you try using the software Calibre, which stores highlights and other notes from EPUB files in a central library (I guess). Additionally, it also saves a copy of these highlights in the "metadata.opf" file located in the same directory as the EPUB file.

Interesting - I wasn't aware that it stored annotations in metadata.opf alongside the EPUB.

I'm seeing a couple issues:

1. Calibre's viewer doesn't seem to read annotations from a metadata.opf file placed alongside an EPUB that isn't in my library, and it doesn't import the annotations in metadata.opf when I add the EPUB to my library.
2. This is 100% Calibre's own thing - the annotations are stored as JSON under a meta tag named calibre:annotation.

But this is still the closest thing I've seen to a portable EPUB annotation storage format that we could use.
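
As a rough illustration of what reading those entries could involve, here is a minimal Python sketch. Only the file name metadata.opf, the meta tag name calibre:annotation, and the JSON payload come from the observations above; the assumption that the JSON sits in the tag's `content` attribute, and the fields inside the sample JSON, are guesses made for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Invented sample: the JSON fields are placeholders, not Calibre's real schema.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <metadata>
    <meta name="calibre:annotation"
          content="{&quot;format&quot;: &quot;EPUB&quot;, &quot;annotation&quot;: {&quot;type&quot;: &quot;highlight&quot;}}"/>
    <meta name="cover" content="cover"/>
  </metadata>
</package>"""


def read_calibre_annotations(opf_text):
    """Return the decoded JSON payload of every calibre:annotation meta tag."""
    root = ET.fromstring(opf_text)
    annotations = []
    for element in root.iter():
        # Strip the XML namespace, e.g. "{http://www.idpf.org/2007/opf}meta".
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "meta" and element.get("name") == "calibre:annotation":
            # Assumption: the JSON lives in the "content" attribute.
            annotations.append(json.loads(element.get("content")))
    return annotations


annotations = read_calibre_annotations(SAMPLE_OPF)
print(len(annotations), annotations[0]["annotation"]["type"])
```

Other meta tags are skipped, so the same function could be pointed at a full metadata.opf without further filtering.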

ryanwwest · January 3, 2024

There was work up until ~2015 for standardizing annotation support in ePUBs, similar to PDFs: https://idpf.org/epub/oa/. But as far as I know, that eventually fell apart and the few apps that support embedded ePUB annotations either directly modify the ePUB text colors or use custom properties that are hidden from other apps (and I think most just store them outside of the file). You're siloed into whatever app you read/write annotations with, but I think Zotero is probably the best silo to be in. I'd LOVE to have a universal annotation file format that supports all of PDF/HTML/ePUB including surviving new versions of and multiple filetypes of content, but getting there and getting others onboard would be very difficult.

adamsmith · January 3, 2024

> I'd LOVE to have a universal annotation file format that supports all of PDF/HTML/ePUB including surviving new versions of and multiple filetypes of content

That's the vision of the open web annotation standard. Hypothesis, e.g., works on all three formats. Portability is also an issue there -- annotations are always outside of the file -- but it defines a standardized format for that so they should move seamlessly between applications.
I don't know if Zotero evaluated that and if so why they decided against it, but if you want loss-less conversion to the PDF annotation standard (that one an ISO standard and much more widely used for PDFs, of course) that might be one reason.

ryanwwest · January 3, 2024

@adamsmith Thanks for pointing that out, I've seen that mentioned but need to read it if the vision is that similar and see how it stores cross-platform annotations. I've used Hypothes.is's ePUB annotation demos which are usable at least in https://github.com/elias-sundqvist/obsidian-annotator, but found few other EPUB-annotating apps so I'm thrilled by Zotero's new HTML/EPUB annotation support. Standardizing (if a suitable standard exists) would be even better as I'd love to read/write annotations in KOReader and I'm sure others would benefit from cross-app support as well.

AbeJellinek · January 3, 2024

Our annotations are already stored using the Web Annotation Data Model schema internally. (We only support a couple selector types right now, but we could support more if necessary for interoperability with other tools.) WADM makes it clear how to store the annotations themselves, but it gets pretty vague when it comes to defining the container format. We haven't implemented a way to export EPUB annotations because we haven't yet decided on a container format with wide support among EPUB readers that we could export to.

AFAIK Hypothesis has never made its EPUB annotation system available through its browser extension - I could be wrong about that, but all I've seen are the demos. Its annotations are also not actually compliant with the WADM standard; e.g. here's one:


    {
      // ...
      "target": [
        {
          "source": "http://hypothesis.evidentpoint.com/readium-demo/?epub=epub_content%2Falice&goto=%7B%22idref%22%3A%22chapter_001%22%2C%22elementCfi%22%3A%22%2F4%2F2%2F2%5Bpgepubid00004%5D%2F2%5BI_DOWN_THE_RABBIT-HOLE%5D%22%7D&",
          "selector": [
            {
              "type": "RangeSelector",
              "endOffset": 155,
              "startOffset": 137,
              "endContainer": "/section[1]/p[3]",
              "startContainer": "/section[1]/p[3]"
            },
            {
              "end": 827,
              "type": "TextPositionSelector",
              "start": 809
            },
            {
              "type": "TextQuoteSelector",
              "exact": "Oh dear! Oh dear! ",
              "prefix": "ar the Rabbit\n\t\tsay to itself, \"",
              "suffix": "I shall be too late!\" But when t"
            }
          ]
        }
      ]
    }

1. The EPUB CFI - the most important piece of information for an EPUB annotation - is embedded within the query string of the URL (`&goto={"idref":"chapter_001","elementCfi":"/4/2/2[pgepubid00004]/2[I_DOWN_THE_RABBIT-HOLE]"}&`). It should be in a WADM FragmentSelector. Also, the URL 404s, and URLs from more recent annotations return a 502 ("It looks like the dev server has not been started yet").
2. The RangeSelector is invalid: it should have a startSelector and an endSelector (both WADM selectors of their own), but instead it's essentially the fields of a DOM Range object dumped to JSON.
3. All the data that I elided before "target" is Hypothesis-specific metadata (users, permissions, tags).
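
For comparison, a target shaped the way these points describe might be built as in the Python sketch below. The FragmentSelector layout and its `conformsTo` URL mirror the Zotero annotation quoted later in this thread, and the nested RangeSelector follows the startSelector/endSelector rule stated above. The source URL, the CFI value, and the choice of an XPathSelector refined by a TextPositionSelector for each end of the range are illustrative assumptions, not anything Hypothesis or Zotero is known to emit.

```python
import json

EPUB_CFI_SPEC = "http://www.idpf.org/epub/linking/cfi/epub-cfi.html"


def cfi_selector(cfi):
    """Wrap an EPUB CFI in a WADM FragmentSelector."""
    return {"type": "FragmentSelector", "conformsTo": EPUB_CFI_SPEC, "value": cfi}


def range_selector(container_xpath, start_offset, end_offset):
    """Build a RangeSelector whose two ends are selectors of their own."""
    def point(offset):
        return {
            "type": "XPathSelector",
            "value": container_xpath,
            # Assumption: a zero-length text position marks the exact point.
            "refinedBy": {"type": "TextPositionSelector", "start": offset, "end": offset},
        }
    return {
        "type": "RangeSelector",
        "startSelector": point(start_offset),
        "endSelector": point(end_offset),
    }


target = {
    "source": "https://example.org/alice.epub",  # hypothetical URL
    "selector": [
        cfi_selector("epubcfi(/6/4!/4/2/2)"),  # hypothetical CFI value
        range_selector("/section[1]/p[3]", 137, 155),
        {
            "type": "TextQuoteSelector",
            "exact": "Oh dear! Oh dear! ",
            "prefix": "ar the Rabbit\n\t\tsay to itself, \"",
            "suffix": "I shall be too late!\" But when t",
        },
    ],
}

print(json.dumps(target, indent=2))
```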

So we could probably allow importing annotations exported from Hypothesis, but that might be of questionable value considering that they haven't made their EPUB annotator directly available to end users. I do not believe that we could easily create data that Hypothesis would be able to import (and we definitely don't want to use its export format as our main export format).

I'll take a look at what obsidian-annotator does and whether we might be able to interoperate with it, or at least make it easy for an Obsidian-Zotero integration plugin to do that.

ryanwwest · January 3, 2024

Thanks, I think you're correct that Hypothesis hasn't made their EPUB annotator available outside the demos. obsidian-annotator is outdated and rather buggy, and I mainly added it because it does use Hypothesis for both HTML and EPUB annotation (it may have started with the demo code) and stores it directly in JSON embedded within Markdown. So an integration with that specifically may not be worth it, but being able to import and export annotations in general would be nice. In particular, I'm very interested in having KOReader (which can already read and annotate HTML/EPUB/PDF) open a Zotero attachment and load/render its synced Zotero annotations, then be able to add/modify annotations and propagate those back to Zotero (as then you can use Zotero items on all types of e-readers too, Android or not). But conflict resolution might be tricky here, as usual. Great to know that you use WADM.

AbeJellinek · January 3, 2024

It looks like KOReader lets you export to a variety of formats, including HTML, Markdown, and a custom JSON schema. All of them are lossy and there's no way a reader could support importing them. It also supports syncing with Memos, Joplin, and Readwise. Readwise is a one-way street as well - sort of like a Facebook news feed for your highlights. Memos seems to be similar. I haven't used Joplin so I'm not sure what the deal is there.

But since they already support syncing with a few online services, and Zotero's EPUB (and snapshot) annotations are exposed with WADM selectors through the API, there's no reason someone couldn't write an integration on the KOReader side.

ryanwwest · January 4, 2024

Yes, KOReader exports do not support WADM either - I don't know if anything does except for possibly Hypothesis and need to research it more. I don't think the exports are worth worrying about.

So yes, a new KOReader integration with any WADM source including Zotero would be ideal (if Zotero also supports some sort of generic WADM import/export from any other app supporting WADM - I'm not sure if this is feasible or sensible). A KOReader <-> Zotero direct integration via Zotero's SQL table for annotations could be an alternative, less universal option (though I don't believe writing to the SQL table is safe in Zotero).

AbeJellinek · January 4, 2024

EPUB annotations sync and are readable/writable with the Zotero API, so that would be the best option for an integration.
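
A minimal sketch of the read side of such an integration, in Python with only the standard library. It assumes the Zotero Web API v3 children endpoint and the `Zotero-API-Key` header, and it relies on the `annotationText` and `annotationPosition` fields that appear later in this thread; the user ID, item key, and API key are placeholders, and error handling and paging are left out.

```python
import json
import urllib.request

API_ROOT = "https://api.zotero.org"


def children_request(user_id, attachment_key, api_key):
    """Build a request for the child items of an attachment."""
    url = f"{API_ROOT}/users/{user_id}/items/{attachment_key}/children"
    headers = {"Zotero-API-Key": api_key, "Zotero-API-Version": "3"}
    return urllib.request.Request(url, headers=headers)


def annotation_positions(items):
    """Keep annotation items only; return (text, parsed position) pairs."""
    results = []
    for item in items:
        data = item.get("data", {})
        if data.get("itemType") != "annotation":
            continue
        # annotationPosition arrives as a JSON string, so decode it.
        position = json.loads(data["annotationPosition"])
        results.append((data.get("annotationText", ""), position))
    return results


def fetch_annotations(user_id, attachment_key, api_key):
    """Network call: fetch and decode the annotations of one attachment."""
    request = children_request(user_id, attachment_key, api_key)
    with urllib.request.urlopen(request) as response:
        return annotation_positions(json.load(response))
```

Calling `fetch_annotations("1234567", "ABCD2345", "your-api-key")` with real values would return the highlighted text together with its selector, which is the part an external reader would need to translate into its own position format.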

adamsmith · January 4, 2024

> Our annotations are already stored using the Web Annotation Data Model schema internally. [etc.]

Thanks Abe for that & all the details around it: I should have checked that before posting.

ausungate · January 17, 2024

Chiming in to say that integrating KOReader and Zotero EPUB annotations would be very nice. EPUB annotations have plagued me for years, and I'm excited to hear there is progress being made here. Even better than this would be tool-agnostic, portable EPUB annotations. I can dream.

D0ug · September 17, 2024

It seems there is currently no way to simply import EPUB annotations made by other applications into Zotero—please correct me if I'm wrong.

Zotero for Android seems to require at least Android 6 https://forums.zotero.org/discussion/comment/451631/#Comment_451631 The fairly recent e-reader I have (Tolino Page 2) only has version 4.

I would be interested to help make a Zotero plugin for importing EPUB annotations made by KOReader. I imagine something that mimics the "File → Import Annotations..." feature currently present for PDFs, using the `metadata.epub.lua` metadata file in KOReader sidecar folder.

I would appreciate some advice:

1. is such a plugin generally feasible? (I skimmed https://www.zotero.org/support/dev/client_coding/javascript_api#file_io already)
2. is there enough data stored in KOReader `metadata.epub.lua` files to generate WADM-compliant annotations?
3. is there an existing plugin which would be a good reference point to start from?
4. any other tools/libraries that would help development? (e.g. Lua data handling, WADM schema testing, etc)

ryanwwest · September 17, 2024

@D0ug I've thought a lot about this and want to create something similar (more of an active syncing between the annotations in Zotero's database and KOReader's sidecar files when all stored together in the hashdocsettings/ subfolder, but the end result is similar).

One challenge may be translating the coordinate point system between Zotero and KOReader - last I tested, I think one had 0,0 be in top-left corner and the other in bottom-left, and the ratio was different. But that can be solved.
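
If the mismatch is as described, the conversion is a vertical flip plus a scale factor. A minimal Python sketch, assuming rectangles are `(x0, y0, x1, y1)` tuples and the page height is known in the source coordinate system; which application uses which origin, and the actual ratio, would need to be checked against real data.

```python
def flip_rect(rect, src_page_height, scale=1.0):
    """Convert a rectangle between top-left and bottom-left origins.

    The flip is symmetric, so the same function works in both directions.
    """
    x0, y0, x1, y1 = rect
    # Mirror the vertical axis: the old upper edge becomes the new lower edge.
    new_y0 = src_page_height - y1
    new_y1 = src_page_height - y0
    return (x0 * scale, new_y0 * scale, x1 * scale, new_y1 * scale)


rect = (10, 20, 110, 40)
flipped = flip_rect(rect, 800)
print(flipped)
print(flip_rect(flipped, 800) == rect)
```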

Zotero documentation says somewhere that you shouldn't edit the database directly, since that might cause corruption. My plan has been to sync with the Zotero Web API, which I think can read/write annotations. This of course requires having an account, but it seemed simpler to me as an initial step, and it then syncs Zotero annotations between devices too.

This might be helpful: https://github.com/stelzch/zotero.koplugin/issues/13

You might want to use the upcoming release of KOReader which finally adds color annotation support that is most similar to Zotero - this will change the annotation schema in metadata.epub.lua (and metadata.pdf.lua, etc.).

WADM-compliance in KOReader - I don't know enough about the standard though I think WADM is the way to go. There is a fair amount of metadata per annotation stored.

I'm happy to DM with you if you want to discuss more (https://www.reddit.com/user/ryanwwest/), or new discussion here - I don't have time to develop right now but know a fair bit about KOReader and this specific issue.

Square.BubbLe · September 19, 2024

Here is another related discussion : https://forums.zotero.org/discussion/113670/save-export-annotated-html-snapshot-with-all-annotated-text-colors-preserved#latest

vojtech.kase · October 23, 2024

I am struggling to import my EPUB annotations made in KOReader into Zotero, via the API and Python, so that they are visible in the Zotero EPUB reader. I see that this question was raised above, so perhaps it is meaningful to continue with that here.

I've already figured out how to transfer my PDF annotations this way (the trickiest part was how to translate the positions, but it finally works). I will share my script once I figure out the solution for EPUBs.

Now I want to achieve the same with EPUBs. The tricky part is again the positions. In the case of KOReader, the annotation position is encoded like this:

```
{'chapter': 'preface',
'page': '/body/DocFragment[9]/body/p[25]/span/text()[2].189',
'pos0': '/body/DocFragment[9]/body/p[25]/span/text()[2].189',
'pos1': '/body/DocFragment[9]/body/p[25]/span/text()[2].491',
'text': 'One poem by itself was certainly not responsible for an entire intellectual, moral, and social transformation—no single work was, let alone one that for centuries could not without danger be spoken about freely in public. But this particular ancient book, suddenly returning to view, made a difference.'}
```

When I manually do the same annotation in Zotero reader and extract it via the API (as children of my attachment), its position is expressed like this (I select only a few relevant attributes):

```
{ 'annotationText': 'One poem by itself was certainly not responsible for an entire intellectual, moral, and social transformation—no single work was, let alone one that for centuries could not without danger be spoken about freely in public. But this particular ancient book, suddenly returning to view, made a difference.',
'annotationComment': '',
'annotationColor': '#ffd400',
'annotationSortIndex': '00008|00019912',
'annotationPosition': '{"type":"FragmentSelector","conformsTo":"http://www.idpf.org/epub/linking/cfi/epub-cfi.html","value":"epubcfi(/6/18!/4/68/2,/3:189,/3:491)"}'}
```

I see the logic of CFI as documented here: https://idpf.org/epub/linking/cfi/epub-cfi.html. I just wasn't able to figure out how to get to these from my KOReader positions. I have also experimented with loading the EPUB file itself, mapping its table of contents onto individual chapters, etc. But I still don't see how to get it into the shape I see in Zotero, since I don't get how to interpret "18!" as a chapter reference in my EPUB.

Any hints?
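
One possible reading of the two positions, offered as a sketch rather than a confirmed mapping: in EPUB CFI, even step numbers count child elements, so `/6` is the third child of the package document (normally the spine) and `/18` is its ninth child, with `!` then stepping into that spine item's content document. That ninth item appears to line up with `DocFragment[9]` in the KOReader path, and the character offsets (189 and 491) are identical in both. The element steps inside the chapter (`/4/68/2` versus `body/p[25]/span`) count different things, so converting those would need the parsed chapter itself. The helper names below are hypothetical.

```python
import re


def parse_range_cfi(cfi):
    """Split a range CFI of the form epubcfi(parent,start,end).

    Simplified: ignores bracketed assertions such as [id], which may
    contain commas in real-world CFIs.
    """
    inner = re.fullmatch(r"epubcfi\((.*)\)", cfi).group(1)
    parent, start, end = inner.split(",")
    package_path, content_path = parent.split("!")
    package_steps = [int(step) for step in package_path.strip("/").split("/")]
    return {
        # Even steps count elements: step 18 is the 9th child (18 // 2).
        "spine_position": package_steps[-1] // 2,
        "content_path": content_path,
        "start_offset": int(start.rsplit(":", 1)[1]),
        "end_offset": int(end.rsplit(":", 1)[1]),
    }


def parse_koreader_xpointer(xpointer):
    """Split a KOReader position into fragment index, path and offset."""
    match = re.fullmatch(r"/body/DocFragment\[(\d+)\](/.*)\.(\d+)", xpointer)
    return {
        "doc_fragment": int(match.group(1)),
        "path": match.group(2),
        "offset": int(match.group(3)),
    }


zotero = parse_range_cfi("epubcfi(/6/18!/4/68/2,/3:189,/3:491)")
pos0 = parse_koreader_xpointer("/body/DocFragment[9]/body/p[25]/span/text()[2].189")
pos1 = parse_koreader_xpointer("/body/DocFragment[9]/body/p[25]/span/text()[2].491")

print(zotero["spine_position"] == pos0["doc_fragment"])
print(zotero["start_offset"] == pos0["offset"], zotero["end_offset"] == pos1["offset"])
```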

AbeJellinek · October 23, 2024

(removed outdated info)

dstillman · November 27, 2024

We've added support for importing Calibre and KOReader EPUB annotations in Zotero 7.0.10.

ks8997 · November 27, 2024

Nice, you guys are doing a great job!

vojtech.kase · November 27, 2024

Excellent, thank you so much!

So far I tested the "Import Ebook Annotations..." functionality with three EPUB ebooks annotated in koreader.

As it stands, it seems to work only with annotations made with koreader 2024.07 or later. In the case of annotations made with an older version of koreader, I had to edit the metadata.epub.lua file by renaming the ["highlight"] tag to the ["annotations"] tag (alternatively, by renaming ["bookmarks"] to ["annotations"] and all ["notes"] tags to ["text"] tags). After that it worked with the other two as well.
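
The second variant of that manual edit could be scripted roughly as below. This is a plain text substitution based only on the key names mentioned above: it does not parse Lua, the sample sidecar content is invented, and it makes no attempt to detect entries that already carry a `["text"]` key, so it is best run on a copy of the file.

```python
def upgrade_sidecar_text(lua_text):
    """Rename old-style keys so they match the newer annotation layout."""
    renamed = lua_text.replace('["bookmarks"]', '["annotations"]')
    return renamed.replace('["notes"]', '["text"]')


# Invented sample in the old layout; real sidecar files contain more fields.
OLD_SIDECAR = '''return {
    ["bookmarks"] = {
        [1] = {
            ["notes"] = "One poem by itself",
            ["pos0"] = "/body/DocFragment[9]/body/p[25]/span/text()[2].189",
        },
    },
}'''

print(upgrade_sidecar_text(OLD_SIDECAR))

# To apply it to a file (on a copy):
#   from pathlib import Path
#   path = Path("metadata.epub.lua")
#   path.write_text(upgrade_sidecar_text(path.read_text(encoding="utf-8")), encoding="utf-8")
```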

What about the same functionality for PDFs? (I know there are other ways to achieve that, but it would be nice to be able to do that this way as well...)

dstillman · November 27, 2024

> What about the same functionality for PDFs? (I know there are other ways to achieve that, but it would be nice to be able to do that this way as well...)

@vojtech.kase: Not sure what you mean by that. The reader has always supported importing standard PDF annotations. (But please start a new thread for questions about PDFs.)