Embedding annotations in EPUB file itself
dstillman
Zotero Team
This discussion was created from comments split from: Announcing the Zotero 7 Beta.
I am curious about one thing though. Is it possible to embed EPUB annotations in the file itself, similar to how PDF annotations work? I have noticed that no other reader seems to be able to do this, so is this a fundamental limitation of the format itself?
Thank you for clarifying. Would it be possible to export the EPUB in such a way that if re-imported back to Zotero the annotations show up in file? (Of course I don't expect this to work with other readers).
Apple Books does something similar. If you re-import a deleted book, the old annotations show up in the document. Although I understand that's not exactly the same thing as what I have in mind since they seem to be storing the annotations in their database and then "re-applying" it. I hope this makes sense.
Imagine this: we have a book that we sometimes read on our office laptop, sometimes on a tablet, and sometimes on a computer at home. After finishing the book, we want to gather all the notes (for example, the highlights) we made while reading on those different devices and keep the highlights associated with their original positions in the book. How can we do this conveniently?
I have been searching for a solution that achieves the following goals:
1. Associate the highlighted portions of the book with their original positions.
2. Seamlessly synchronize reading progress, including the highlights, across devices.
3. Export and import the highlights while maintaining their association with original positions in the book.
PDF highlights can be exported in .xfdf format, and .xfdf files can be imported as highlights, which allows appending or consolidating notes and highlights. That is a great feature.
I used to read EPUB books with Calibre, an application like Zotero, but it doesn't have an Android app. Later, I started using Zotero to read EPUB books. Recently, Zotero's web version introduced an online reading feature that allows for highlighting and note-taking, which is very convenient.
However, I encountered an issue with Zotero. When reading PDFs, the highlights are stored in the Zotero database, and I can also choose to export the PDF with the highlights. **But this functionality doesn't exist for EPUB books.** So, if an EPUB book is in a Zotero shared group, I can't move it with the highlights to My Library. However, if the book is in PDF format, I can export it from the Zotero shared library and import it into My Library, consolidating highlights from multiple identical PDF files into one PDF.
It seems that Zotero's design principle is to minimize changes to the content of PDF and EPUB files. But considering that EPUB is essentially a compressed archive format, I think it would be possible to let users choose to export the highlights along with the EPUB file, maybe in a proprietary Zotero format stored inside the EPUB file. When the file is re-imported into Zotero, it could import these highlights. In this regard, I think it might be worth considering the design approach of Calibre, and even maintaining compatibility to facilitate switching between the two applications.
In summary, I hope for a workflow that allows users to consolidate highlights from multiple identical epubs into one epub file, just like I do in PDF files.
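To make the idea of storing highlights inside the EPUB archive concrete, here is a minimal Python sketch. It is only an illustration of the suggestion above, not something Zotero does: the entry name `META-INF/zotero-annotations.json` and the JSON layout are made up for this example, and other readers would simply ignore the extra file.

```python
import io
import json
import zipfile

# Hypothetical entry name; not part of any standard and not used by Zotero.
ANNOTATIONS_PATH = "META-INF/zotero-annotations.json"


def embed_annotations(epub_bytes: bytes, annotations: list) -> bytes:
    """Return a copy of the EPUB with the annotations stored as an extra ZIP entry."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as src, \
            zipfile.ZipFile(out, "w") as dst:
        for info in src.infolist():
            if info.filename == ANNOTATIONS_PATH:
                continue  # drop an earlier export so it can be replaced
            # Keep entry order and compression: "mimetype" stays first and uncompressed.
            dst.writestr(info, src.read(info.filename),
                         compress_type=info.compress_type)
        dst.writestr(ANNOTATIONS_PATH, json.dumps(annotations),
                     compress_type=zipfile.ZIP_DEFLATED)
    return out.getvalue()


def read_annotations(epub_bytes: bytes) -> list:
    """Return the embedded annotations, or an empty list if there are none."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        if ANNOTATIONS_PATH not in zf.namelist():
            return []
        return json.loads(zf.read(ANNOTATIONS_PATH))
```

Consolidating highlights from several copies of the same book would then amount to calling `read_annotations` on each copy, merging the lists, and writing the result back with `embed_annotations`.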
BUT, if I want to send an EPUB with annotations to a Zotero-using friend without using a group library, what should I do?
Perhaps embedded annotations for epub is a better solution.
I am not sure whether the following is a standard for exporting annotations with EPUB files:
https://en.wikipedia.org/wiki/EPUB#:~:text=The OPF file, traditionally named content.opf, houses the EPUB book's metadata, file manifest, and linear reading order.
But I recommend you try using the software Calibre, which stores highlights and other notes from EPUB files in a central library (I guess). Additionally, it also saves a copy of these highlights in the "metadata.opf" file located in the same directory as the EPUB file.
Perhaps Zotero can consider implementing a similar approach. Or you and the developers of Calibre could work together to develop a universal and extensible EPUB annotation standard.
many thanks.
1. Zotero ePub format: A hypothetical ePub format with support for embedded annotations. However, I appreciate the fact that the lack of an existing standard makes things difficult for you.
2. Zotero ePub reader: A lightweight, stand-alone clone of the Zotero app's ePub viewer.
Requesting a stand-alone ePub reader seems like too big of an ask for the development team, but creating a standalone format leaves open the possibility for some enterprising member of the community to eventually create it. It seems to me that that would be a reasonably straightforward task, but it is a completely uninformed opinion since I don't know much about programming or Zotero internals.
The advantage of this system should be easy to see. We would be able to share ePub files with others without using group libraries or requiring others to install the full Zotero application. And in case I decide to stop using Zotero, I can still access these in-line annotations just using the stand-alone Zotero ePub reader.
It would also be beneficial to my existing workflow of annotating textbooks. Once I am done annotating them, I'd rather not have them cluttering my Zotero library and would prefer to export them out (especially since some of them can have sizes in the hundreds of MBs), and re-import them back when needed.
`metadata.opf` alongside the EPUB. I'm seeing a couple issues:
- Calibre's viewer doesn't seem to read annotations from a `metadata.opf` file placed alongside an EPUB that isn't in my library, and it doesn't import the annotations in `metadata.opf` when I add the EPUB to my library
- This is 100% Calibre's own thing - the annotations are stored as JSON under a meta tag named `calibre:annotation`
But this is still the closest thing I've seen to a portable EPUB annotation storage format that we could use.
I don't know if Zotero evaluated that and if so why they decided against it, but if you want loss-less conversion to the PDF annotation standard (that one an ISO standard and much more widely used for PDFs, of course) that might be one reason.
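For anyone who wants to look at the Calibre data mentioned above, a rough Python sketch for pulling the JSON out of a `metadata.opf` file could look like this. It assumes the JSON sits in the `content` attribute of each `calibre:annotation` meta tag; the thread only says it is stored "under" that tag, so check against a real file. The function name is invented for this example.

```python
import json
import xml.etree.ElementTree as ET


def read_calibre_annotations(opf_text: str) -> list:
    """Collect the JSON payloads of all calibre:annotation meta tags."""
    root = ET.fromstring(opf_text)
    found = []
    for el in root.iter():
        # Compare the local tag name so the OPF namespace does not matter.
        local_name = el.tag.rsplit("}", 1)[-1]
        if local_name == "meta" and el.get("name") == "calibre:annotation":
            content = el.get("content")  # assumption: JSON lives in this attribute
            if content:
                found.append(json.loads(content))
    return found
```

The structure of each decoded object is Calibre's own, so any importer would still need a mapping from Calibre's fields to whatever the target application expects.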
AFAIK Hypothesis has never made its EPUB annotation system available through its browser extension - I could be wrong about that, but all I've seen are the demos. Its annotations are also not actually compliant with the WADM standard; e.g. here's one:
```
{
  // ...
  "target": [
    {
      "source": "http://hypothesis.evidentpoint.com/readium-demo/?epub=epub_content%2Falice&goto=%7B%22idref%22%3A%22chapter_001%22%2C%22elementCfi%22%3A%22%2F4%2F2%2F2%5Bpgepubid00004%5D%2F2%5BI_DOWN_THE_RABBIT-HOLE%5D%22%7D&",
      "selector": [
        {
          "type": "RangeSelector",
          "endOffset": 155,
          "startOffset": 137,
          "endContainer": "/section[1]/p[3]",
          "startContainer": "/section[1]/p[3]"
        },
        {
          "end": 827,
          "type": "TextPositionSelector",
          "start": 809
        },
        {
          "type": "TextQuoteSelector",
          "exact": "Oh dear! Oh dear! ",
          "prefix": "ar the Rabbit\n\t\tsay to itself, \"",
          "suffix": "I shall be too late!\" But when t"
        }
      ]
    }
  ]
}
```
- The EPUB CFI - the most important piece of information for an EPUB annotation - is embedded within the query string of the URL (`&goto={"idref":"chapter_001","elementCfi":"/4/2/2[pgepubid00004]/2[I_DOWN_THE_RABBIT-HOLE]"}&`). It should be in a WADM FragmentSelector. Also, the URL 404s, and URLs from more recent annotations return a 502 ("It looks like the dev server has not been started yet").
- The RangeSelector is invalid: it should have a startSelector and an endSelector (both WADM selectors of their own), but instead it's essentially the fields of a DOM Range object dumped to JSON.
- All the data that I elided before "target" is Hypothesis-specific metadata (users, permissions, tags).
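To illustrate the first two points, here is a small Python sketch of what an importer might do with such an annotation. It is only a sketch under assumptions: the function names are invented, and representing each range endpoint as an XPathSelector refined by a zero-length TextPositionSelector is one possible WADM-shaped encoding, not something Hypothesis or Zotero is known to use.

```python
import json
from urllib.parse import parse_qs, urlsplit


def cfi_parts_from_source(source_url: str) -> dict:
    """Pull the idref/elementCfi pair out of the `goto` query parameter."""
    query = parse_qs(urlsplit(source_url).query)
    return json.loads(query["goto"][0])


def to_wadm_range(dom_range: dict) -> dict:
    """Re-express DOM-Range-style fields as a RangeSelector with nested selectors."""
    def point(container: str, offset: int) -> dict:
        return {
            "type": "XPathSelector",
            "value": container,
            # Assumption: a zero-length TextPositionSelector marks the offset.
            "refinedBy": {"type": "TextPositionSelector",
                          "start": offset, "end": offset},
        }

    return {
        "type": "RangeSelector",
        "startSelector": point(dom_range["startContainer"], dom_range["startOffset"]),
        "endSelector": point(dom_range["endContainer"], dom_range["endOffset"]),
    }
```

Note that the `elementCfi` recovered this way is only the path inside the content document; a complete CFI for a FragmentSelector would still need the spine step, which has to come from the EPUB's package file.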
So we could probably allow importing annotations exported from Hypothesis, but that might be of questionable value considering that they haven't made their EPUB annotator directly available to end users. I do not believe that we could easily create data that Hypothesis would be able to import (and we definitely don't want to use its export format as our main export format).
I'll take a look at what obsidian-annotator does and whether we might be able to interoperate with it, or at least make it easy for an Obsidian-Zotero integration plugin to do that.
But since they already support syncing with a few online services, and Zotero's EPUB (and snapshot) annotations are exposed with WADM selectors through the API, there's no reason someone couldn't write an integration on the KOReader side.
So yes, a new KOReader integration with any WADM source, including Zotero, would be ideal (if Zotero also supports some sort of generic WADM import/export from any other app supporting WADM - I'm not sure if this is feasible or sensible). A direct KOReader <-> Zotero integration via Zotero's SQL table for annotations could be an alternative, less universal option (and I also don't believe writing to the SQL table is safe in Zotero).
Zotero for Android seems to require at least Android 6: https://forums.zotero.org/discussion/comment/451631/#Comment_451631 The fairly recent e-reader I have (Tolino Page 2) only has version 4.
I would be interested to help make a Zotero plugin for importing EPUB annotations made by KOReader. I imagine something that mimics the "File → Import Annotations..." feature currently present for PDFs, using the `metadata.epub.lua` metadata file in KOReader sidecar folder.
I would appreciate some advice:
1. is such a plugin generally feasible? (I skimmed https://www.zotero.org/support/dev/client_coding/javascript_api#file_io already)
2. is there enough data stored in KOReader `metadata.epub.lua` files to generate WADM-compliant annotations?
3. is there an existing plugin which would be a good reference point to start from?
4. any other tools/libraries that would help development? (e.g. Lua data handling, WADM schema testing, etc)
One challenge may be translating the coordinate point system between Zotero and KOReader - last I tested, I think one had 0,0 be in top-left corner and the other in bottom-left, and the ratio was different. But that can be solved.
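For page-based coordinates, the conversion mentioned above is usually just a vertical flip plus a scale factor. A minimal Python sketch, assuming rectangles are given as (x0, y0, x1, y1); which application uses which origin, and the actual ratio between them, would need to be checked against real data:

```python
def flip_rect(rect, page_height, scale=1.0):
    """Convert (x0, y0, x1, y1) from a top-left origin to a bottom-left origin.

    page_height is in the source units; scale converts source units to target units.
    """
    x0, y0, x1, y1 = rect
    return (
        x0 * scale,
        (page_height - y1) * scale,  # the old bottom edge becomes the new lower y
        x1 * scale,
        (page_height - y0) * scale,  # the old top edge becomes the new upper y
    )
```

The same function works in the opposite direction, since flipping twice with the same page height returns the original rectangle (when scale is 1).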
The Zotero documentation says somewhere that you shouldn't edit the database directly, because that might cause corruption. My plan has been to sync with the Zotero Web API, which I think can read/write annotations. This of course requires having an account, but it seemed simpler to me as an initial step, and it then syncs Zotero annotations between devices too.
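As a starting point for the Web API route, reading annotations could look roughly like the Python sketch below. It builds a request for the children of an attachment and filters the response for annotation items. The helper names are made up for this example, and the user ID, attachment key, and API key are placeholders you would fill in from your own account.

```python
import urllib.request

API_BASE = "https://api.zotero.org"


def children_request(user_id: str, attachment_key: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for the child items of an attachment."""
    url = f"{API_BASE}/users/{user_id}/items/{attachment_key}/children"
    headers = {"Zotero-API-Key": api_key, "Zotero-API-Version": "3"}
    return urllib.request.Request(url, headers=headers)


def only_annotations(items: list) -> list:
    """Keep the `data` part of every child item whose type is 'annotation'."""
    return [
        item["data"]
        for item in items
        if item.get("data", {}).get("itemType") == "annotation"
    ]
```

Sending the request with `urllib.request.urlopen` and decoding the body with `json.load` should give the list to pass to `only_annotations`. Writing annotations back is a separate step and is not covered by this sketch.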
This might be helpful: https://github.com/stelzch/zotero.koplugin/issues/13
You might want to use the upcoming release of KOReader which finally adds color annotation support that is most similar to Zotero - this will change the annotation schema in metadata.epub.lua (and metadata.pdf.lua, etc.).
WADM-compliance in KOReader - I don't know enough about the standard though I think WADM is the way to go. There is a fair amount of metadata per annotation stored.
I'm happy to DM with you if you want to discuss more (https://www.reddit.com/user/ryanwwest/), or new discussion here - I don't have time to develop right now but know a fair bit about KOReader and this specific issue.
I've already figured how to transfer my PDF annotations this way (the most tricky part was how to translate the positions, but it finally works). I will share my script once I figure out the solution for EPUBs.
Now I want to achieve the same with EPUBs. The tricky part is again the positions. In KOReader, the annotation position is encoded like this:
```
{'chapter': 'preface',
 'page': '/body/DocFragment[9]/body/p[25]/span/text()[2].189',
 'pos0': '/body/DocFragment[9]/body/p[25]/span/text()[2].189',
 'pos1': '/body/DocFragment[9]/body/p[25]/span/text()[2].491',
 'text': 'One poem by itself was certainly not responsible for an entire intellectual, moral, and social transformation—no single work was, let alone one that for centuries could not without danger be spoken about freely in public. But this particular ancient book, suddenly returning to view, made a difference.'}
```
When I manually make the same annotation in the Zotero reader and extract it via the API (as a child of my attachment), its position is expressed like this (I show only a few relevant attributes):
```
{'annotationText': 'One poem by itself was certainly not responsible for an entire intellectual, moral, and social transformation—no single work was, let alone one that for centuries could not without danger be spoken about freely in public. But this particular ancient book, suddenly returning to view, made a difference.',
 'annotationComment': '',
 'annotationColor': '#ffd400',
 'annotationSortIndex': '00008|00019912',
 'annotationPosition': '{"type":"FragmentSelector","conformsTo":"http://www.idpf.org/epub/linking/cfi/epub-cfi.html","value":"epubcfi(/6/18!/4/68/2,/3:189,/3:491)"}'}
```
I see the logic of CFI as documented here: https://idpf.org/epub/linking/cfi/epub-cfi.html. I just wasn't able to figure out how to get to these from my KOReader positions. I have also experimented with loading the EPUB file itself and mapping its table of contents onto individual chapters, etc. But I still don't see how to get it into the shape I see in Zotero, since I don't get how to interpret "18!" as a chapter reference in my EPUB.
Any hints?
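One way to read the CFI: each `/N` is a step, where an even N selects the (N/2)-th child element and an odd N points at text between elements. `/6` is normally the spine (the third child of the package document), so `/18` is the ninth spine entry, and `!` means "follow that entry into the content document". That lines up with `DocFragment[9]` in the KOReader position above, although treating the DocFragment index as the spine position is an assumption based on this one example. A small Python sketch of that reading, with invented function names:

```python
import re


def split_cfi(cfi: str) -> dict:
    """Split an EPUB CFI range of the form epubcfi(parent,start,end)."""
    inner = re.fullmatch(r"epubcfi\((.*)\)", cfi).group(1)
    # Simplification: assumes no commas inside [bracketed] assertions.
    parent, start, end = inner.split(",")
    package_path, content_path = parent.split("!")
    spine_step = int(package_path.split("/")[-1])
    return {
        "spine_item": spine_step // 2,  # even step n selects the (n/2)-th child element
        "content_path": content_path,
        "start": start,
        "end": end,
    }


def spine_prefix(koreader_pos: str) -> str:
    """Build the '/6/N!' prefix from a KOReader DocFragment[i] position (assumed mapping)."""
    index = int(re.search(r"DocFragment\[(\d+)\]", koreader_pos).group(1))
    return f"/6/{index * 2}!"
```

The remaining steps (`/4/68/2` here) cannot be derived from the KOReader path by arithmetic alone: `p[25]` counts only `p` elements, while `/68` counts every child element of the body, so the chapter's XHTML has to be parsed and walked to count siblings.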
So far I have tested the "Import Ebook Annotations..." functionality with three EPUB ebooks annotated in KOReader.
It seems that, as it stands, it works only with annotations made with KOReader 2024.07 or later. In the case of annotations made with an older version of KOReader, I had to edit the metadata.epub.lua file by renaming the ["highlight"] tag to ["annotations"] (alternatively, by renaming ["bookmarks"] to ["annotations"] and all ["notes"] tags to ["text"]). After that it worked with the other two as well.
What about the same functionality for PDFs? (I know there are other ways to achieve that, but it would be nice to be able to do that this way as well...)
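The second renaming workaround described above for older sidecar files could be scripted along these lines. This is a blunt text substitution in Python, not a Lua parser, so it is only a sketch: it would also rewrite those strings if they happened to appear inside annotation text, and it should be run on a copy of the file. The function name is invented for this example.

```python
def upgrade_sidecar(lua_text: str) -> str:
    """Rename pre-2024.07 keys by plain text substitution (no Lua parsing)."""
    return (
        lua_text
        .replace('["bookmarks"]', '["annotations"]')
        .replace('["notes"]', '["text"]')
    )
```

Reading `metadata.epub.lua` as UTF-8 text, passing it through `upgrade_sidecar`, and writing the result to a new file keeps the original untouched in case the import still fails.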