Available for beta testing: Read Aloud

dstillman · March 3, 2026

In the latest Zotero beta, we've added a major new feature to the reader: Read Aloud.

Read Aloud reads your documents to you in high-quality, natural-sounding voices. It works on PDFs, EPUBs, and webpage snapshots.

To start it reading, just click the headphones button in the reader toolbar.

As you're listening, you can skip forward or backward by paragraph or sentence (Option/Alt-click or Option/Alt-left/right for the latter), and you can start reading from a particular point by right-clicking and choosing Read Aloud from the menu.

An "Annotate Sentence" button — or H or U on your keyboard — will automatically highlight or underline the last sentence you heard (or the current sentence if you're more than a few seconds into it). After you create an annotation, a popup will show you the annotated sentence, and there are shortcut keys to quickly move, expand, or delete the new annotation.
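For readers curious how the "last sentence you heard" rule could work, here is a minimal sketch. It is not Zotero's actual code: the `Sentence` type, the function name, and the 3-second threshold are all assumptions (the post only says "a few seconds").

```python
from dataclasses import dataclass

# Hypothetical threshold: the post only says "a few seconds".
GRACE_PERIOD_SECONDS = 3.0


@dataclass
class Sentence:
    index: int
    text: str


def sentence_to_annotate(sentences, current_index, seconds_into_current,
                         grace=GRACE_PERIOD_SECONDS):
    """Pick the sentence an annotate keypress should target.

    Within the grace period, the listener is assumed to be reacting to the
    sentence that just finished, so the previous sentence is returned.
    """
    if current_index > 0 and seconds_into_current <= grace:
        return sentences[current_index - 1]
    # Past the grace period, or there is no previous sentence.
    return sentences[current_index]
```

With this rule, pressing H one second into sentence 5 would target sentence 4, while pressing it five seconds in would target sentence 5.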

Read Aloud requires an internet connection and a Zotero account for high-quality voices, which we're calling Zotero Voices. If you'd like to use Read Aloud offline, you can still use the text-to-speech voices available on your system, but the quality will be much worse.

We're offering two tiers of Zotero Voices: Standard and Premium.

Standard voices are generated on Zotero servers, and we're offering unlimited Standard minutes to Zotero Storage subscribers (including institutional subscribers), as well as 2 hours/month to free accounts. Standard voices are currently available for 8 major languages, but they don't support multilingual text — e.g., they can't read text in one language when set to another.

Premium voices are the highest-quality voices, processed by external text-to-speech providers. They'll make fewer mistakes and generally sound more realistic than Standard voices. They also support many more languages and can handle multilingual text (or just read your documents in a foreign accent, if you prefer!). We'll be offering a certain number of monthly Premium minutes (varying by the specific voice you choose) to individual Zotero Storage subscribers, as well as a small number to free and institutional accounts in order to try them out.

During the beta, you'll be able to request additional Premium Voice minutes for free in order to test them out and provide feedback. After we've had a chance to see some real-world usage, we'll provide more details on the monthly allocations and options for adding additional minutes going forward.

A few known issues we'll be improving:

- We're currently including a large number of Premium voices, particularly for non-English locales. We'll be narrowing the list as we see which voices people prefer in which languages, so please don't become too attached to Zotero Premium Voice 32!
- There can be a delay of a few seconds before it starts reading large PDFs. (This actually isn't text-to-speech time, just a local processing delay that we need to fix.)
- It'll get much better at skipping headers, footers, footnote/endnote superscripts, etc.

Currently, Read Aloud is only available in the desktop app, but it'll be coming to the iOS app soon (and Android after that).

Please start new threads to report any problems you encounter.

We're really looking forward to seeing how people use this feature. Thanks for testing!

Imperial_Squid · March 3, 2026

Congrats on making this a proper internal feature!!

One feature from my TTS plugin, ZoTTS, that you may want to consider adding is a "speak from here" function. It's useful for those who don't want to start from the beginning every time, and it proved pretty popular.

Thanks for all your continued hard work!

dstillman · March 3, 2026

@Imperial_Squid: I mention that in my post — you can right-click anywhere and choose Read Aloud. But we may look to make that easier.

Imperial_Squid · March 3, 2026

Ah nice, you're well ahead of the curve then!

Congrats again! :D

pa511 · March 4, 2026

I have a TTS plugin I developed that lets you run Kokoro TTS locally to generate the voice. Have you considered adding that option as well, so people can fully self-host?

foss- · March 4, 2026

Can't wait to test this on Android, ideally with HTML/EPUB support. It works well on desktop. Thanks! I've been wanting TTS since I first started using Zotero many years ago. Fantastic feature.

FredH281 · March 7, 2026

Thank you very much for introducing this feature.

> Read Aloud requires an internet connection and a Zotero account for high-quality voices, which we're calling Zotero Voices. If you'd like to use Read Aloud offline, you can still use the text-to-speech voices available on your system, but the quality will be much worse.

On, for example, a Google Pixel, I imagine the voices will be of sufficient quality, but on Linux I don't know. Will it be possible to use a local API to select text-to-speech voices on a Linux system?

asmlibre · March 7, 2026

Will this feature be opt-in or opt-out?

dstillman · March 7, 2026

@FredH281: You can use local voices on Linux, but in our testing we were seeing literally hundreds of exposed "voices" on some systems, and they were mostly unusable. If you can configure a decent local voice, you could use it.

@asmlibre: Not sure what you mean by that. It's just a button in the toolbar. If you don't click it, you won't use it.

ryanwwest · March 9, 2026

Should we expect the unlimited storage tier to include unlimited Premium voice time (or a limit high enough that a single human user can consider it unlimited)?

Hopefully using your own TTS server on the LAN (e.g., https://github.com/hexgrad/kokoro) will also be supported. Maybe our option there is to configure it to show up as a 'local voice'.

Being able to identify each voice by at least gender and accent/dialect (e.g., British vs USA English) would be nicer than just numbers when users initially pick a voice from the text list.

AbeJellinek · March 9, 2026

@ryanwwest, re dialect/accent: voices are supposed to be organized by region, but that behavior was accidentally removed right before the beta release. Really sorry about that! We'll have a fix in the next beta.

dstillman · March 9, 2026

@ryanwwest: Standard minutes will be unlimited. We don't yet know what the Premium limits will be, but these have real per-second costs from external providers, so we're not able to offer unlimited Premium minutes.

But again, you can request additional Premium minutes during the beta, so we encourage people to try them out and provide feedback.

dstillman · March 9, 2026

Voice grouping by region (accent) is fixed now in the latest beta.

asmlibre · March 10, 2026

@dstillman I worry about the implications of this feature on our privacy, because it depends on third-party servers.

I think it would be better if it was disabled by default and we could enable it optionally.

dstillman · March 10, 2026

@asmlibre: You have to enable it. You literally cannot use the feature unless you click the Read Aloud button in the toolbar and choose a voice in the first-run dialog. It won't use a Premium voice unless you specifically choose one. There is no way for you to do this accidentally.

(There's also no identifying information sent with Premium voice requests to begin with — just individual sentences from the document — so the privacy implications are minimal.)

ryanwwest · March 14, 2026

EPUB3+ files support Media Overlays that allow per-sentence audio clips to be played in sync with the book text and stored directly in the file, essentially combining ebook with audiobook (https://kb.daisy.org/publishing/docs/sync-media/overlays.html). Sometimes these are called 'readaloud EPUBs' and are similar to this new Zotero feature. While these EPUBs are currently readable by Zotero, their audio is ignored.

Is there any chance that EPUB Media Overlays will eventually be supported by Zotero? No internet or complex processing would be required to play the saved audio clips, and it might reuse much of the same framework being built here.

Related: https://gitlab.com/storyteller-platform/storyteller
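To make the Media Overlays idea concrete, here is a rough sketch of reading the sentence-to-audio mapping from an overlay (SMIL) document. The sample markup, file names, and `parse_overlay` helper are made up for illustration; only the SMIL namespace and the `par`/`text`/`audio` structure with `clipBegin`/`clipEnd` come from the EPUB specification. This says nothing about how Zotero would implement it.

```python
import xml.etree.ElementTree as ET

SMIL_NS = "http://www.w3.org/ns/SMIL"

# Invented sample overlay: each <par> pairs a text fragment with an audio clip.
SAMPLE_OVERLAY = """<smil xmlns="http://www.w3.org/ns/SMIL"
      xmlns:epub="http://www.idpf.org/2007/ops" version="3.0">
  <body>
    <seq epub:textref="chapter1.xhtml">
      <par id="s1">
        <text src="chapter1.xhtml#sentence1"/>
        <audio src="audio/chapter1.mp3" clipBegin="0:00:00.000" clipEnd="0:00:03.250"/>
      </par>
      <par id="s2">
        <text src="chapter1.xhtml#sentence2"/>
        <audio src="audio/chapter1.mp3" clipBegin="0:00:03.250" clipEnd="0:00:07.100"/>
      </par>
    </seq>
  </body>
</smil>"""


def parse_overlay(smil_xml):
    """Return one dict per <par>, linking a text fragment to its audio clip."""
    root = ET.fromstring(smil_xml)
    clips = []
    for par in root.iter(f"{{{SMIL_NS}}}par"):
        text = par.find(f"{{{SMIL_NS}}}text")
        audio = par.find(f"{{{SMIL_NS}}}audio")
        if text is None or audio is None:
            continue  # skip incomplete pairs
        clips.append({
            "text": text.get("src"),
            "audio": audio.get("src"),
            "clip_begin": audio.get("clipBegin"),
            "clip_end": audio.get("clipEnd"),
        })
    return clips


clips = parse_overlay(SAMPLE_OVERLAY)
```

A player would then highlight the element named by `text` while playing the `audio` file between `clip_begin` and `clip_end`.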

ryanwwest · 2026-03-18T17:43:13+00:00

Some feedback on testing usage:
- Maximum 2x speed is not enough for me with English USA Standard Voice 2. Very minor nitpick, but when adjusting the speed when actively playing, it would be nicer to not restart the current sentence but just continue at the new speed.
- Sending the 'next song' or 'previous song' commands from Bluetooth headphones (double or triple click commonly) moves the audio forward or backward one paragraph which is a sensible default. But it might be nice to be able to adjust this behavior to be e.g. 5 paragraphs per forward skip and 2 pages per backward skip.
- I can definitely tell several differences between the standard and premium voices. Intonation is better on the premium ones. The one thing I wish could be improved for the standard voices is that currently they all sound a little more fuzzy or like there is the tiniest bit of background noise. This may just come down to a lower audio bitrate and not be feasible to increase, but a little more clarity there would be nice. Premium voices have no such issue.
- I can't find any Read Aloud Settings available outside the now playing dialog box itself, but assume one will be added once there are necessary settings.
- It might be nice to have an option for per-sentence highlighting of now-playing audio rather than just per-paragraph, especially for large paragraphs.

dstillman · 2026-03-18T17:45:29+00:00

> it would be nicer to not restart the current sentence but just continue at the new speed

@ryanwwest: This will be fixed in the next beta.

carik · 2026-03-19T21:21:43+00:00

I am absolutely thrilled you are working on this. I learn best while pacing or walking and have been relying on Speechify, though I encounter both ethical and technical challenges with it, particularly around transferring annotations. So everything integrated into Zotero is very exciting to imagine.

Notes on current functioning:
- The one click "Annotate sentence" option is such a valuable feature. I can see using it regularly. But I notice it isn't functioning consistently for me across column breaks. I imagine this is highly related to the OCR of any given document. Is it fixable?
- Similarly, reading aloud of headers, footers, and metadata (such as the author info on the first page of most journal articles) is annoying and I'm guessing a complex problem to address.

Other highly desired features:

1. The ability to "read from here."
This is a feature I use often in the online and mobile Speechify apps, to jump ahead as well as to jump back. I'm so used to it that without it here, I feel frustratingly powerless.

2. Real time highlighting of current sentence, not current paragraph.

3. In an iOS version, I would love to be able to use it with the text only view. I would love to have annotation capability from within this view as well! (I don't think this exists at all currently, even without integrating read aloud.)

dstillman · 2026-03-20T16:10:34+00:00

@carik:

> 1. The ability to "read from here."

You can right-click anywhere and select Read Aloud from the menu, but we're going to try to provide a more discoverable way to do this.

> 2. Real time highlighting of current sentence, not current paragraph.

We'll likely at least make this an option. We show highlighting on the paragraph because that's what actually gets skipped when you use the skip-forward/back buttons, and because the annotate-sentence feature doesn't map exactly to the current sentence (there's a brief grace period where it will still highlight the previous sentence), but I think some people may prefer sentence highlighting regardless.

> 3. In an iOS version, I would love to be able to use it with the text only view

This will be possible in a future version.

danielborek · 2026-03-21T12:47:41+00:00

The TTS option is great! Would it be possible in future releases to add an option to skip references in the text (if they can be extracted)?

eggodwin · 2026-03-22T18:38:01+00:00

Gave this a try the other day when I was cooking dinner and needed to do some reading for class. I love that I can dial the speed up and I definitely hear a difference between the Premium and regular voices.

Three "squawks" for you so far:
1) Read aloud has the same problem differentiating "body text" from "not body text" that highlighting does (which is the bane of my existence when I'm trying to highlight across a page break with header/footer info!) - it stopped mid-sentence to start reading me the information in a table. I would want tables and footers to be viewed as separate from the text.
2) It reads the in-text parenthetical citations in full. I'd love for there to be an option to skip those, but I realize that may be difficult to code, as I would want it to read in-text parenthetical phrases. But maybe it's doable if the citations are coded as a field?
3) It was doing a weird thing reading years, pronouncing 2024 "twenty two thousand four."

dstillman · 2026-03-22T18:51:42+00:00

@eggodwin: Better differentiation between body text and headers/footers should be coming soon, and we're going to try to address in-text citations as well.

For the year thing, if you have a specific example, can you provide a link, as well as the voice it happens with?

yakjones · 2026-03-26T02:16:53+00:00

How do I use this feature? I have Zotero installed, but I don't see the headphones icon in the reader toolbar.

dstillman · 2026-03-26T02:17:03+00:00

@yakjones: It's in the Zotero beta.

jaycliftmann · 2026-03-28T14:14:58+00:00

Hi, I'm very excited about this. Do you have plans to implement math reading using something like MathCAT (https://daisy.github.io/MathCAT/)?

Many of my documents are impossible to understand without the equations, meaning any TTS system won't be much help without it.

eggodwin · 2026-03-30T03:49:51+00:00

@dstillman I ran out of minutes so I can't replicate it just yet (I see that beta testers can get more minutes for free but I'm not sure how to do that). It was Premium Voice 3.

dstillman · 2026-04-02T23:54:43+00:00

In the latest beta, after you've activated Read Aloud, you can now more easily start reading at a specific location by hovering over a paragraph and then clicking anywhere around the headphone icon in the margin. (You can still right-click somewhere and choose "Read Aloud from Here" if you prefer.)

jaclynprouse · 2026-04-08T20:33:22+00:00

I am loving this feature so far! I am wondering how to go about requesting additional Premium Voice minutes, if possible.

haug · 2026-04-13T20:15:49+00:00

I am also curious whether supporting a local TTS server (like Kokoro) or other OpenAI compatible APIs is on the roadmap.
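For context on what "OpenAI compatible" means here: such servers accept a POST to `/v1/audio/speech` with a JSON body containing `model`, `input`, and `voice`. The sketch below only builds that request, outside of Zotero, and does not send it. The base URL, port, model name, and voice name are placeholders, not values confirmed anywhere in this thread, and nothing here implies Zotero supports this.

```python
import json
import urllib.request


def build_speech_request(base_url, text, model, voice):
    """Build (but do not send) an OpenAI-style text-to-speech request."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values: substitute whatever your own server actually exposes.
req = build_speech_request(
    "http://localhost:8880", "Hello from a local voice.",
    model="example-model", voice="example-voice",
)
```

Passing `req` to `urllib.request.urlopen` would return the audio bytes if a compatible server were listening at that address.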

f.murphey1499 · 2026-04-14T00:20:22+00:00

Loving this feature so far! Looking forward to when it will be able to skip footnotes and headers and such too!

One issue I have noticed is that it is having some trouble with numbers: it keeps reading 2020 as 2002, though it was reading all the other years fine as far as I could tell. It also read 3,000 as "three zero zero zero." I think that might be because in the article it was "3,000-capacity center," so the hyphen may have thrown it off. This was while I was using the Premium 1 voice, if that makes any difference.

Keep up the great work!
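If the hyphen guess is right, one common workaround in TTS pipelines is to normalize such numbers before the text reaches the voice. The sketch below is purely illustrative: the function name and the two rules are assumptions, it is not what Zotero or any voice provider does, and it has not been shown to fix the behavior reported above.

```python
import re

# Digit groups with comma thousands separators, e.g. "3,000" or "1,250,000".
THOUSANDS = re.compile(r"(?<![\d,])\d{1,3}(?:,\d{3})+(?![\d,])")

# A hyphen joining a number to a word, e.g. "3000-capacity".
NUMBER_HYPHEN_WORD = re.compile(r"(?<=\d)-(?=[A-Za-z])")


def normalize_numbers(text):
    """Rewrite numbers into a form a TTS engine is less likely to spell out."""
    # "3,000" -> "3000"
    text = THOUSANDS.sub(lambda m: m.group(0).replace(",", ""), text)
    # "3000-capacity" -> "3000 capacity"
    return NUMBER_HYPHEN_WORD.sub(" ", text)


cleaned = normalize_numbers("a 3,000-capacity center")
```

Plain years such as 2020 contain no separators, so they pass through this function unchanged.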