Available for beta testing: Read Aloud
dstillman
Zotero Team
In the latest Zotero beta, we've added a major new feature to the reader: Read Aloud.
Read Aloud reads your documents to you in high-quality, natural-sounding voices. It works on PDFs, EPUBs, and webpage snapshots.
To start it reading, just click the headphones button in the reader toolbar.
As you're listening, you can skip forward or backward by paragraph or sentence (Option/Alt-click or Option/Alt-left/right for the latter), and you can start reading from a particular point by right-clicking and choosing Read Aloud from the menu.
An "Annotate Sentence" button — or H or U on your keyboard — will automatically highlight or underline the last sentence you heard (or the current sentence if you're more than a few seconds into it). After you create an annotation, a popup will show you the annotated sentence, and there are shortcut keys to quickly move, expand, or delete the new annotation.
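To make the "last sentence you heard vs. current sentence" rule concrete, here is a minimal sketch of how that choice could be expressed. It is only an illustration, not Zotero's actual code: the `Sentence` type, the `sentence_to_annotate` function, and the 3-second threshold are all assumptions (the post only says "a few seconds").

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Assumed value: the announcement only says "a few seconds".
CURRENT_SENTENCE_THRESHOLD_S = 3.0


@dataclass
class Sentence:
    """Hypothetical container for one sentence in the reading queue."""
    index: int
    text: str


def sentence_to_annotate(
    sentences: Sequence[Sentence],
    current_index: int,
    seconds_into_current: float,
    threshold: float = CURRENT_SENTENCE_THRESHOLD_S,
) -> Optional[Sentence]:
    """Pick the sentence that an Annotate Sentence press should target."""
    if not sentences:
        return None
    # Far enough into the current sentence: the listener most likely means it.
    # There is also nothing earlier to fall back on at the first sentence.
    if current_index == 0 or seconds_into_current > threshold:
        return sentences[current_index]
    # Otherwise the press is probably a reaction to the sentence just heard.
    return sentences[current_index - 1]
```

For example, pressing H one second into the second sentence would target the first sentence, while pressing it five seconds in would target the second.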
Read Aloud requires an internet connection and a Zotero account for high-quality voices, which we're calling Zotero Voices. If you'd like to use Read Aloud offline, you can still use the text-to-speech voices available on your system, but the quality will be much worse.
We're offering two tiers of Zotero Voices: Standard and Premium.
Standard voices are generated on Zotero servers, and we're offering unlimited Standard minutes to Zotero Storage subscribers (including institutional subscribers), as well as 2 hours/month to free accounts. Standard voices are currently available for 8 major languages, but they don't support multilingual text — e.g., they can't read text in one language when set to another.
Premium voices are the highest-quality voices, processed by external text-to-speech providers. They'll make fewer mistakes and generally sound more realistic than Standard voices. They also support many more languages and can handle multilingual text (or just read your documents in a foreign accent, if you prefer!). We'll be offering a certain number of monthly Premium minutes (varying by the specific voice you choose) to individual Zotero Storage subscribers, as well as a small number to free and institutional accounts in order to try them out.
During the beta, you'll be able to request additional Premium Voice minutes for free in order to test them out and provide feedback. After we've had a chance to see some real-world usage, we'll provide more details on the monthly allocations and options for adding additional minutes going forward.
A few known issues we'll be improving:
- We're currently including a large number of Premium voices, particularly for non-English locales. We'll be narrowing the list as we see which voices people prefer in which languages, so please don't become too attached to Zotero Premium Voice 32!
- There can be a delay of a few seconds before it starts reading large PDFs. (This actually isn't text-to-speech time, just a local processing delay that we need to fix.)
- It'll get much better at skipping headers, footers, footnote/endnote superscripts, etc.
Please start new threads to report any problems you encounter.
We're really looking forward to seeing how people use this feature. Thanks for testing!
One feature from my TTS plugin ZoTTS that you may want to consider adding is a "speak from here" function. It's useful for those who don't want to start from the beginning every time, and it proved pretty popular.
Thanks for all your continued hard work!
Congrats again! :D
> Read Aloud requires an internet connection and a Zotero account for high-quality voices, which we're calling Zotero Voices. If you'd like to use Read Aloud offline, you can still use the text-to-speech voices available on your system, but the quality will be much worse.
On, for example, a Google Pixel, I imagine the voices will be of sufficient quality, but on Linux I don't know. Will it be possible to use a local API to select text-to-speech voices on a Linux system?
@asmlibre: Not sure what you mean by that. It's just a button in the toolbar. If you don't click it, you won't use it.
Hopefully using your own TTS server on the LAN (e.g., https://github.com/hexgrad/kokoro) will also be supported. Maybe our option there is to configure it to show up as a 'local voice'.
Being able to identify each voice by at least gender and accent/dialect (e.g., British vs USA English) would be nicer than just numbers when users initially pick a voice from the text list.
But again, you can request additional Premium minutes during the beta, so we encourage people to try them out and provide feedback.
I think it would be better if it was disabled by default and we could enable it optionally.
(There's also no identifying information sent with Premium voice requests to begin with — just individual sentences from the document — so the privacy implications are minimal.)
Is there any chance that EPUB Media Overlays will eventually be supported by Zotero? No internet or complex processing would be required to play the saved audio clips, and it might reuse much of the same framework being built here.
Related: https://gitlab.com/storyteller-platform/storyteller
- Maximum 2x speed is not enough for me with English USA Standard Voice 2. Very minor nitpick, but when adjusting the speed when actively playing, it would be nicer to not restart the current sentence but just continue at the new speed.
- Sending the 'next song' or 'previous song' commands from Bluetooth headphones (commonly a double or triple click) moves the audio forward or backward one paragraph, which is a sensible default. But it might be nice to be able to adjust this behavior, e.g., to 5 paragraphs per forward skip and 2 pages per backward skip.
- I can definitely tell several differences between the standard and premium voices. Intonation is better on the premium ones. The one thing I wish could be improved for the standard voices is that currently they all sound a little more fuzzy or like there is the tiniest bit of background noise. This may just come down to a lower audio bitrate and not be feasible to increase, but a little more clarity there would be nice. Premium voices have no such issue.
- I can't find any Read Aloud Settings available outside the now playing dialog box itself, but assume one will be added once there are necessary settings.
- It might be nice to have an option for per-sentence highlighting of now-playing audio rather than just per-paragraph, especially for large paragraphs.
Notes on current functioning:
- The one-click "Annotate sentence" option is such a valuable feature. I can see using it regularly. But I notice it isn't functioning consistently for me across column breaks. I imagine this is highly related to the OCR of any given document. Is it fixable?
- Similarly, reading aloud of headers, footers, and metadata (such as the author info on the first page of most journal articles) is annoying, and I'm guessing it's a complex problem to address.
Other highly desired features:
1. The ability to "read from here."
This is a feature I use often in the online and mobile Speechify apps, to jump ahead as well as to jump back. I'm so used to it that without it here, I feel frustratingly powerless.
2. Real time highlighting of current sentence, not current paragraph.
3. In an iOS version, I would love to be able to use it with the text only view. I would love to have annotation capability from within this view as well! (I don't think this exists at all currently, even without integrating read aloud.)
Three "squawks" for you so far:
1) Read aloud has the same problem differentiating "body text" from "not body text" that highlighting does (which is the bane of my existence when I'm trying to highlight across a page break with header/footer info!) - it stopped mid-sentence to start reading me the information in a table. I would want tables and footers to be viewed as separate from the text.
2) It reads the in-text parenthetical citations in full. I'd love for there to be an option to skip those, but I realize that may be difficult to code, as I would want it to read in-text parenthetical phrases. But maybe it's doable if the citations are coded as a field?
3) It was doing a weird thing reading years, pronouncing 2024 "twenty two thousand four."
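On squawk 2, here is a rough sketch of one possible heuristic for telling citations apart from ordinary parenthetical phrases. It is not how Zotero handles this; the `strip_citations` name and the rule "a parenthetical containing a four-digit year is a citation" are assumptions made purely for illustration.

```python
import re

# Assumption: a parenthetical counts as a citation only if it contains a
# plausible four-digit year (1500-2099), optionally suffixed like "2020a".
YEAR = r"(?:1[5-9]|20)\d{2}[a-z]?"

# Match optional leading whitespace plus a non-nested "( ... )" group,
# but only when a year appears somewhere inside the parentheses.
CITATION = re.compile(r"\s*\((?=[^()]*\b" + YEAR + r"\b)[^()]*\)")


def strip_citations(text: str) -> str:
    """Drop year-bearing parentheticals before text is sent to speech."""
    return CITATION.sub("", text)


if __name__ == "__main__":
    sample = (
        "Reading improves recall (Smith & Jones, 2020) "
        "in adults (but only slightly)."
    )
    print(strip_citations(sample))
```

This keeps "(but only slightly)" while dropping "(Smith & Jones, 2020)". It would also wrongly drop non-citation asides that happen to contain a year, such as "(founded in 1999)", and it ignores numeric citation styles like "[12]", so anything like it would probably need to be optional.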
For the year thing, if you have a specific example, can you provide a link, as well as the voice it happens with?