Available for beta testing: Read Aloud

dstillman Zotero Team
edited 21 days ago
In the latest Zotero beta, we've added a major new feature to the reader: Read Aloud.

Read Aloud reads your documents to you in high-quality, natural-sounding voices. It works on PDFs, EPUBs, and webpage snapshots.

To start reading, just click the headphones button in the reader toolbar.

As you're listening, you can skip forward or backward by paragraph or sentence (Option/Alt-click or Option/Alt-left/right for the latter), and you can start reading from a particular point by right-clicking and choosing Read Aloud from the menu.

An "Annotate Sentence" button — or H or U on your keyboard — will automatically highlight or underline the last sentence you heard (or the current sentence if you're more than a few seconds into it). After you create an annotation, a popup will show you the annotated sentence, and there are shortcut keys to quickly move, expand, or delete the new annotation.
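The sentence-selection rule described above can be sketched as follows. This is a minimal illustration, not Zotero's actual code: the `Sentence` type, the `sentence_to_annotate` function, and the 3-second `GRACE_SECONDS` threshold are all hypothetical (the post only says "a few seconds").

```python
from dataclasses import dataclass
from typing import List

# Hypothetical threshold: the announcement only says "a few seconds".
GRACE_SECONDS = 3.0


@dataclass
class Sentence:
    text: str
    start: float  # playback time (in seconds) when this sentence begins


def sentence_to_annotate(sentences: List[Sentence], position: float,
                         grace: float = GRACE_SECONDS) -> int:
    """Return the index of the sentence an "Annotate Sentence" press would target.

    Hypothetical sketch: annotate the current sentence once the listener is more
    than `grace` seconds into it; otherwise annotate the sentence they just heard.
    """
    if not sentences:
        raise ValueError("no sentences to annotate")
    # The current sentence is the last one that started at or before `position`.
    current = 0
    for i, sentence in enumerate(sentences):
        if sentence.start <= position:
            current = i
        else:
            break
    elapsed = position - sentences[current].start
    if elapsed <= grace and current > 0:
        return current - 1  # still early: the listener probably meant the previous sentence
    return current


if __name__ == "__main__":
    doc = [Sentence("First.", 0.0), Sentence("Second.", 5.0), Sentence("Third.", 12.0)]
    print(sentence_to_annotate(doc, 6.0))  # 1 second into "Second." -> 0
    print(sentence_to_annotate(doc, 9.0))  # 4 seconds into "Second." -> 1
```

Using a grace period rather than always taking the current sentence covers the common case of pressing H or U just after a sentence has finished.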

Read Aloud requires an internet connection and a Zotero account for high-quality voices, which we're calling Zotero Voices. If you'd like to use Read Aloud offline, you can still use the text-to-speech voices available on your system, but the quality will be much worse.

We're offering two tiers of Zotero Voices: Standard and Premium.

Standard voices are generated on Zotero servers, and we're offering unlimited Standard minutes to Zotero Storage subscribers (including institutional subscribers), as well as 2 hours/month to free accounts. Standard voices are currently available for 8 major languages, but they don't support multilingual text — e.g., they can't read text in one language when set to another.

Premium voices are the highest-quality voices, processed by external text-to-speech providers. They'll make fewer mistakes and generally sound more realistic than Standard voices. They also support many more languages and can handle multilingual text (or just read your documents in a foreign accent, if you prefer!). We'll be offering a certain number of monthly Premium minutes (varying by the specific voice you choose) to individual Zotero Storage subscribers, as well as a small number to free and institutional accounts in order to try them out.

During the beta, you'll be able to request additional Premium Voice minutes for free in order to test them out and provide feedback. After we've had a chance to see some real-world usage, we'll provide more details on the monthly allocations and options for adding additional minutes going forward.

A few known issues we'll be improving:
  • We're currently including a large number of Premium voices, particularly for non-English locales. We'll be narrowing the list as we see which voices people prefer in which languages, so please don't become too attached to Zotero Premium Voice 32!
  • There can be a delay of a few seconds before it starts reading large PDFs. (This actually isn't text-to-speech time, just a local processing delay that we need to fix.)
  • Read Aloud will get much better at skipping headers, footers, footnote/endnote superscripts, etc.

Currently, Read Aloud is only available in the desktop app, but it'll be coming to the iOS app soon (and Android after that).

Please start new threads to report any problems you encounter.

We're really looking forward to seeing how people use this feature. Thanks for testing!
  • Congrats on making this a proper internal feature!!

    One feature from my TTS plugin, ZoTTS, that you may want to consider adding is a "speak from here" function. It's useful for people who don't want to start from the beginning every time, and it proved pretty popular.

    Thanks for all your continued hard work!
  • dstillman Zotero Team
    @Imperial_Squid: I mention that in my post — you can right-click anywhere and choose Read Aloud. But we may look to make that easier.
  • Ah nice, you're well ahead of the curve then!

    Congrats again! :D
  • I have developed a TTS plugin that lets you run Kokoro TTS locally to generate voices. Have you considered adding that option as well, so people can fully self-host?
  • Can't wait to test this on Android, ideally with HTML/EPUB support. It works well on desktop. Thanks! I've been wanting TTS since I first started using Zotero many years ago. Fantastic feature.
  • Thank you very much for introducing this feature.

    > Read Aloud requires an internet connection and a Zotero account for high-quality voices, which we're calling Zotero Voices. If you'd like to use Read Aloud offline, you can still use the text-to-speech voices available on your system, but the quality will be much worse.

    On a Google Pixel, for example, I imagine the system voices will be good enough, but I'm not sure about Linux. Will it be possible to use a local API to select text-to-speech voices on a Linux system?
  • Will this feature be opt-in or opt-out?
  • dstillman Zotero Team
    @FredH281: You can use local voices on Linux, but in our testing we were seeing literally hundreds of exposed "voices" on some systems, and they were mostly unusable. If you can configure a decent local voice, you could use it.

    @asmlibre: Not sure what you mean by that. It's just a button in the toolbar. If you don't click it, you won't use it.
  • edited 15 days ago
    Should we expect the Unlimited storage tier to include unlimited Premium voice time (or enough that a single user could consider it unlimited)?

    Hopefully using your own TTS server on the LAN (e.g., https://github.com/hexgrad/kokoro) will also be supported; perhaps the way to do that is to configure it to show up as a 'local voice'.

    Being able to identify each voice by at least gender and accent/dialect (e.g., British vs. US English) would be more helpful than bare numbers when users first pick a voice from the list.
  • AbeJellinek Zotero Team
    @ryanwwest, re dialect/accent: voices are supposed to be organized by region, but that behavior was accidentally removed right before the beta release. Really sorry about that! We'll have a fix in the next beta.
  • dstillman Zotero Team
    @ryanwwest: Standard minutes will be unlimited. We don't yet know what the Premium limits will be, but these have real per-second costs from external providers, so we're not able to offer unlimited Premium minutes.

    But again, you can request additional Premium minutes during the beta, so we encourage people to try them out and provide feedback.
  • dstillman Zotero Team
    Voice grouping by region (accent) is fixed now in the latest beta.
  • edited 14 days ago
    @dstillman I worry about the implications of this feature for our privacy, because it depends on third-party servers.

    I think it would be better if it were disabled by default and we could opt in.
  • dstillman Zotero Team
    edited 14 days ago
    @asmlibre: You have to enable it. You literally cannot use the feature unless you click the Read Aloud button in the toolbar and choose a voice in the first-run dialog. It won't use a Premium voice unless you specifically choose one. There is no way for you to do this accidentally.

    (There's also no identifying information sent with Premium voice requests to begin with — just individual sentences from the document — so the privacy implications are minimal.)
  • edited 10 days ago
    EPUB 3+ files support Media Overlays, which store per-sentence audio clips directly in the file and play them in sync with the book text, essentially combining an ebook with an audiobook (https://kb.daisy.org/publishing/docs/sync-media/overlays.html). These are sometimes called 'readaloud EPUBs' and are similar to this new Zotero feature. Zotero can already open these EPUBs, but it ignores their audio.

    Is there any chance that EPUB Media Overlays will eventually be supported by Zotero? No internet or complex processing would be required to play the saved audio clips, and it might reuse much of the same framework being built here.

    Related: https://gitlab.com/storyteller-platform/storyteller
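For context on the request above, here is a minimal sketch of how the clip data in a Media Overlay could be read. Each `<par>` element in an overlay's SMIL document pairs a text fragment (`<text src>`) with an audio clip (`<audio src>`, `clipBegin`, `clipEnd`). The function names, the simplified sample document, and the subset of SMIL clock formats handled are assumptions for illustration; this is not Zotero code, and a real reader would also need to locate each overlay through the `media-overlay` attribute on the content document's manifest item in the EPUB package.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

SMIL_NS = "{http://www.w3.org/ns/SMIL}"


def parse_clock(value: str) -> float:
    """Convert a SMIL clock value to seconds (common forms only).

    Handles "h:mm:ss.fff", "mm:ss.fff", and timecounts such as "3.25s",
    "7500ms", "2min", "1h", or a bare number of seconds.
    """
    value = value.strip()
    if ":" in value:
        seconds = 0.0
        for part in value.split(":"):
            seconds = seconds * 60 + float(part)
        return seconds
    if value.endswith("ms"):
        return float(value[:-2]) / 1000
    if value.endswith("min"):
        return float(value[:-3]) * 60
    if value.endswith("h"):
        return float(value[:-1]) * 3600
    if value.endswith("s"):
        return float(value[:-1])
    return float(value)


def overlay_clips(smil_xml: str) -> List[Tuple[str, str, float, float]]:
    """List (text fragment, audio file, clip start, clip end) for each <par>, in reading order."""
    root = ET.fromstring(smil_xml)
    clips = []
    for par in root.iter(SMIL_NS + "par"):
        text = par.find(SMIL_NS + "text")
        audio = par.find(SMIL_NS + "audio")
        if text is None or audio is None:
            continue  # skip <par> elements lacking a text or audio reference
        begin = parse_clock(audio.get("clipBegin", "0"))
        # A missing clipEnd means "play to the end of the audio file".
        end_attr = audio.get("clipEnd")
        end = parse_clock(end_attr) if end_attr is not None else float("inf")
        clips.append((text.get("src"), audio.get("src"), begin, end))
    return clips


# Simplified, hypothetical overlay for one chapter with two sentences.
SAMPLE = """<smil xmlns="http://www.w3.org/ns/SMIL" version="3.0">
  <body>
    <seq>
      <par id="p1">
        <text src="chapter1.xhtml#s1"/>
        <audio src="audio/chapter1.mp3" clipBegin="0:00:00.000" clipEnd="0:00:03.250"/>
      </par>
      <par id="p2">
        <text src="chapter1.xhtml#s2"/>
        <audio src="audio/chapter1.mp3" clipBegin="3.25s" clipEnd="7500ms"/>
      </par>
    </seq>
  </body>
</smil>"""

if __name__ == "__main__":
    for clip in overlay_clips(SAMPLE):
        print(clip)
```

Playback then amounts to seeking the audio file to each clip's start, stopping at its end, and highlighting the matching text fragment, which is close to what Read Aloud already does with generated audio.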
  • edited 6 days ago
    Some feedback on testing usage:
    - The maximum speed of 2x isn't fast enough for me with English (USA) Standard Voice 2. A very minor nitpick: when adjusting the speed during playback, it would be nicer not to restart the current sentence but just continue at the new speed.
    - Sending the 'next song' or 'previous song' commands from Bluetooth headphones (commonly a double or triple click) moves the audio forward or backward one paragraph, which is a sensible default. But it might be nice to be able to adjust this, e.g., to 5 paragraphs per forward skip and 2 pages per backward skip.
    - I can definitely tell several differences between the Standard and Premium voices. Intonation is better on the Premium ones. The one thing I wish could be improved for the Standard voices is that they all currently sound a little fuzzy, as if there's the tiniest bit of background noise. This may just come down to a lower audio bitrate that isn't feasible to increase, but a little more clarity there would be nice. Premium voices have no such issue.
    - I can't find any Read Aloud settings outside the Now Playing dialog itself, but I assume a settings pane will be added once there are settings that need one.
    - It might be nice to have an option for per-sentence highlighting of now-playing audio rather than just per-paragraph, especially for large paragraphs.
  • dstillman Zotero Team
    > it would be nicer to not restart the current sentence but just continue at the new speed
    @ryanwwest: This will be fixed in the next beta.
    I am absolutely thrilled you are working on this. I learn best while pacing or walking and have been relying on Speechify, though I run into both ethical and technical challenges with it, particularly around transferring annotations. So having everything integrated into Zotero is very exciting.

    Notes on current functioning:
    - The one-click "Annotate Sentence" option is such a valuable feature. I can see using it regularly. But I notice it isn't working consistently for me across column breaks. I imagine this is closely related to the OCR of any given document. Is it fixable?
    - Similarly, reading aloud headers, footers, and metadata (such as the author info on the first page of most journal articles) is annoying, and I'm guessing it's a complex problem to address.

    Other highly desired features:

    1. The ability to "read from here."
    This is a feature I use often in the online and mobile Speechify apps, to jump ahead as well as to jump back. I'm so used to it that without it here, I feel frustratingly powerless.

    2. Real time highlighting of current sentence, not current paragraph.

    3. In an iOS version, I would love to be able to use it with the text only view. I would love to have annotation capability from within this view as well! (I don't think this exists at all currently, even without integrating read aloud.)
  • dstillman Zotero Team
    @carik:
    > 1. The ability to "read from here."
    You can right-click anywhere and select Read Aloud from the menu, but we're going to try to provide a more discoverable way to do this.
    > 2. Real time highlighting of current sentence, not current paragraph.
    We'll likely at least make this an option. We show highlighting on the paragraph because that's what actually gets skipped when you use the skip-forward/back buttons, and because the annotate-sentence feature doesn't map exactly to the current sentence (there's a brief grace period where it will still highlight the previous sentence), but I think some people may prefer sentence highlighting regardless.
    > 3. In an iOS version, I would love to be able to use it with the text only view
    This will be possible in a future version.
  • The TTS option is great! Would it be possible in future releases to add an option to skip references in the text (if they can be extracted)?
  • Gave this a try the other day when I was cooking dinner and needed to do some reading for class. I love that I can dial the speed up and I definitely hear a difference between the Premium and regular voices.

    Three "squawks" for you so far:
    1) Read Aloud has the same problem differentiating "body text" from "not body text" that highlighting does (which is the bane of my existence when I'm trying to highlight across a page break with header/footer info!); it stopped mid-sentence to start reading me the information in a table. I would want tables and footers to be treated as separate from the body text.
    2) It reads the in-text parenthetical citations in full. I'd love for there to be an option to skip those, but I realize that may be difficult to code, since I'd still want it to read other parenthetical phrases. But maybe it's doable if the citations are coded as a field?
    3) It was doing a weird thing reading years, pronouncing 2024 "twenty two thousand four."
  • dstillman Zotero Team
    @eggodwin: Better differentiation between body text and headers/footers should be coming soon, and we're going to try to address in-text citations as well.

    For the year thing, if you have a specific example, can you provide a link, as well as the voice it happens with?