[Read Aloud] Feedback and Suggestions

nyetus · 2026-03-13T14:02:33+00:00

First and foremost, many thanks for this feature. I cannot express how excited I am to see this supported natively.

A few remarks and suggestions that I hope may be useful:

1. The reading sometimes contains pauses or hiccups, especially with certain names, for example E. J. Lowe. (It's read as E *full stop* J *full stop* Lowe *full stop*.)
2. Words split across line-breaks are sometimes read incorrectly, for example metametaphy-sics.
3. The feature also sometimes reads watermarks in the margins or at the side of the page.
4. When playback is stopped and restarted, it begins again from the beginning. It would be helpful if it could resume from the previous position, or at least offer this as an optional default behaviour.
5. It is not currently possible to start Read Aloud directly from a highlighted line. In heavily annotated documents, this makes the feature somewhat cumbersome to access, since one first has to select text to reach the option through the context menu.
6. An autoscroll function would also improve usability, ideally with the text remaining centred on screen while playback continues. It should be possible to toggle this on and off, as in ElevenReader, where one can also scroll manually and recentre when desired.
7. It would be useful to have a toggle for whether footnotes are read aloud. The default should probably be not to read them.
8. The highlighting during playback does not always cover the full sentence, which means that sometimes only part of the sentence is highlighted rather than the sentence as a whole.

I realise some of these may already be known issues. Apologies for any repetition, but I thought it would still be worth flagging them just in case.

dstillman · 2026-03-13T14:25:37+00:00

> 1. The reading sometimes contains pauses or hiccups, especially with certain names, for example E. J. Lowe. (It's read as E *full stop* J *full stop* Lowe *full stop*.)
> 2. Words split across line-breaks are sometimes read incorrectly, for example metametaphy-sics.

For the reading issues, you should say what voices you're trying with, and ideally provide specific examples. In general, the Standard voices will have more problems like these than the Premium voices. (The issue with name initials is known, though, and we should be able to improve that.)
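To illustrate the initials problem, here is a minimal sketch of one possible preprocessing step. It is not how Zotero or any voice provider handles it. It strips the full stops from a run of single-letter initials when a capitalised word follows, so that a speech engine is less likely to pause after each letter. The function name `soften_initials` and the regular expression are hypothetical.

```python
import re

# Hypothetical pattern: one or more single capital letters, each followed by
# a full stop and optional whitespace, directly before a capitalised word.
_INITIAL_RUN = re.compile(r"\b(?:[A-Z]\.\s*)+(?=[A-Z][a-z])")


def soften_initials(text: str) -> str:
    """Rewrite "E. J. Lowe" as "E J Lowe" so the stops are not read as sentence ends."""

    def repl(match: re.Match) -> str:
        # Keep only the letters of the matched run, separated by spaces.
        letters = re.findall(r"[A-Z]", match.group(0))
        return " ".join(letters) + " "

    return _INITIAL_RUN.sub(repl, text)


print(soften_initials("E. J. Lowe"))
```

A heuristic like this has false positives: a sentence that ends in a single capital letter ("Plan A. Then ...") would lose its full stop, so it would need further checks before being used on real text.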

> 3. The feature also sometimes reads watermarks in the margins or at the side of the page.

Yes, this should improve greatly in an upcoming version.

> 4. When playback is stopped and restarted, it begins again from the beginning. It would be helpful if it could resume from the previous position, or at least offer this as an optional default behaviour.

~~If I recall, there were some technical limitations with restarting from the previous position, but we'll see what we can do.~~ See below.

> 5. It is not currently possible to start Read Aloud directly from a highlighted line. In heavily annotated documents, this makes the feature somewhat cumbersome to access, since one first has to select text to reach the option through the context menu.

You mean because the Read Aloud option isn't shown in the context menu when you right-click on an annotation?

> 6. An autoscroll function would also improve usability, ideally with the text remaining centred on screen while playback continues.

It already should, but it looks like that may not currently be working for all PDFs. We'll investigate, but examples would be helpful.

> 7. It would be useful to have a toggle for whether footnotes are read aloud. The default should probably be not to read them.

Do you mean footnotes themselves, or footnote numbers within the text? (But both are covered by 3 above.)

> 8. The highlighting during playback does not always cover the full sentence, which means that sometimes only part of the sentence is highlighted rather than the sentence as a whole.

Again, we'd want specific examples, since we may be able to improve them, but sentence detection is always going to be a little dicey, and this is what the expand option in the popup is for.

Thanks for testing!

dstillman · 2026-03-13T15:05:51+00:00

> 4. When playback is stopped and restarted, it begins again from the beginning.

Wait, are you testing with local voices? Due to technical limitations, those always have to start from the beginning of a sentence, but Standard and Premium voices should start from the most recent silence — no more than a fraction of a sentence back.

Or do you mean restarting after closing the Read Aloud popup (or closing the document entirely)?

nyetus · 2026-03-13T15:29:18+00:00

Thanks for your response!

> For the reading issues, you should say what voices you're trying with, and ideally provide specific examples. In general, the Standard voices will have more problems like these than the Premium voices. (The issue with name initials is known, though, and we should be able to improve that.)

I was working with Premium Voice 1. (I prefer UK voices.)

For example, in the sentence: ‘The present Element is then primarily an exercise in metametaphy-sics, that is, the field of philosophy studying the nature of metaphysics: its subject matter, branches, method, concepts, epistemology, and semantics.’ (See screenshots below.)

Here, the hyphen reflects a line break in the document. The model seems to get confused by this and pronounces it as ‘metametafee…sics’.
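To make the problem concrete, here is a minimal sketch of how a word hyphenated at a line end could be rejoined before the text is passed to a voice. It assumes the extracted text still contains the newline after the hyphen, which may not hold for every PDF text layer. The helper name `rejoin_hyphenated` is hypothetical, and this is not a description of Zotero's parser.

```python
import re

# Hypothetical pattern: a word character, a hyphen at the end of a line,
# optional indentation on the next line, then a lowercase continuation.
_LINE_END_HYPHEN = re.compile(r"(\w)-\n\s*([a-z])")


def rejoin_hyphenated(text: str) -> str:
    """Join "metametaphy-\\nsics" into "metametaphysics"."""
    return _LINE_END_HYPHEN.sub(r"\1\2", text)


print(rejoin_hyphenated("an exercise in metametaphy-\nsics, that is"))
```

The obvious limitation is that genuine compounds broken at a line end (for example "well-" followed by "known") would lose their hyphen, so a real implementation would probably need a dictionary check or similar.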

> You mean because the Read Aloud option isn't shown in the context menu when you right-click on an annotation?

Yes. There is a workaround: clicking either on the side of the document or somewhere in the white space between annotations. It is not a major issue, but it might still be worth adding a ‘Read Aloud’ option to the context menu for a highlight, as that would eliminate the problem entirely.

> Do you mean footnotes themselves, or footnote numbers within the text? (But both are covered by 3 above.)

I was thinking primarily about footnotes. Most of the time, I do not want them to be read aloud, but sometimes I do want to hear them when I am reading a text carefully. If they are skipped automatically, that means I would have to pause the model to read the footnote myself and then restart it afterwards. That said, when footnotes are not read aloud, it is actually useful to have the footnote numbers spoken, since that gives me a cue to pause and read the footnote myself.

> Again, we'd want specific examples, since we may be able to improve them, but sentence detection is always going to be a little dicey, and this is what the expand option in the popup is for.

An example:

https://s3.amazonaws.com/zotero.org/images/forums/u5387224/7obwh0kun7rb1408qvds.png

https://s3.amazonaws.com/zotero.org/images/forums/u5387224/fxtp3hqg6w5pc4yf74wu.png

In the first screenshot, a fairly large chunk of text, spanning multiple sentences, is selected as the current reading segment. (I must admit this can be inconvenient if you want to start from a particular sentence.) That being said, sentence-level highlighting still works reasonably well until the final sentence, after which the next highlighted segment becomes very small. Moreover, because the final sentence is split into three reading segments, highlighting that one sentence would require using the highlight function three times.

I also noticed another issue just now:

9. When you change the playback speed, the output starts reading again from the beginning of the current reading segment.

It is not much of an issue when the current reading segment is small, but when it spans an entire paragraph, as in the screenshot, it becomes quite noticeable.

nyetus · 2026-03-13T15:37:42+00:00

> Or do you mean restarting after closing the Read Aloud popup (or closing the document entirely)?

Ah, I realise that was ambiguous. Sorry! Yes, I meant that when you close and reopen the pop-up, playback starts again from the top of the page. It would personally feel more natural to me if it picked up from where it left off.

dstillman · 2026-03-13T15:41:38+00:00

What if you closed the tab and reopened the document, or restarted Zotero?

dstillman · 2026-03-13T15:49:49+00:00

> In the first screenshot, a fairly large chunk of text, spanning multiple sentences, is selected as the current reading segment. (I must admit this can be inconvenient if you want to start from a particular sentence.)

Note that it highlights the entire current paragraph being read, even when it starts from a given sentence. In your example, if I right-click on "This" in "This is primarily", it starts reading from that sentence, even though it's detecting the end of the previous paragraph as part of the current paragraph (likely due to a messy text layer in this document). That's why you're seeing sentence highlighting still work properly (except for that last sentence that's getting split up incorrectly).

In any case, we have a new text parser coming that should improve a bunch of things in PDFs like this.

nyetus · 2026-03-13T15:53:57+00:00

> What if you closed the tab and reopened the document, or restarted Zotero?

Even in that case, I would personally prefer it to continue from where I left off.

If I had to explain it, to me it is like returning to an audiobook after a short interruption. Closing the pop-up, or even the tab, perhaps because I want to skim the bibliography or look at another document for a while, does not mean that I want playback to restart elsewhere. I would prefer it simply to resume from where I left off, unless I decide to move to a different point myself.

> In any case, we have a new text parser coming that should improve a bunch of things in PDFs like this.

Got it. Thanks for the explanation, and I’m looking forward to the new parser. :-)