Continuity of the annotation between 2 pages

LuisBAZ · August 18, 2023

Hello!
It would be really useful if when we annotate a passage that is split between two pages, Zotero could recognize the font formatting to avoid highlighting footnotes, page numbers, headers, and footers.
And maybe add the page range to the citation.
:)
Thanks

martynas_b · August 21, 2023

Excluding header/footer from selection is planned. But taking into account font also sounds interesting.

LuisBAZ · August 21, 2023

Thanks, it is nice to know that's planned. Thank you all Zotero devs !!!

cszarvaskidd · August 5, 2024

As of 8/2024 the "planning" never got accomplished. Every piece of scholarship I read and highlight across two pages includes the footer, page numbers, the lead author's contact info, etc. whenever the highlight overlaps the end of one page and the start of another. Are there any updates available as to when the plan will be implemented? Much appreciated.

martynas_b · August 5, 2024

@cszarvaskidd That is already partially implemented in the Zotero 7 beta.

miguelclaramunt · November 12, 2024

@martynas_b do you know if this feature is available in Z7? I cannot find it in the settings or in the release notes.

martynas_b · November 13, 2024

@miguelclaramunt For some PDF files, Zotero 7 already excludes header and footer from text selection when selecting body text. This will gradually improve in future updates.

miguelclaramunt · November 14, 2024

Nice, thank you so much!

jb_r · March 21, 2025

@universidan suggested improving Zotero's behaviour when annotating passages that span consecutive pages by setting page margins, which could be detected automatically and also set manually in case the detection is not quite right: https://forums.zotero.org/discussion/120099/how-to-exclude-footnote-header-when-highlighting-over-two-pages/p1

I would conceptualize the intended behaviour as part of the general interaction of text selection. The target would be a way of splitting the document into different regions and treating them as separate when performing text selections, i.e. if I start selecting text in the "body text" of page 1 and move my cursor to the next page, the selection will extend only to parts of the body text of page 2.
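To make the idea concrete, here is a rough sketch in Python. It is purely illustrative and not how Zotero works internally; `Word`, the `region` labels, and `select_across_pages` are all names I made up, and I simply assume every word has already been assigned to a region.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Word:
    page: int      # 1-based page number
    index: int     # reading-order position on that page
    region: str    # hypothetical label: "header", "body", "footer", ...
    text: str

def select_across_pages(words, start, end):
    """Return the words between start and end (inclusive, in reading order)
    that share the region of the word where the selection started."""
    lo, hi = sorted([(start.page, start.index), (end.page, end.index)])
    in_range = [w for w in words if lo <= (w.page, w.index) <= hi]
    in_range.sort(key=lambda w: (w.page, w.index))
    # Keep only words from the region in which the drag began.
    return [w for w in in_range if w.region == start.region]

words = [
    Word(1, 0, "body", "The"), Word(1, 1, "body", "argument"),
    Word(1, 2, "footer", "12"),
    Word(2, 0, "header", "Journal"), Word(2, 1, "body", "continues"),
]
selected = select_across_pages(words, words[0], words[4])
print(" ".join(w.text for w in selected))
```

In this toy example the page number "12" and the running header "Journal" sit between the two body passages but are dropped, because the drag started in a body region.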

This concept could be extended with dynamically detected subregions, the prime example being footnotes: footnotes appear where body text usually appears, but they can be identified dynamically by looking for a subregion at the bottom of the body region, which may use a smaller font size or be separated by a horizontal line. Similarly, if I start selecting text by clicking and dragging inside a "footnote section" and move to another page, the selection will extend only to text belonging to other footnote sections.
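A minimal sketch of the font-size heuristic, again in Python and again only an illustration: `Line`, `split_footnotes`, and the `ratio` threshold are hypothetical, and I assume the lines passed in all belong to the body region of a single page.

```python
from dataclasses import dataclass
from statistics import median

@dataclass
class Line:
    top: float        # distance from the top of the page, in points
    font_size: float
    text: str

def split_footnotes(lines, ratio=0.9):
    """Split the lines of one body region into (body, footnotes).

    Assumption: footnotes are the trailing run of lines at the bottom of
    the region whose font size is below ratio * the median font size.
    """
    ordered = sorted(lines, key=lambda l: l.top)
    if not ordered:
        return [], []
    threshold = ratio * median(l.font_size for l in ordered)
    cut = len(ordered)
    # Walk upward from the bottom while the lines are still "small".
    while cut > 0 and ordered[cut - 1].font_size < threshold:
        cut -= 1
    return ordered[:cut], ordered[cut:]

page = [
    Line(100, 10, "Body line one"),
    Line(114, 10, "Body line two"),
    Line(128, 10, "Body line three"),
    Line(160, 8, "1 First footnote"),
    Line(170, 8, "2 Second footnote"),
]
body, footnotes = split_footnotes(page)
print(len(body), len(footnotes))
```

This breaks down on pages where footnotes take up more than half of the lines, since the median then reflects the footnote size. That is one of the cases where a manual correction would be needed.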

Again, while it seems impossible to implement automatic region detection that works perfectly in every case, letting the user quickly draw a box on the page to correct any mistakes seems like a pragmatic improvement.

I also want to add that this promises not only to improve Zotero's annotation experience, but also to make general copy-pasting more intuitive and less frustrating, and it may even substantially improve the behavior of widely popular plugins that depend on text selection, like text-to-speech or translation.

iagogv · March 21, 2025

I don't know if it already does, but it should exclude not only headers/footers, but tables, figures, margin comments... too

jb_r · March 21, 2025

@iagogv I think whether it is excluded or not just depends on whether text that is part of tables and figures is actually text or only pixels of an image. If it is text, it is included in selections.

While I understand that it is annoying if a selection cannot "tunnel through" tables or figures, I do not see an easy way to exclude them from the selection while dragging the mouse. At least it cannot be done with the concept of "page regions" that I suggested above, because it would require Zotero to distinguish text elements not only by their absolute position in the PDF, but also by the type of information they encode (because tables and figures appear where body text normally appears). However, since PDFs may be constructed in various ways, I would not expect to be able to infer this from anything formally encoded in the PDF. One would probably have to visually analyze the page as an image to detect borders.

I suggested dynamic regions for footnotes, but here I am only optimistic that this is possible because it essentially splits a page region in half (while each half still remains anchored to an absolute position on the page, which is the same for all pages of the document). One only needs to detect the vertical position where the body ends and the footnotes begin, which seems possible in principle because footnotes are usually very clearly set apart from body text by font size. A rudimentary visual border detection may be possible by looking for structural separators, such as a horizontal line, or an empty region between the last line of body text and the first line of footnote text whose height differs from the spacing between lines of body text. Even though there is variation in footnote formatting, there is only a relatively small number of ways to add footnotes to a document, and these are quite universal. This is not the case for embedding tables and images in or between paragraphs of body text.
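The gap-based variant could look roughly like the following Python sketch. The function name `find_separator`, the `(top, bottom)` line boxes, and the `factor` of 1.5 are my own assumptions, not anything taken from Zotero.

```python
from statistics import median

def find_separator(line_boxes, factor=1.5):
    """Guess the y position where body text ends and footnotes begin.

    line_boxes: (top, bottom) of each text line in points, measured from
    the top of the page. Returns the midpoint of the lowest vertical gap
    that is larger than factor * the median gap, or None if no gap
    stands out.
    """
    boxes = sorted(line_boxes)
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(boxes, boxes[1:])]
    if not gaps:
        return None
    typical = median(gaps)
    # Scan from the bottom of the page upward, since footnotes sit at the bottom.
    for i in range(len(gaps) - 1, -1, -1):
        if gaps[i] > factor * typical:
            return (boxes[i][1] + boxes[i + 1][0]) / 2
    return None

boxes = [(100, 110), (112, 122), (124, 134), (150, 158), (159, 167)]
print(find_separator(boxes))
```

Note that an ordinary paragraph break can also produce an unusually large gap, so in practice this would probably have to be combined with the font-size check rather than used on its own.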

Excluding margin comments, however, is easily compatible with the idea of page regions, since they can be defined as text elements to the left or right of the body text region (while the header region would be those text elements above it, and the footer region those below).
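As a sketch of how simple that classification could be, assuming the body box is already known (detected or drawn by the user), here is a hypothetical `classify` function in Python. The `Box` type and the label names are mine, for illustration only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Box:
    left: float
    top: float
    right: float
    bottom: float  # y grows downward, so top < bottom

def classify(element, body):
    """Label a text element by where it sits relative to the body box."""
    if element.bottom <= body.top:
        return "header"
    if element.top >= body.bottom:
        return "footer"
    if element.right <= body.left or element.left >= body.right:
        return "margin"
    return "body"

body = Box(left=72, top=90, right=523, bottom=750)
print(classify(Box(72, 40, 300, 55), body))      # running title above the body
print(classify(Box(530, 200, 590, 260), body))   # comment in the right margin
print(classify(Box(290, 770, 305, 780), body))   # page number below the body
print(classify(Box(72, 100, 523, 112), body))    # ordinary line of body text
```

Because the test uses only absolute positions, the same body box can be reused on every page of the document.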

martynas_b · March 24, 2025

@jb_r

I would conceptualize the intended behaviour as part of the general interaction of text selection. The target would be a way of splitting the document into different regions and treating them as separate when performing text selections, i.e. if I start selecting text in the "body text" of page 1 and move my cursor to the next page, the selection will extend only to parts of the body text of page 2.

The Zotero desktop client has been doing that since the 7.0 release. Haven't you noticed? Of course, it doesn’t work with all PDF files yet, but it will improve over time.

anddna · May 23, 2025

@martynas_b

I haven't noticed it at all. For me, Zotero 7 never ignores page margins and selects everything. Is there a way to adjust settings for this behaviour?

A simple horizontal line that lets the user manually reposition the ignore boundary would be great. It could be set per document by default, and per page when a modifier key such as SHIFT is held.
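Something like the following Python sketch is what I have in mind. It is entirely hypothetical: `IgnoreLines` and its methods are invented for illustration and do not correspond to any existing Zotero setting.

```python
from dataclasses import dataclass, field

@dataclass
class IgnoreLines:
    """Hypothetical setting: text at or below the line is ignored.

    default_y applies to the whole document; per_page holds overrides
    for individual pages (e.g. set while holding SHIFT).
    """
    default_y: float
    per_page: dict = field(default_factory=dict)

    def set_line(self, y, page=None):
        # No page given: move the document-wide line.
        if page is None:
            self.default_y = y
        else:
            self.per_page[page] = y

    def cutoff(self, page):
        return self.per_page.get(page, self.default_y)

    def is_ignored(self, page, y):
        return y >= self.cutoff(page)

lines = IgnoreLines(default_y=750)
lines.set_line(700, page=3)          # page 3 has a taller footnote block
print(lines.is_ignored(2, 720))      # above the document-wide line
print(lines.is_ignored(3, 720))      # below the page 3 override
```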

By 'improve over time', do you mean settings such as these, or something else? How can I suggest these improvements officially? On GitHub?