Problem extracting highlights with hyphenated words at end of sentence

The "add notes from annotations" function is not extracting all of my highlights. Specifically, it omits the rest of a highlighted paragraph whenever a word is broken over two lines by a hyphen. In other words, when a paragraph includes a hyphen at the end of a line, only the text preceding that hyphen is extracted. Is there any fix for this problem? Is the hyphen at the end of a line being interpreted by the program as a place to stop extracting the highlight?

If there is no way to fix this, is there a way to revert to the old version of Zotero with the ZotFile extract annotations function? I never encountered this problem with the ZotFile version of extracting notes. Thanks in advance for any help!
  • For example, consider the following sentence:

    "There is nothing in the text that suggests continued and intense effort is requir-
    ed for this to work properly."

    When I try to extract this highlighted sentence, only the part preceding the hyphen at the end of a line is being extracted. That is, the extraction is:

    There is nothing in the text that suggests continued and intense effort is requir
  • @jcpeckham We can't reproduce this. Could you send an example PDF where that issue appears? You can send it to support@zotero.org with a link to this thread.
  • Thank you. I have sent an email as you describe.
  • @jcpeckham The PDF file seems to be of really poor quality, but I still can't reproduce your issue. Can you tell me which specific part of the text, in the PDF that you sent to us, is extracted worse than with ZotFile?
  • I encounter the same issue with other files. This is just one example. In this and other pdf files, when a paragraph includes a hyphen at the end of the line (e.g., when part of a word is continued on the next line), only the parts of the paragraph that precede the hyphen at the end of the line are being extracted while the rest of the highlighted paragraph is omitted.

    For example, in the first highlighted paragraph of the pdf I sent, the entire first paragraph is highlighted, but only the first three lines of the first paragraph are extracted (with extraction stopping at the "im-" in "implies"). The same issue occurs in all other places where a line ends with a hyphen breaking up a word. I get only the beginning part of the paragraph I highlighted, while the rest of the paragraph is not extracted.

    It seems as if the hyphen at the end of lines breaking up words is being interpreted as the end of what is highlighted. I never encountered this issue when I extracted notes with Zotfile.
  • I get "a t m id nigh t that im plies th a t prayer", which is of course poorly extracted, but still it doesn't stop at "implies".

    Which OS and Zotero version do you have?
  • Must be a problem with my setup then.

    I am using:

    Mac OS Big Sur 11.6.5

    Zotero 6.0.9
  • I have uninstalled and re-installed Zotero, but the problem remains. The issue arose when Zotero auto-updated to the version without the ability to extract with Zotfile.

    It seems there is no simple way to find out why this is happening in my setup. As a workaround, is there a way to install the previous version of Zotero with ZotFile so I can use the extract function that way?

    Again, thanks for your help.
  • I can reproduce. We're looking into it.
  • Thank you.
  • This specific text truncation issue will be fixed in the next update.

    (It is caused by poor OCR software mapping some glyphs to NULL and therefore breaking strings).
  • That is good news. Thank you!
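
To illustrate the cause described above (glyphs mapped to NULL breaking strings), here is a minimal Python sketch. It is not Zotero's code. It assumes, hypothetically, that the OCR layer mapped the line-ending hyphen to a NUL character and that some step in the pipeline treats NUL as the end of the string. The function names `truncate_at_nul` and `strip_nul` are made up for this example.

```python
def truncate_at_nul(text: str) -> str:
    """Mimic a consumer that treats NUL as a string terminator (C-style)."""
    return text.split("\x00", 1)[0]


def strip_nul(text: str) -> str:
    """Drop NUL characters so the rest of the text survives."""
    return text.replace("\x00", "")


# Hypothetical extracted text: the hyphen glyph came through as NUL.
extracted = (
    "There is nothing in the text that suggests continued and intense "
    "effort is requir\x00ed for this to work properly."
)

print(truncate_at_nul(extracted))
# There is nothing in the text that suggests continued and intense effort is requir

print(strip_nul(extracted))
# There is nothing in the text that suggests continued and intense effort is required for this to work properly.
```

Under these assumptions, the truncated output matches the symptom reported in this thread: everything after the broken word is lost, even though the full text is present in the extracted string.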