PDF Reader - highlighted text contains no spaces with some PDFs

I have just noticed that in the beta PDF Reader *for some PDFs* highlighted text is shown in the annotations panel without spaces - i.e. all text is run together.

When I open those same PDFs in other PDF readers, highlighted text is shown correctly, with spaces.

I cannot see any differences between PDFs that the PDF Reader handles correctly and those that it doesn't - has anyone else noticed this, or can help with diagnosis?

Zotero 5.0.97-beta.33+fdcd4e51c on Windows 10
  • The highlighted text that is visible in the annotations sidebar is extracted text (the same text you get when copying a selection). Some PDFs, especially scanned and OCRed ones, have a very poor-quality text layer with various errors. If you want to know whether other PDF readers extract it correctly, you have to copy the text and paste it somewhere.

    If those PDFs are scanned, newer and more advanced OCR software could be used to replace their text layers.
  • Thanks for responding @martynas_b. What I don't understand is that, with the exact same PDF file, Zotero PDF Reader gives me:
    "plagueofbedbugsandfamilyillness.Attheotherendofthecountry,acountessispaintingbotanical"
    while PDF-XChange Editor gives me:
    "plague of bed bugs and family illness. At the other end of the country, a countess is painting botanical"

    As far as I can see, the two programs must be interpreting what they pull out of the text layer differently.
  • @richard.masters If you want, you can send us the PDF to support@zotero.org with a link to this thread, and we will try to improve our PDF reader in future versions.
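One possible explanation, not confirmed for these particular files: some PDFs store no space characters at all, and word breaks exist only as horizontal gaps between positioned glyphs. Each reader then applies its own heuristic to decide whether a gap is wide enough to count as a space, so the same text layer can come out as "bed bugs" in one program and "bedbugs" in another. The Python sketch below illustrates that idea only. The `Glyph` record, the `layout` helper and the `gap_ratio` threshold are hypothetical, and this is not the actual algorithm used by Zotero, PDF-XChange Editor or any other reader.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """Hypothetical glyph record: one character and its horizontal placement."""
    char: str
    x: float          # left edge of the glyph, in page units
    width: float      # advance width of the glyph
    font_size: float


def join_glyphs(glyphs, gap_ratio=0.25):
    """Rebuild a line of text, inserting a space wherever the gap between
    consecutive glyphs exceeds gap_ratio * font_size.

    gap_ratio=0.25 is an illustrative assumption, not a real reader's value.
    """
    if not glyphs:
        return ""
    out = [glyphs[0].char]
    for prev, cur in zip(glyphs, glyphs[1:]):
        gap = cur.x - (prev.x + prev.width)
        # Only infer a space if neither neighbour is already an explicit space.
        if gap > gap_ratio * cur.font_size and prev.char != " " and cur.char != " ":
            out.append(" ")
        out.append(cur.char)
    return "".join(out)


def layout(words, char_width=5.0, word_gap=3.0, font_size=10.0):
    """Hypothetical helper: position glyphs with a gap between words
    but no space characters, as some PDFs do."""
    glyphs, x = [], 0.0
    for word in words:
        for ch in word:
            glyphs.append(Glyph(ch, x, char_width, font_size))
            x += char_width
        x += word_gap
    return glyphs


glyphs = layout(["bed", "bugs"])
print("".join(g.char for g in glyphs))        # bedbugs  (raw characters only)
print(join_glyphs(glyphs))                    # bed bugs (gap 3.0 > threshold 2.5)
print(join_glyphs(glyphs, gap_ratio=0.5))     # bedbugs  (gap 3.0 < threshold 5.0)
```

With the same glyph positions, the two thresholds produce the two kinds of output quoted above, which is why sending a sample PDF is more useful for diagnosis than comparing files by eye.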