Arabic script and searchable PDFs in Beta 6.0.5

For those running 6.0.5-beta.1+bb8858569, the search function in PDFs with Arabic script does not perform correctly. It searches left-to-right only.

For example, if you want to search for the word زار you need to type راز.
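
The reversal described above is what one would expect if the PDF's text layer stores its characters in visual (left-to-right) order rather than in logical (reading) order. That cause is an assumption, not something confirmed for this build. A minimal Python sketch of the workaround, using a hypothetical helper named `to_visual_order`:

```python
def to_visual_order(query: str) -> str:
    """Reverse a right-to-left query code point by code point.

    Hypothetical helper. It assumes the text layer holds characters in
    visual order. Naive reversal misplaces combining marks, so it only
    suits unvocalized text.
    """
    return query[::-1]


# The word as it is normally typed.
query = "زار"
# The string that has to be typed into the reader's search box instead.
workaround = to_visual_order(query)
print(workaround)
```

Reversing by code point is the simplest possible model of the symptom. Text with diacritics or mixed Latin and Arabic runs would need proper bidirectional reordering instead.
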
  • Highlighting text is also backwards.

    Highlighted: ظفاح لزغ نزو رب
    What it should be: بر وزن غزل حافظ
  • I can reproduce the highlighting issue. But not the search issue. Are you talking about search inside Zotero PDF reader or search in the whole Zotero library?
  • edited April 5, 2022
    The search function works for the library as a whole.

    The PDF reader search function cannot handle right-to-left scripts--at least with Arabic. I have not tried Hebrew yet.
  • Could you open your PDF file in this online PDF viewer (you can drag and drop your file) https://mozilla.github.io/pdf.js/web/viewer.html and test if the search works correctly?
  • @martynas_b Any chance the highlighting issue for right-to-left scripts will be fixed?
  • @martynas_b The highlighting function works wonderfully now for books with clear typesetting!

    To follow up on your April 5 question about the search function: the issue appears in certain PDFs from Google Books. Certain scans are not compatible with Zotero's PDF viewer.

    I tried the pdf viewer you recommended with this title:

    https://www.google.com/books/edition/Muntakhab_al_qaṣāyid/c5NN6Ro1trYC?hl=en&gbpv=0

    The search function fails and the scan does not display correctly. This may be particular to older prints and lithograph texts.
  • I can confirm for Hebrew that the highlighting in the online viewer https://mozilla.github.io/pdf.js/web/viewer.html is functioning more correctly. Some issues depend on the type and quality of OCR.

    Though highlighting works in the proper direction, copying and pasting shows odd behavior. Search works fine in some documents.

    Copying and pasting from JSTOR PDFs is a mess, though their OCR is generally poor. Search does not work either, but the OCR is obviously bad.

    The same document with OCR by Acrobat and ABBYY works much better. However, the spaces are stripped from the text, so it copies and pastes as one long string of letters. The search function works.

    A native Word document converted to PDF in Acrobat is generally good, though the words in a sentence are parsed strangely.

  • @hu-itanu and @martynas, just following up on this as I discover other limitations for those of us working in non-Latin scripts.

    Not having OCR is one thing, but some PDFs simply do not display at all. Just started a new thread with some examples.