Arabic script and searchable PDFs in Beta 6.0.5

Tremolophage · April 3, 2022

For those running 6.0.5-beta.1+bb8858569: search does not work correctly in PDFs that contain Arabic script. It only matches text left-to-right.

For example, to find the word زار, you have to type راز.

Tremolophage · April 4, 2022

Highlighted text is also backwards.

Highlighted: ظفاح لزغ نزو رب
What it should be: بر وزن غزل حافظ
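The search example and this highlighting example are consistent with each other. One possible reading is that the PDF's text layer stores the Arabic characters in visual (left-to-right) order instead of logical order. In that case, reversing the stored string recovers the intended text. The Python sketch below checks that idea against the two strings above. It is a hypothetical diagnostic, not Zotero's or pdf.js's actual code, and `visual_to_logical` is an illustrative name. A plain reversal would also scramble embedded Latin words or digits and would misplace combining diacritics.

```python
def visual_to_logical(text: str) -> str:
    """Naively reverse a string stored in visual (left-to-right) order.

    Hypothetical helper: assumes the whole string is right-to-left text
    with no embedded left-to-right runs (Latin words, digits) and no
    combining marks, which a plain code-point reversal would misplace.
    """
    return text[::-1]


# Strings taken from the reports above.
highlighted = "ظفاح لزغ نزو رب"  # what the reader highlighted
intended = "بر وزن غزل حافظ"     # what the text actually says

print(visual_to_logical(highlighted) == intended)  # True

# The same reversal accounts for the search workaround: the query زار
# only matches a visually ordered text layer when it is typed as راز.
print(visual_to_logical("زار"))  # راز
```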

martynas_b · April 5, 2022

I can reproduce the highlighting issue, but not the search issue. Are you talking about search inside the Zotero PDF reader or search across the whole Zotero library?

Tremolophage · April 5, 2022

The search function works for the library as a whole.

The PDF reader's search function cannot handle right-to-left scripts, at least not Arabic. I have not tried Hebrew yet.

martynas_b · April 5, 2022

Could you open your PDF file in this online PDF viewer (you can drag and drop your file) https://mozilla.github.io/pdf.js/web/viewer.html and test if the search works correctly?

Tremolophage · August 13, 2022

@martynas_b Any chance the highlighting issue for right-to-left scripts will be fixed?

Tremolophage · September 24, 2022

@martynas_b The highlighting function works wonderfully now for books with clear typesetting!

To follow up on your April 5 question about search: the issue appears in certain PDFs from Google Books. Some scans are not compatible with Zotero's PDF viewer.

I tried the PDF viewer you recommended with this title:

https://www.google.com/books/edition/Muntakhab_al_qaṣāyid/c5NN6Ro1trYC?hl=en&gbpv=0

The search function fails and the scan does not display correctly. This may be specific to older printings and lithographed texts.

hu-itanu · September 25, 2022

I can confirm for Hebrew that highlighting works more correctly in the online viewer https://mozilla.github.io/pdf.js/web/viewer.html. Some issues depend on the type and quality of the OCR.

Although highlighting runs in the proper direction, copying and pasting behave oddly. Search works fine in some documents.

Copying and pasting from JSTOR PDFs is a mess, though their OCR is generally poor. Search does not work there, but the OCR is obviously bad.

The same document with OCR by Acrobat and ABBYY works much better. However, spaces are stripped from the text, so copying and pasting produces a long string of letters. The search function works.

A native Word document converted to PDF in Acrobat is generally good, though the words within a sentence are parsed strangely.

Tremolophage · March 10, 2023

@hu-itanu and @martynas, just following up on this as I discover other limitations for those of us working in non-Latin scripts.

Not having OCR is one thing, but some PDFs simply do not display at all. I just started a new thread with some examples.