PDF indexing issue: Korean text not searchable

Hello,

I've recently run into an issue with PDF indexing. Until last week, I was able to search for Korean text in indexed PDFs. Since I rebuilt the index, however, Korean words are no longer searchable; only English words appear in search results.

Here's what I've tried so far:

- Checked that my PDFs contain selectable text
- Reinstalled and updated to the latest Zotero version
- Cleared and rebuilt the index

Any help would be appreciated.

Thanks
  • Could you send an example PDF file to support@zotero.org with a link to this thread?
  • The PDF file contains decomposed Hangul Jamo characters, which Zotero seems unable to properly index. At some point, Zotero changed its full-text extraction engine for PDFs, and the new engine does not normalize text into Hangul Syllables, unlike the older extraction system.

    We'll work on fixing this, but you will likely encounter the same issue if you paste Hangul Jamo into Zotero notes, as they also cannot be indexed.

    How common are those PDF files?
  • It seems that the same issue is likely to occur in most PDFs written in Korean. I am experiencing this problem with all the Korean PDFs I have. Is there any way to work around this issue? For example, is it possible to use an older version of the full-text extraction engine?
  • The issue should be fixed in the next Zotero beta. You will need to reindex your PDFs. Zotero notes are reindexed whenever they are modified and saved again.
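
For anyone who wants to check whether their own text is affected, here is a minimal Python sketch of the decomposed-versus-precomposed distinction described above. It is not Zotero's code or its actual fix. It assumes the extracted text uses conjoining Hangul Jamo (U+1100 to U+11FF), which Unicode NFC normalization composes into Hangul Syllables; text using Hangul Compatibility Jamo (U+3130 to U+318F) would not be composed this way. The helper names `to_syllables` and `has_conjoining_jamo` are made up for illustration.

```python
import unicodedata


def to_syllables(text: str) -> str:
    """Compose conjoining Jamo into precomposed Hangul Syllables using NFC."""
    return unicodedata.normalize("NFC", text)


def has_conjoining_jamo(text: str) -> bool:
    """Return True if the text contains any character in the Hangul Jamo block."""
    return any("\u1100" <= ch <= "\u11ff" for ch in text)


# "한글" written as six separate Jamo, the way an affected PDF might store it
decomposed = "\u1112\u1161\u11ab\u1100\u1173\u11af"
composed = to_syllables(decomposed)

# A search term typed on a keyboard normally arrives as precomposed syllables
query = "\ud55c\uae00"

print(has_conjoining_jamo(decomposed))  # the raw text is decomposed
print(query in decomposed)              # a plain substring search misses it
print(query in composed)                # after NFC the same search matches
```

Both strings render identically on screen, which is why the PDFs look fine and the text is selectable while a search on precomposed syllables still finds nothing.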