Zotero OCR: possibility of using tesseract.js and WebAssembly

I am using the Zotero OCR plugin (https://github.com/UB-Mannheim/zotero-ocr), but it requires installing tesseract and poppler first and then entering the paths to their executables in the settings, which is a bit complicated for users who are not comfortable with development or programming. In fact, from what I found, there have been many discussions on the Zotero forum asking how to install the plugin, and I believe this is due to the complexity of the installation.

I am not a JS developer, but I came across a pure JS tesseract repo (https://github.com/naptha/tesseract.js) and the idea of WebAssembly. Is it possible or feasible to compile the OCR engine (tesseract) to WebAssembly and bundle it directly into the plugin, so that it no longer needs to point to a local installation?
  • (zotero-ocr is maintained at UB Mannheim, and its maintainers only rarely come by here these days; you'd be better off posting this as a GitHub issue there)
  • I did it this way on a Mac: I installed both tesseract and poppler using brew.
    Then, to find the folders, I relied on the fact that all brew installs are subfolders of
    /opt/homebrew/Cellar

    hence the complete path
    /opt/homebrew/Cellar/tesseract/4.1.1/bin/tesseract

    The same goes for the poppler package, of which pdftoppm is a component,
    hence the complete path
    /opt/homebrew/Cellar/poppler/21.11.0/bin/pdftoppm

    best
    Maurizio
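
For anyone who would rather not dig through the Cellar by hand, the lookup described above can probably be automated. Below is a minimal Python sketch, not anything taken from the zotero-ocr plugin. The helper name `find_executable` is hypothetical, and it assumes Homebrew has linked its binaries into `/opt/homebrew/bin` (Apple Silicon) or `/usr/local/bin` (Intel Macs). It searches the current PATH first and then falls back to those two directories.

```python
import os
import shutil
from pathlib import Path

# Assumption: Homebrew symlinks its binaries into one of these directories
# (/opt/homebrew/bin on Apple Silicon, /usr/local/bin on Intel Macs).
HOMEBREW_BIN_DIRS = ("/opt/homebrew/bin", "/usr/local/bin")


def find_executable(name, extra_dirs=HOMEBREW_BIN_DIRS):
    """Hypothetical helper: return an absolute path to `name`, or None."""
    # First try the directories on the current PATH.
    found = shutil.which(name)
    if found:
        return os.path.abspath(found)
    # Fall back to the assumed Homebrew locations in case PATH omits them.
    for directory in extra_dirs:
        candidate = Path(directory) / name
        if candidate.is_file() and os.access(candidate, os.X_OK):
            return str(candidate)
    return None


if __name__ == "__main__":
    # The two executables the plugin settings ask for.
    for tool in ("tesseract", "pdftoppm"):
        print(f"{tool}: {find_executable(tool) or 'not found'}")
```

If it finds the tools, the printed paths should be usable in the plugin settings. One possible advantage over the versioned Cellar paths is that the linked `bin` paths do not contain a version number, so they should keep working after a `brew upgrade`.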