OCR Parameters on Linux

mateMat · June 30, 2021

Hi, I wanted to check with you what the correct parameters are for the OCR plugin on Linux.
I went to the GitHub page and found that the link to the configuration for Mac and Linux does not work.
So here are my config parameters:

First, I installed tesseract-ocr provided by my repositories.

For the OCR engine: /usr/bin/tesseract
For pdftoppm: /usr/bin/pdftoppm
For the language script: script/Latin

If you have a working param list, please let me know.

mateMat · June 30, 2021

Here is the URL of the plugin https://github.com/UB-Mannheim/zotero-ocr

adamsmith · June 30, 2021

I'd recommend posting this as an issue to that project's github -- I don't think the maintainers are super active here.

zuphilip · May 22, 2022

For reference, I want to add some information here, although the issue has hopefully long been solved:

The easiest option is to leave the configuration blank. The Zotero-OCR plugin will then look in some default locations and try several possibilities for the poppler tools and tesseract. This includes calling the tools by name, e.g. tesseract, which should work as long as the tool's directory has been added to the PATH variable.

Only if the default (empty) path configuration does not work should you specify the path on your local machine. This should be the complete path including the name of the tool, e.g. tesseract or tesseract.exe. The debug log shows exactly which calls are tried in the end.
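To check outside of Zotero whether a blank setting or an explicit path is likely to work, a small script along these lines can help. This is only a sketch of the idea described above, not the plugin's actual lookup code; the function name resolve_tool is made up for illustration.

```python
import os
import shutil
from typing import Optional


def resolve_tool(configured_path: str, tool_name: str) -> Optional[str]:
    """Return a usable path for tool_name, or None if nothing is found.

    Hypothetical helper: it mimics the "blank means look it up by name"
    idea, it is not taken from the Zotero-OCR source.
    """
    if configured_path:
        # An explicit setting must be the full path to the executable
        # itself (e.g. /usr/bin/tesseract), not just its directory.
        if os.path.isfile(configured_path) and os.access(configured_path, os.X_OK):
            return configured_path
        return None
    # Blank setting: fall back to a lookup by name on PATH.
    return shutil.which(tool_name)


if __name__ == "__main__":
    for setting, name in [("", "tesseract"), ("", "pdftoppm")]:
        print(name, "->", resolve_tool(setting, name))
```

If this prints None for a tool, the tool is either not installed or not on the PATH, and a full path would have to be entered in the plugin settings.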

The default language/script parameter is English, but it can be changed. It is then crucial, however, that you have installed the corresponding language/script model in tesseract. For example, you can change it to script/Latin if you have installed that model. The English language model (eng) is always installed.
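To see which models your tesseract installation actually has, you can run tesseract --list-langs. The sketch below wraps that call; the function names are made up for illustration, and the assumption is that the output consists of a header line starting with "List of available" followed by one model name per line (script models may show up as e.g. script/Latin, depending on your version and packaging).

```python
import subprocess
from typing import List


def parse_list_langs(output: str) -> List[str]:
    """Extract model names from the text printed by `tesseract --list-langs`."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # Assumption: the header line starts with "List of available";
    # every other non-empty line is treated as a model name.
    return [line for line in lines if not line.lower().startswith("list of available")]


def installed_models(tesseract: str = "tesseract") -> List[str]:
    """Run tesseract and return the names of the installed models."""
    result = subprocess.run(
        [tesseract, "--list-langs"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Some versions print the list on stderr, so both streams are read.
    return parse_list_langs(result.stdout + result.stderr)


# Example use (requires tesseract to be installed):
#   "script/Latin" in installed_models("/usr/bin/tesseract")
```

If the model you entered in the plugin settings is not in the returned list, install it through your distribution's packages first.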