Looking for OCR Workflow (Having trouble with gImageReader) (slightly off-topic, but related)

PaulS42 · June 5, 2021

So far I've been using Zotero for source management: I mark important passages with Zotero's new highlighting function and then export those excerpts to Obsidian. If the PDFs were not yet searchable, I OCR'ed them with gImageReader (Tesseract) and exported them with an invisible text layer.

I was doing this on Windows 10. I recently set up a dual boot with Linux Mint and reinstalled Windows because it had slowed down. Now that I've reinstalled gImageReader, I can no longer find the option to export to PDF with a text layer. In fact, I'm having trouble even finding much mention of that function online. I don't know if I originally installed some fork or beta or what, but the (probably) official version of the program doesn't look like what I'd been using so far.

Before, I was always able to set certain post-production parameters, such as the size of the invisible text.

Since that no longer appears to work, I'm looking for new ways to make searchable PDFs. These are often large files (several hundred pages) and scans of old books (like 19th-century old).

I've also tried Zotero OCR, but I can't get it to work.

Any ideas and advice?
Thanks!