Zotero OCR/Tesseract tutorial

Hey, I wrote up a Zotero OCR/Tesseract tutorial for my students. Maybe people here will find it useful.
https://publish.obsidian.md/history-notes/04+OCR+in+Zotero
  • Thank you so much, this is wonderful!
  • Thank you so much! I've been able to get it almost right. The plugin apparently works, but instead of producing a new, readable .pdf file, it produces a separate .txt file that is saved on my computer. Has anyone else encountered the same problem?
  • Never mind. I've gotten the link to the PDF, but it doesn't save in the same collection. Weird.
  • Hey there!

    I followed the steps of that tutorial, but when I try to OCR the PDF, nothing happens. I can sometimes see in the Task Manager that pdftoppm is actually using CPU and doing something, but other than that, nothing happens. Any idea why that is and how to fix it?

    Thx
  • Thank you, I followed it and it is brilliant, but please add a note that, until this is fixed, every PDF will grow to about 10 times its original size. For me, this makes the whole process useless. It may not be for others, but they should know before they install it.
  • @eseila It works for me so far. The process can take a long time. You can click on "Show file" to see what's happening.
  • I've also followed the tutorial but nothing happens when I OCR the selected PDF.
  • I can see in the containing folder that all the intermediate PNG files have been created.
  • @AndrewRRM this happened to me once, but I was never able to reproduce it again. How big is your document?
  • Ah well, 200 pages. Too big? I'll try it on something smaller.
  • No, that shouldn't be too big; I've OCR'ed 300-page documents without a problem. Something else must be stalling it. Can you try another document?