Need for OCR plug-in

It appears that nothing is happening with Zotero OCR in preparation for version 7. The ability to OCR from within Zotero is a great feature, especially now that the Zotero Reader is making it possible for people to let go of external PDF tools. I hope another developer will take up the challenge of creating a way to OCR PDFs in Zotero 7 if Zotero OCR is becoming defunct.
  • Thanks, @erazlogo. The instructions appear to require Zotero OCR, or is there a way without it?
  • Ah, that's right. So this will not work with Zotero 7. That is a pity.
  • OCR really should be sherlocked into Zotero
  • I would love it to be standard.
  • We don't need an OCR option in Zotero. @adamsmith @dstillman

    Zotero needs to handle non-OCRed PDF files with ease.

    Even if I use the most advanced tools to create an OCRed PDF (and believe me, I tried on both Windows and Linux), handwritten or similarly difficult documents (old, damaged real-world papers such as newspapers, PDFs set in old fonts, etc.) don't give the results I want.

    I literally created a trained Tesseract language file for this purpose.

    Zotero shouldn't let me do that. Zotero shouldn't allow me to waste my time with this.

    I shouldn't have to spend so much time creating a trained Tesseract language file to scan a handwritten PDF document instead of reading it in Zotero. So I gave up trying that.

    In the end, if Zotero or any plugin developer created a plugin for OCR, it would be limited and wouldn't give the best results without lots of options like deskewing, unpapering, cleaning, rotating, etc. It would mostly use Tesseract, and Tesseract doesn't give me good enough results, even when I used a custom language file.

    So my humble opinion and suggestion for the Zotero developers: let us draw straight lines that carry notes and tags. If I underline a word with the draw tool, I can write what it says in the note section and find it easily in the left panel. The line needs to be straight, so we need a straight-line drawing option, and it shouldn't create an image file.
  • +1
    @documut I agree that there are many cases where OCR doesn't work well enough to be worth the bother. My wild guess is that you are from a history department as well. But even there we have typewritten documents that would profit immensely from OCR, and numerous papers and scanned documents don't have OCR yet either.
    And historians aren't the only ones using Zotero. OCR is a useful feature for a lot of people out there.
    No one wants forced OCR, but just because it was a waste of time for you doesn't mean that the feature is useless.
  • For those who care: the Zotero-OCR plugin is now compatible with Zotero 7.
  • Thank you! Just installed it. I find OCR really useful for older books and all of those go directly into my Zotero library. And for newer publications, unprocessed scans people post on academia, etc.

    There's no need to reinvent the wheel with this plug-in (or type anything by hand!).

    The best open-source OCR program I've used is OCRmyPDF. It also uses the same engine (Tesseract) but has a lot of other features (including compression) and excellent documentation.

    But it only runs on the command line. A really amazing Zotero plugin could just be a front-end GUI for OCRmyPDF, and that would be a lot less work than writing a new OCR program.
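
To make the front-end idea above a bit more concrete, here is a rough Python sketch of what such a wrapper might do under the hood: turn a handful of GUI-style options into an `ocrmypdf` command line. The function names (`build_ocrmypdf_command`, `run_ocr`), the defaults, and the file names are made up for illustration and are not part of Zotero, Zotero-OCR, or OCRmyPDF. The flags themselves (`--language`, `--deskew`, `--rotate-pages`, `--clean`, `--skip-text`, `--optimize`) are standard OCRmyPDF options, though `--clean` additionally needs `unpaper` installed.

```python
import shutil
import subprocess


def build_ocrmypdf_command(input_pdf, output_pdf, language="eng",
                           deskew=True, rotate_pages=True, clean=False,
                           skip_text=True, optimize=1):
    """Translate GUI-style options into an ocrmypdf argument list.

    Hypothetical helper: the parameter names and defaults are assumptions,
    only the command-line flags belong to OCRmyPDF itself.
    """
    cmd = ["ocrmypdf", "--language", language]
    if deskew:
        cmd.append("--deskew")        # straighten crooked scans
    if rotate_pages:
        cmd.append("--rotate-pages")  # fix pages scanned sideways or upside down
    if clean:
        cmd.append("--clean")         # clean pages before OCR (requires unpaper)
    if skip_text:
        cmd.append("--skip-text")     # leave pages that already have text alone
    cmd += ["--optimize", str(optimize)]  # 0 = no optimization, higher = smaller files
    cmd += [str(input_pdf), str(output_pdf)]
    return cmd


def run_ocr(input_pdf, output_pdf, **options):
    """Run ocrmypdf as a subprocess; raises if it is missing or fails."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf was not found on PATH")
    cmd = build_ocrmypdf_command(input_pdf, output_pdf, **options)
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only prints the command; call run_ocr(...) to actually process a file.
    print(" ".join(build_ocrmypdf_command("scan.pdf", "scan_ocr.pdf", language="deu")))
```

Keeping the command building separate from the subprocess call means a GUI could show the user exactly what will be run before running it, and a custom trained Tesseract language could be selected simply by passing its code as `language`.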