OCR Capability

I know Zotero can't OCR images, but some kind of integration with an OCR tool would be great. I know Google Desktop has an OmniPage plug-in that scans PDFs. Is there a way for Zotero to automatically collect the cached text and assign it to the PDF?
  • If an OCR tool embedded the detected text into the PDF as a hidden text layer (as is the case for PDFs from JSTOR and some other sources), Zotero would automatically index the content. There's not currently another way to get OCRed text into Zotero.

    OCR of non-copyrighted materials will be available as part of Zotero Commons:
    As an added incentive to donate to the Commons, the Internet Archive will provide free OCR for your contributions and send you the transcribed text to help you search your personal library.
  • Open Source OCR software from Google and HP:

    Luc Vincent 2006. Announcing Tesseract OCR. Google Code Blog. Available at: http://google-code-updates.blogspot.com/2006/08/announcing-tesseract-ocr.html [Accessed April 5, 2008].

    This might be useful to someone, rather than putting too many eggs in one basket. It would be nice, though, to highlight an area of a JPG with one click and get a rough approximation of the text to appear in a note.
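
For anyone who wants to try the hidden-text-layer route mentioned above with Tesseract, here is a rough sketch. It assumes a recent Tesseract release with the `pdf` output option is installed and on your PATH (the original 2006 release did not produce PDFs). The function names `tesseract_pdf_command` and `ocr_to_searchable_pdf` are made up for illustration and are not part of Zotero or Tesseract. The idea is that the resulting PDF carries the recognized text as a hidden layer, so attaching it to an item should let Zotero index it like any other text PDF.

```python
import shutil
import subprocess
from pathlib import Path


def tesseract_pdf_command(image_path, language="eng"):
    """Build the Tesseract command line for a searchable PDF (hypothetical helper)."""
    image = Path(image_path)
    # Tesseract takes an output *base* name and appends ".pdf" itself.
    output_base = image.with_suffix("")
    return ["tesseract", str(image), str(output_base), "-l", language, "pdf"]


def ocr_to_searchable_pdf(image_path, language="eng"):
    """Run Tesseract if it is installed and return the path of the PDF it writes."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    command = tesseract_pdf_command(image_path, language)
    subprocess.run(command, check=True)
    return Path(command[2] + ".pdf")
```

Calling `ocr_to_searchable_pdf("page1.png")` would write `page1.pdf` next to the image. Recognition quality depends heavily on the scan, so treat the text as a rough approximation.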