Erroneous "PDF does not contain OCRd text"

adamsmith · August 13, 2009

When I try to retrieve metadata for this PDF
http://www.cepr.net/documents/publications/small-business-2009-08.pdf
I get the above error message.
Not only does the PDF show up as indexed, but a search for some of its text in Zotero returns it as a result, so it clearly contains OCRd text.
Retrieve metadata works fine for other documents.
I'm running the newest Zotero, 2.0b6.5, on Ubuntu.

dstillman · August 14, 2009

I can reproduce this. It looks like the PDF recognizer currently checks only the first two pages for content, and this document's content doesn't start until the third page. We can probably raise the limit to three pages.
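
To illustrate the page-limit behavior described above, here is a minimal Python sketch. It is not Zotero's actual code: the function name `has_text_in_first_pages`, the `max_pages` parameter, and the idea of representing a PDF as a list of per-page text strings are all assumptions made for the example.

```python
from typing import List


def has_text_in_first_pages(pages: List[str], max_pages: int = 2) -> bool:
    """Return True if any of the first `max_pages` pages has non-whitespace text.

    `pages` is a hypothetical stand-in for the text extracted from each page.
    """
    return any(page.strip() for page in pages[:max_pages])


# Hypothetical document: two pages with no extractable text,
# then body text starting on the third page.
pages = ["", "  \n", "body text", "more body text"]

# With a two-page limit the check finds nothing, even though the PDF has text.
print(has_text_in_first_pages(pages, max_pages=2))

# Raising the limit to three pages lets the check reach the first page with text.
print(has_text_in_first_pages(pages, max_pages=3))
```

Under these assumptions, a fixed page limit reports "no text" for any document whose text starts after the limit, which would match the symptom in the original report.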

cabotage · March 18, 2010

I am also getting an erroneous error message that one of my PDFs does not contain OCRd text. I have triple-checked that the first three pages have OCRd text and have run OCR on this document multiple times. Page 5 does not contain OCRable information (I'm not entirely sure why). Would the presence of one non-OCRd page further into the document cause this error message? Other than that, it's obvious that the rest of the document is OCRd and searchable by Zotero.

I also have the newest version of Zotero, on Mac OS X Leopard.

dstillman · March 18, 2010

Please provide either a link to the PDF or a Debug ID for the metadata retrieval.

cabotage · March 18, 2010

The PDF is from a book that I scanned with Adobe Acrobat, so it is not online. Here is the Debug ID: D670961453.