PDF does not contain OCRed text

CB · June 24, 2009

I get this message on a high proportion of my attempts to 'retrieve metadata', and in most cases these are PDFs with real text, not images (i.e., I can search for and find text when opening them in a PDF reader). Also, these same PDFs *are* getting indexed, and I can find text within them using Zotero.

Is this a problem or limitation with pdftotext? As an aside, I also note that adding PDFs is the single slowest operation in my use of Zotero, taking minutes. Perhaps it's worth considering a different method of extracting PDF text.

CB · June 24, 2009

As another aside on this: Mendeley does a nice but imperfect job of extracting references from PDFs. In an ideal world, Zotero would do likewise, and then create some kind of link between the new entry and any of the references that already exist as items within the Zotero library. In an even more sumptuous academic Schlaraffenland, where the refs didn't exist within Zotero, you could click on a link and have Zotero look for them in the cloud and create new (and linked) entries.

coolrat33 · March 25, 2011

I'm having this exact problem as well.

I have many PDF books that were created directly from a software application (these PDFs were not scanned and then OCRed). Zotero can index these PDFs, but when I try to "retrieve metadata", a message pops up saying "PDF does not contain OCRed text".

Without the metadata, I cannot select the function "Rename File from Parent Metadata". Is there a solution to this?

adamsmith · March 25, 2011

Please don't double post.

coolrat33 · March 25, 2011

Oops, didn't mean to. I thought the other post disappeared when my browser crashed.

hladik · March 12, 2015

Is there a solution for this?

dstillman · March 12, 2015

You'd have to provide a link to an example PDF for which you're seeing this for us to tell you more.

Bryan Ndwigah · February 16, 2017

Just picking up on the previous conversations, since I would also like to resolve this issue. Below is a link to an example PDF that gives me the pop-up 'PDF does not contain OCRed text':

http://sdh.ba.ttu.edu/commitment-trust-JM94.pdf

adamsmith · February 16, 2017

That PDF indeed does not contain OCRed text. The easiest test is to check whether you can select and copy text in your PDF reader with the text selection tool. If you can't, there is no way Zotero will be able to read it (and even if you run OCR on such a file yourself, it'd be unlikely for Zotero to find metadata for the PDF).
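
If you have a lot of files to check, the select-and-copy test can be roughly approximated from the command line. The sketch below is only an illustration, not how Zotero itself decides: it assumes the pdftotext tool mentioned earlier in the thread is installed and on your PATH, and the function names and the 50-character threshold are made up for the example.

```python
import shutil
import subprocess


def looks_like_text(extracted: str, min_chars: int = 50) -> bool:
    """Return True if the string has at least min_chars letters or digits.

    The threshold is an arbitrary assumption; tune it for your own files.
    """
    return sum(ch.isalnum() for ch in extracted) >= min_chars


def pdf_has_text_layer(pdf_path: str, pages: int = 3) -> bool:
    """Extract the first few pages with pdftotext and check for real text."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    # "-l N" stops after page N; "-" sends the extracted text to stdout.
    result = subprocess.run(
        ["pdftotext", "-l", str(pages), pdf_path, "-"],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=True,
    )
    return looks_like_text(result.stdout)
```

Calling `pdf_has_text_layer("some-file.pdf")` on a scanned, image-only PDF should return False, since pdftotext finds little or nothing to extract. A True result only means a text layer exists; it does not guarantee that metadata retrieval will succeed.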