PDF does not contain OCRed text

I get this message on a high proportion of my attempts to 'retrieve metadata', and in most cases these are PDFs with real text, not images (i.e., I can search for and find text when I open them in a PDF reader). These same PDFs *are* also being indexed, and I can find text within them using Zotero's search.

Is this a problem or limitation with pdftotext? As an aside, adding PDFs is the single slowest operation in my use of Zotero, taking minutes. Perhaps it's worth considering a different method of extracting PDF text.
  • As another aside on this: Mendeley does a nice but imperfect job of extracting references from PDFs. In an ideal world, Zotero would do likewise and then create some kind of link between the new entry and any of the references that already exist as items in the Zotero library. In an even more sumptuous academic Schlaraffenland (land of plenty), where the references didn't yet exist in Zotero, you could click a link and have Zotero look them up online and create new (and linked) entries.
  • I'm having this exact problem as well.

    I have many PDF books that were created directly from a software application (these PDFs were not scanned and then OCRed). Zotero can index these PDFs, but when I try to "Retrieve Metadata", a message pops up saying "PDF does not contain OCRed text".

    Without the metadata, I cannot use the "Rename File from Parent Metadata" function. Is there a solution to this?
  • Please don't double post.
  • Oops, I didn't mean to. I thought the other post had disappeared when my browser crashed.
  • Is there a solution for this?
  • You'd have to provide a link to an example PDF for which you're seeing this for us to tell you more.
  • Picking up on the previous conversation, since I would also like to resolve this issue: below is a link to an example PDF that gives me the 'PDF does not contain OCRed text' pop-up.

    http://sdh.ba.ttu.edu/commitment-trust-JM94.pdf
  • That PDF indeed does not contain OCRed text. The easiest test is to check whether you can select and copy text in your PDF reader with the text selection tool. If you can't, there is no way Zotero will be able to read it (and even if you run OCR on such a file yourself, Zotero would be unlikely to find metadata for the PDF).
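The select-and-copy test can also be scripted. The sketch below is a minimal, hypothetical helper, not part of Zotero. It assumes the Xpdf/Poppler `pdftotext` command-line tool is installed and on PATH, runs it on the first two pages (where the title and DOI usually appear), and treats fewer than an arbitrary 50 non-whitespace characters as "no usable text layer". The names `has_text_layer`, `extract_first_pages` and `MIN_CHARS` are made up for illustration.

```python
"""Rough check for whether a PDF has an extractable text layer.

Assumes the Xpdf/Poppler `pdftotext` command-line tool is installed and on PATH.
This is an illustrative sketch, not how Zotero itself decides.
"""
import subprocess
import sys

# Hypothetical threshold: fewer non-whitespace characters than this on the
# checked pages is treated as "no usable text layer".
MIN_CHARS = 50


def has_text_layer(text: str, min_chars: int = MIN_CHARS) -> bool:
    """Return True if text contains at least min_chars non-whitespace characters."""
    visible = sum(1 for ch in text if not ch.isspace())
    return visible >= min_chars


def extract_first_pages(pdf_path: str, last_page: int = 2) -> str:
    """Run pdftotext on pages 1..last_page and return the text it writes to stdout."""
    result = subprocess.run(
        # "-" as the output file tells pdftotext to write to stdout.
        ["pdftotext", "-enc", "UTF-8", "-f", "1", "-l", str(last_page), pdf_path, "-"],
        capture_output=True,
        check=True,
    )
    # Decode leniently so unusual bytes in the extracted text don't crash the check.
    return result.stdout.decode("utf-8", errors="replace")


if __name__ == "__main__":
    path = sys.argv[1]
    if has_text_layer(extract_first_pages(path)):
        print(f"{path}: text layer found")
    else:
        print(f"{path}: little or no extractable text (probably a scanned image)")
```

Usage, assuming the script is saved under the hypothetical name `check_pdf_text.py`: `python check_pdf_text.py commitment-trust-JM94.pdf`. Passing this check only shows that text can be extracted; it does not guarantee that the metadata lookup will succeed.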