Retrieve metadata fails with "PDF does not contain OCRd text"

mronkko · March 1, 2012

I downloaded a document from here

http://www.oecd-ilibrary.org/industry-and-services/entrepreneurship-at-a-glance-2010_9789264097711-en

and attempted to retrieve metadata for the item. The document is a normal document that contains text and has a DOI on the third page. The error message that I get is "PDF does not contain OCRd text". Since the report clearly has text content, at least the error message is wrong.

Also, the Zotero translator (DOI) fails on that page.

mronkko · March 1, 2012

Also, a related question about this report. The item type that I would use for this is "report", and the report says that it should be cited as

OECD (2011), Entrepreneurship at a Glance 2011, OECD Publishing. http://dx.doi.org/10.1787/9789264097711-en

However, the report item type does not include DOI. Why is this?

dstillman · March 1, 2012

"The document is a normal document that contains text and has a DOI on the third page."

The DOI is on the fourth page. Zotero's metadata retrieval currently only checks for (sufficient) OCRed text as far as the third.
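As a rough illustration of why the page limit matters, here is a minimal Python sketch. It is not Zotero's actual code: the function name, the DOI regular expression, and the sample page texts are all assumptions made up for the example.

```python
import re

# A commonly used DOI shape ("10.", registrant code, slash, suffix).
# This pattern is an assumption for the sketch, not the one Zotero uses.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def find_doi_in_first_pages(pages, max_pages=3):
    """Return the first DOI found in the first `max_pages` page texts, or None."""
    for text in pages[:max_pages]:
        match = DOI_PATTERN.search(text)
        if match:
            # Trim punctuation that often trails a DOI in running text.
            return match.group(0).rstrip(".,;)")
    return None


# Hypothetical page texts mirroring the report: blank second page, DOI on the fourth.
pages = [
    "Entrepreneurship at a Glance 2010",
    "",
    "Boilerplate copy without an identifier.",
    "http://dx.doi.org/10.1787/9789264097711-en",
]

print(find_doi_in_first_pages(pages))               # None
print(find_doi_in_first_pages(pages, max_pages=4))  # 10.1787/9789264097711-en
```

With a three-page window the DOI on page four is never seen; widening the window by one page is enough to find it in this example.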

dstillman · March 1, 2012

On the other hand, if you remove the blank second page Zotero identifies it as a completely different item, because it's using the boilerplate copy on the (then) third page instead of the DOI.

mronkko · March 1, 2012

"The DOI is on the fourth page. Zotero's metadata retrieval currently only checks for (sufficient) OCRed text as far as the third."

It makes sense not to scan the entire document. However, the reason for failing is not that there is no OCRd text, but that the first three pages do not contain sufficient information to identify the document. The error message could be changed to reflect that.
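To make the suggestion concrete, here is a small Python sketch of how the two failure cases could be reported separately. It is only an illustration under stated assumptions: the function name, the `min_chars` threshold, the DOI pattern, and the wording of the second message are hypothetical and not taken from Zotero.

```python
import re

# Assumed DOI pattern for the sketch; not Zotero's own.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def explain_failure(pages, max_pages=3, min_chars=20):
    """Return a message describing why identification failed, or None if a DOI was found.

    `pages` is a list of extracted page texts; only the first `max_pages` are scanned.
    `min_chars` is a hypothetical threshold for "sufficient" text.
    """
    scanned = pages[:max_pages]
    total_chars = sum(len(text.strip()) for text in scanned)
    if total_chars < min_chars:
        # Genuinely no usable text layer in the scanned pages.
        return "PDF does not contain OCRd text"
    if not any(DOI_PATTERN.search(text) for text in scanned):
        # Text is present, but nothing that identifies the document.
        return f"No identifier found in the first {max_pages} pages"
    return None


# Hypothetical inputs: a scanned image-only PDF versus a text PDF whose DOI is on page four.
image_only = ["", "", ""]
text_without_doi = [
    "Entrepreneurship at a Glance 2010",
    "",
    "Boilerplate copy without an identifier.",
    "http://dx.doi.org/10.1787/9789264097711-en",
]

print(explain_failure(image_only))        # PDF does not contain OCRd text
print(explain_failure(text_without_doi))  # No identifier found in the first 3 pages
```

The point is only that "no text" and "text but no identifier within the scanned pages" are different conditions and can be given different messages.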

Also, for some reason, Zotero cannot add this item by its DOI.