Searchable PDFs Not Searchable In Main Interface
We are evaluating Zotero and love the product. We uploaded a number of PDFs to Zotero, selected them, and then ran the metadata retrieval process.
The error returned next to each PDF name is one of: "Could not read text from PDF", "no matching reference found", or "PDF does not contain OCRed text".
When we select the PDF and open it, we can confirm it is searchable: entering terms in the search field at the bottom of the window finds them normally.
We tried to apply the fix described in report ID 66. Replacing the PDFInfo and PDFtoText files and reinstalling did not work.
Is there anything that someone can advise to help solve this?
These are three different error messages that have different causes. You mean you're getting them all at once for a single document?
In the tab on the right, does it show up as indexed? Sorry, where did you find that suggested fix, and what do you mean by report ID?
https://forums.zotero.org/discussion/28820/retrieve-metadata-for-pdf-broken-report-id-660357467/
As for the errors, selecting five PDFs and running the metadata process yields the various errors. Meaning, each presents with its own error but within the same results window.
- no OCRd text means what it says: Zotero can't find any OCRd text in the PDF
- can't read text means the pdftools fail, usually for unknown reasons. I believe encrypted PDFs or the like can cause this, but I'm not 100% sure
- no matching references found means that Zotero looked for a DOI, an ISBN, and finally a text string via Google Scholar and didn't come up with anything.
So the reasons these may happen are quite distinct. The searchable PDF - which error message did you get for that?
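To make the lookup order described above concrete (DOI first, then ISBN), here is a minimal sketch of how one might scan text extracted from a PDF for an identifier. The regular expressions and the function name `find_identifier` are assumptions for illustration only; this is not Zotero's actual code, it does not validate ISBN checksums, and it leaves out the Google Scholar fallback.

```python
import re

# Assumed patterns for illustration; not taken from Zotero.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")
ISBN_RE = re.compile(r"\bISBN(?:-1[03])?:?\s*([0-9][0-9Xx -]{8,16}[0-9Xx])")


def find_identifier(text):
    """Return ('doi', value), ('isbn', value), or None, trying DOI first."""
    match = DOI_RE.search(text)
    if match:
        # Drop punctuation that often trails a DOI at the end of a sentence.
        return ("doi", match.group(0).rstrip(".,;)"))
    match = ISBN_RE.search(text)
    if match:
        # Keep only digits and the X check character, then check the length.
        digits = re.sub(r"[^0-9Xx]", "", match.group(1)).upper()
        if len(digits) in (10, 13):
            return ("isbn", digits)
    return None
```

If a PDF's text layer contains neither pattern, a function like this returns `None`, which would correspond to the "no matching references found" case even though the PDF itself is perfectly searchable.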
Metadata retrieval process (right click on PDF)
Indexed - no matching reference found
Not indexed - Could not read text from PDF
Cannot index non-indexed PDF. Checked security on PDF and there is none. If I open the PDF we can search it without any issues.
@Dan - you can e.g. test this on:
http://catdir.loc.gov/catdir/samples/cam033/2002073770.pdf
which has no security and works fine with pdftotext 3.0.3, but Zotero's pdftotext version run from the commandline returns: "Error: Copying of text from this document is not allowed."
@tdhuman - in my experience this is pretty rare - maybe 1 in 50 or so. It is possible to replace Zotero's pdftools with the updated version manually if that's of interest to you.
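For anyone who wants to check whether a PDF's permission flags could explain a "Copying of text from this document is not allowed" error, here is a small sketch that reads the `Encrypted:` line from `pdfinfo` output. It assumes the usual layout of that line (either `no`, or `yes (print:yes copy:no ...)`); the helper name `copy_allowed` is hypothetical, and this only helps if permission flags are actually the cause.

```python
import re


def copy_allowed(pdfinfo_output):
    """Interpret the 'Encrypted:' line of pdfinfo output.

    Returns True if text copying appears to be allowed, False if it is
    denied, and None if the line is missing or has an unexpected form.
    """
    for line in pdfinfo_output.splitlines():
        if line.startswith("Encrypted:"):
            value = line.split(":", 1)[1].strip()
            if value.startswith("no"):
                return True  # not encrypted, so no copy restriction
            flag = re.search(r"copy:(yes|no)", value)
            return (flag.group(1) == "yes") if flag else None
    return None
```

The input string could be obtained with something like `subprocess.run(["pdfinfo", path], capture_output=True, text=True).stdout`, assuming a `pdfinfo` binary is on the PATH.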
You can't replace pdftotext with 3.03 on Windows without altering the binary — otherwise you'll get a black console window every time it runs. (And pdfinfo needs a custom build to output to a text file, though the version probably doesn't matter for that anyway.) I don't even remember off-hand how we did that — it's some Windows flag on the executable — but that's why we distribute custom versions.
The remaining issue is related to indexed PDFs. Running the metadata retrieval process shows error: no matching references found.
If we open the PDF it is searchable. There is no encryption or other restrictions on the PDF.
Not sure if the 3.0.3 exe's would help fix this. If so, would you mind letting me know how to download them?