Zotero could not read text from PDF

xiangyang_qu · May 12, 2019

zotero could not read text from pdf

Submitted with Debug ID D1463470925

[JavaScript Error: "The connection was refused when attempting to contact wss://stream.zotero.org/."]

[JavaScript Error: "WebSocket connection closed: 1006 "]

[JavaScript Error: "C:\Program Files (x86)\Zotero\pdftotext.exe returned exit status 1" {file: "chrome://zotero/content/xpcom/utilities_internal.js" line: 516}]

[JavaScript Error: "Component returned failure code: 0x80520012 (NS_ERROR_FILE_NOT_FOUND) [nsILocalFile.remove]" {file: "chrome://zotero/content/xpcom/recognizePDF.js" line: 356}]

[JavaScript Error: "Could not read text from PDF"]

version => 5.0.66, platform => Win32, oscpu => Windows NT 10.0; WOW64, locale => en-US, appName => Zotero, appVersion => 5.0.66, extensions => Zotero LibreOffice Integration (5.0.14.SA.5.0.66, extension), Zotero Word for Windows Integration (5.0.12.SA.5.0.66, extension)

dstillman · May 13, 2019

I'm afraid this is a long-standing known issue for file paths containing extended characters on Windows. The problem is in the third-party PDF tool we use (Xpdf/pdftotext), and it happens when using that tool directly as well.

We're hoping to fix this (and other things) soon by switching to another tool for text extraction.

xiangyang_qu · May 13, 2019

So for the moment, that means I cannot use the retrieve PDF metadata function, and the problem cannot be solved?

dstillman · May 13, 2019

You would need to move your Zotero data directory to a path without extended characters.
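
A quick way to check a candidate path before moving the data directory is sketched below. It assumes "extended characters" means anything outside ASCII, which is an interpretation and not something the thread spells out; the function name `non_ascii_parts` and the second sample path are hypothetical.

```python
from pathlib import PureWindowsPath
from typing import List


def non_ascii_parts(path: str) -> List[str]:
    """Return the components of a Windows path that contain non-ASCII characters.

    Assumption: "extended characters" is treated here as "not plain ASCII".
    """
    return [part for part in PureWindowsPath(path).parts if not part.isascii()]


# An empty list suggests the path is safe under the assumption above.
print(non_ascii_parts(r"E:\zotero"))
# Hypothetical path with a non-ASCII user folder; that folder is reported.
print(non_ascii_parts("C:\\Users\\\u5f20\u4e09\\Zotero"))
```

If the list is not empty, each reported folder would need to be renamed or avoided when choosing the new location.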

xiangyang_qu · May 14, 2019

First of all, thanks for your answer.
My new path is:
E:\zotero
Metadata can now be retrieved for books, but not for journal articles.
How can I deal with this?

dstillman · May 14, 2019

It depends on the PDF. Generally, if the PDF has an identifier (DOI, ISBN, etc.) on the first few pages, it will work. Otherwise, it helps if it's formatted like a standard academic PDF.
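
To see whether a particular PDF is likely to fall in the first group, one can look for a DOI in the text of its first pages. The sketch below is only a rough heuristic, not Zotero's actual recognizer: the regular expression is a commonly used DOI pattern, and `find_doi` and the sample strings are hypothetical.

```python
import re
from typing import Optional

# Common DOI shape: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def find_doi(page_text: str) -> Optional[str]:
    """Return the first DOI-like string in the text, or None if there is none."""
    match = DOI_PATTERN.search(page_text)
    if match is None:
        return None
    # Trailing sentence punctuation is usually not part of the identifier.
    return match.group(0).rstrip(".,;")


# Hypothetical first-page text from an article that prints its DOI.
print(find_doi("Journal of Examples. https://doi.org/10.1234/example.5678. Received 2019"))
# Hypothetical scanned page with no identifier.
print(find_doi("A scanned page with no identifier"))
```

A PDF whose first pages yield no match here may still be recognized if it is laid out like a standard academic article, as noted above.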

Can you provide some examples that don't work?

xiangyang_qu · May 14, 2019

It does work! Thanks a lot.