PDF metadata retrieval error

wieniawski · July 14, 2020

Hi,

I dragged an existing PDF file whose name contains Chinese characters into Zotero. Then I right-clicked it and chose "Retrieve Metadata for PDF".

At that point, an error appeared: "No matching references found."

However, if the filename is entirely in English, metadata retrieval works fine.

So I guess the problem is caused by the Chinese characters.

I searched online and found that pdftotext and pdfinfo need to be installed.

So I downloaded and installed them (unzipped them and set an environment variable pointing to the folder). However, the same error still happens.

I also read that pdfinfo and pdftotext should show up as found under Edit->Preferences->Search, but in my Zotero they are not found.

I also noticed that pdftotext.exe and pdfinfo.exe already exist in the Zotero installation folder.

I have no idea how to resolve this problem.

Can anybody help me?

dstillman · July 14, 2020

And I googled on the web, and found out pdftotext and pdfinfo should be installed.

Then I download them and installed (unzip, and set an environment variable point to the folder).

No, don't do that — I'm not sure what you found online, but it's out of date and not something you should be following. The necessary PDF tools come bundled with Zotero, and have for years. Undo whatever you did.

What OS is this? Can you provide a Debug ID for a retrieve attempt that fails?

wieniawski · July 15, 2020

Thanks for your reply.
My OS is Windows 10.
And the debug ID is:
D217365727

Hopefully I am doing this correctly.

dstillman · July 15, 2020

This actually isn't about the filename at all. Zotero is extracting text properly — it just can't find any metadata for that file. It's retrieving metadata for files with English filenames most likely just because they're publications in English that exist in the databases Zotero checks.

You don't say whether this is an academic paper. There are Chinese databases that Zotero isn't able to check, because they don't provide reliable ways to do so, but remember also that anything can be distributed as a PDF, and you shouldn't expect Zotero to be able to retrieve metadata for random documents. See Retrieve PDF Metadata for more info.

wieniawski · July 15, 2020

Thanks. I am trying to understand the flow:
1. Zotero uses pdfinfo and pdftotext to get information from the PDF
2. It sends that information to https://services.zotero.org/recognizer/recognize to find the metadata
Am I right?
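Step 1 of that flow can be tried by hand with the stock Xpdf/Poppler command-line tools. This is a minimal sketch, not what Zotero runs internally: it assumes `pdfinfo` and `pdftotext` are on PATH, the function names are made up for illustration, and Zotero's bundled copies are modified builds (the log below shows its pdfinfo taking an extra output-file argument). Step 2 is left out because the recognizer service's request format isn't described here.

```python
from __future__ import annotations

import subprocess


def parse_pdfinfo(output: str) -> dict[str, str]:
    """Turn pdfinfo's "Key:   value" lines into a dict."""
    info: dict[str, str] = {}
    for line in output.splitlines():
        # Split on the first colon only, so values like dates keep theirs.
        key, sep, value = line.partition(":")
        if sep:  # ignore lines that have no colon
            info[key.strip()] = value.strip()
    return info


def extract_pdf_info(path: str, last_page: int = 2) -> tuple[dict[str, str], str]:
    """Run stock pdfinfo and pdftotext (assumed to be on PATH) on `path`.

    Returns the parsed pdfinfo fields and the text of the first pages.
    Raises subprocess.CalledProcessError if either tool exits non-zero.
    """
    info = subprocess.run(
        ["pdfinfo", "-enc", "UTF-8", path],
        capture_output=True, check=True, encoding="utf-8",
    ).stdout
    # An output file of "-" makes pdftotext write the text to stdout.
    text = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", "-l", str(last_page), path, "-"],
        capture_output=True, check=True, encoding="utf-8",
    ).stdout
    return parse_pdfinfo(info), text
```

For example, `fields, text = extract_pdf_info("paper.pdf")` followed by `fields.get("Title")` shows what the embedded metadata says, and `text` shows whether any readable text comes out of the first pages at all.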

I tried to retrieve metadata for another PDF file, but it failed. I logged the debug ID:
D1124319285. You should be able to find the log under that ID.

I reviewed the log and found this error:
(3)(+0000008): Running C:\Program Files (x86)\Zotero\pdfinfo.exe 'E:\backup\zotero\storage\MA5FQN2G\The Quantization Eﬀects of the CORDIC Algorithm.pdf' 'E:\backup\zotero\storage\MA5FQN2G\.zotero-ft-info'

(1)(+0000221): Error running C:\Program Files (x86)\Zotero\pdfinfo.exe

(1)(+0000019): Error: C:\Program Files (x86)\Zotero\pdfinfo.exe returned exit status 1
Error: C:\Program Files (x86)\Zotero\pdfinfo.exe returned exit status 1
observe@chrome://zotero/content/xpcom/utilities_internal.js:551:27
From previous event:
Zotero.FullText</this.indexItems@chrome://zotero/content/xpcom/fulltext.js:555:15

In other words, pdfinfo returns an error when it tries to read the PDF's information.

However, when I use a third-party build of pdfinfo to get the info, it succeeds.

The third-party pdfinfo is version 4.02, which I downloaded from here:
https://www.xpdfreader.com/download.html

That page offers the Xpdf command line tools for Linux (32/64-bit), Windows (32/64-bit), and Mac (64-bit), each with a GPG signature.

It seems that Zotero's original pdfinfo is not robust enough. I also don't know which version of pdfinfo Zotero ships with.

Would it be possible to update Zotero's pdfinfo in the next version, so that this problem can be resolved?
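To check whether the failure follows the file path or the pdfinfo build, one could run both binaries (Zotero's and the Xpdf 4.02 one) against the same file, first at its original path and then as a copy with a plain ASCII name. This is a rough sketch, not a Zotero tool: the script name, the `test.pdf` copy name, and the helper functions are hypothetical, and the exit-code table comes from the stock Xpdf documentation, which Zotero's modified build may not follow. Going by the log earlier in this post, Zotero's build is called with an extra output-file argument, so the sketch passes any trailing arguments through unchanged.

```python
import os
import shutil
import subprocess
import sys
import tempfile

# Exit codes as documented for the stock Xpdf command-line tools;
# Zotero's modified build is not guaranteed to use the same meanings.
XPDF_EXIT_CODES = {
    0: "no error",
    1: "error opening a PDF file",
    2: "error opening an output file",
    3: "error related to PDF permissions",
    99: "other error",
}


def describe_exit(code: int) -> str:
    """Map an exit status to its documented Xpdf meaning."""
    return XPDF_EXIT_CODES.get(code, "unknown exit code")


def run_pdfinfo(pdfinfo: str, pdf_path: str, extra_args=()) -> int:
    """Run the given pdfinfo binary on pdf_path and return its exit status."""
    result = subprocess.run([pdfinfo, pdf_path, *extra_args], capture_output=True)
    return result.returncode


def compare_paths(pdfinfo: str, pdf_path: str, extra_args=()) -> None:
    """Run pdfinfo on the original path, then on a copy with an ASCII-only name."""
    code = run_pdfinfo(pdfinfo, pdf_path, extra_args)
    print(f"original path:   {code} ({describe_exit(code)})")
    with tempfile.TemporaryDirectory() as tmp:
        ascii_copy = os.path.join(tmp, "test.pdf")
        if not ascii_copy.isascii():
            # e.g. a non-ASCII Windows user name inside the temp path
            print("warning: temp path is not ASCII-only:", ascii_copy)
        shutil.copyfile(pdf_path, ascii_copy)
        code = run_pdfinfo(pdfinfo, ascii_copy, extra_args)
        print(f"ASCII-only copy: {code} ({describe_exit(code)})")


if __name__ == "__main__" and len(sys.argv) >= 3:
    # python check_pdfinfo.py PDFINFO_EXE PDF_FILE [EXTRA_ARGS...]
    compare_paths(sys.argv[1], sys.argv[2], sys.argv[3:])
```

If Zotero's build fails on the original path but succeeds on the copy, that would suggest the filename (this one contains the non-ASCII ligature "ﬀ") rather than the PDF itself; if it fails on both while the Xpdf 4.02 build succeeds, the bundled binary is the more likely culprit.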