Unindexed PDF attachments cannot be indexed by clicking the indexing button (the green one)

Hi! Following @adamsmith's instructions, I have saved the unindexed PDFs as a collection in my library so that I can search my references without missing any.

For PDFs, doing an advanced search with
Attachment File Type --> is --> PDF
and
Attachment Content -- does not contain -- .
(https://forums.zotero.org/discussion/comment/291997#Comment_291997)

I have uploaded a screenshot of the settings:
http://imglf3.nosdn.127.net/img/M3B0VGdGQVdIRGpNNnhFRXQvL25GZ3hGWElwamwwZUN0N01zVytrUjJFNnRFWFBsdzRkUDlBPT0.jpg?imageView&thumbnail=1920y1027&type=jpg&quality=96&stripmeta=0&type=jpg

However, when I clicked the indexing button in the right-hand pane for these unindexed PDFs one by one, none of them could be indexed. The green button just spun around once, and then nothing happened. The status of the PDF still showed "Indexed: No".

I'm sure these PDFs consist of text, not images. I have uploaded the affected PDFs here:
https://pan.baidu.com/s/1o8Wu3G2

How can these PDFs be indexed?

I think that an "unindexed PDFs" collection should be included in Zotero 5.0 as a default collection. I cannot afford to miss these valuable references when I'm using the advanced search. These references are important because they are highly cited. Please help.
  • Hello, is anyone willing to help me with the unindexed PDFs?
  • I only tried the first PDF ("Studies in Air−Water Interfacial Area for Wet Unsaturated Particulate Porous Media Systems.pdf"), but it works fine for me.

    If it's not being indexed for you, we'd need to see a Debug ID for an index attempt.

    Note that some files have a flag that's meant to prevent text extraction, and the PDF extraction tool that Zotero currently uses might obey that. An upcoming version of Zotero will include a new version of the tool that might be a bit more lenient. But at least with the file I tested, that's not the issue.
  • Yes, I tried the first PDF ("Studies in Air−Water Interfacial Area for Wet Unsaturated Particulate Porous Media Systems.pdf") again by clicking the green refresh button, and it still shows "Indexed: No". My Zotero version is 5.0.34.

    The Debug ID is D1119059648.

    I also tried another PDF, which is very important to me because of its number of references ("X-ray Microtomography Determination of Air−Water Interfacial Area−Water Saturation Relationships in Sandy Porous Media").

    The Debug ID is D43363690.

    @dstillman Thank you for your selfless help! I'm really confused by the problem.
  • C:\Users\[…]\OneDrive\Zotero\pdftotext-Win32.exe returned exit status 1
    First, it appears you may have your entire Zotero data directory in a cloud service. That's strongly discouraged, and it's very likely that you'll eventually corrupt your database that way.

    As for the error, if you look at the debug output you can see the full command line that it's running. If you run that yourself from cmd.exe, you may be able to see the underlying error. It's not clear to me if you've indexed any files from this computer. If not, I'd guess that there's simply a restriction on running programs from this volume.
  • @dstillman Thank you for your help and suggestions. I'll move the program folder away from the cloud service.

    I have indexed more than 190 PDFs on this computer. But the papers in the folder that I uploaded failed to be indexed. For comparison, I tested two cases.

    The first one is a PDF that was indexed successfully. The Debug ID is D1201188392.

    The second one is a PDF that failed to be indexed. The Debug ID is D1870897017.

    I have looked at the debug output, but I'm afraid I don't know how to run the command in cmd.exe. Sorry about that.
  • We'll be rolling out updated PDF tools in the Zotero beta in the next couple days, so let's wait to debug this further until those are out.
  • Thanks!

    P.S. As a control, I tried attaching the PDF directly to the item instead of using the ZotFile link, but the PDF still could not be indexed. The Debug ID is D1224135229.
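
For anyone unsure how to follow the suggestion above (running the command line from the debug output yourself to see the underlying error), here is a minimal sketch in Python, as an alternative to typing the command into cmd.exe. It runs a command, then reports its exit status and whatever it wrote to standard error. The helper name `run_and_report` is hypothetical, and the demo command is only a stand-in that deliberately fails with status 1; the real command line has to be copied from your own debug output and is not reproduced here.

```python
import subprocess
import sys


def run_and_report(argv):
    """Run a command and return (exit_status, stdout, stderr) as text."""
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr


if __name__ == "__main__":
    # Stand-in command (an assumption for illustration): a child Python
    # process that prints an error message and exits with status 1.
    # Replace this list with the program and arguments from the debug output.
    demo = [
        sys.executable,
        "-c",
        "import sys; print('demo error', file=sys.stderr); sys.exit(1)",
    ]
    status, out, err = run_and_report(demo)
    print("exit status:", status)
    print("stderr:", err.strip())
```

A nonzero exit status together with a message on standard error is usually the quickest hint at what went wrong. If the program cannot be started at all, `subprocess.run` raises an exception instead of returning a status, which is itself informative (for example, when running programs from that location is restricted).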