Indexing not working: D637792052

redcloud111 · October 17, 2017

Hello, all. I have Zotero working on several machines. I can't get indexing to work on any, neither the full index nor the option for unindexed only.

I am able to index individual PDFs by clicking the reindex button next to "Indexed: Partial".

I have pdftotext and pdfinfo 3.02a installed.

Thoughts?

Thanks!

bwiernik · October 17, 2017

What settings do you have set for indexing in Zotero preferences?

redcloud111 · October 17, 2017

Hello, bwiernik. I am not sure exactly what settings you are referring to, other than those on the Search Tab (which I haven't changed):

https://1drv.ms/i/s!ApTaaiY2zZRmjgDoRS6sDueK2rhU

Let me know if the screenshot is not sufficient.

Thanks!

bwiernik · October 17, 2017

Okay, and what happens when you click Rebuild Index?

And what version of Zotero are you running?

dstillman · October 17, 2017

4.0.29.15

@redcloud111, you should upgrade to Zotero 5.0.

redcloud111 · October 18, 2017

Hello, on my Mac version the pinwheel turns for a bit and the app stops responding, then it finishes but with no changes to the index count. The PC version just seems to activate a process for a few seconds, then nothing. The count remains at 1212 unindexed. I am able to index individual PDFs, but the overall count in the Index Statistics remains the same.

Thanks!

redcloud111 · October 18, 2017

dstillman. I didn't realize I wasn't updated. I will do that and check again. Thanks!

redcloud111 · October 18, 2017

I updated and tried to reindex and received a time out error: 1312258684

redcloud111 · October 18, 2017

I just updated my Macbook Pro, synced, and got an error: 1476699390

I removed the old RTF/ODF scan add-ons and installed the new beta that works with 5.0, but I am still getting the error and am not able to sync: 1108440472

I fear I now have two problems: 1) the original index issue and 2) this sync issue on my MBP

redcloud111 · October 19, 2017

I have solved 1108440472 by manually copying my good library from my iMac to my MBP.

But I still have the indexing problem at 1312258684 and 1476699390

dstillman · October 19, 2017

database disk image is malformed

Your database appears to be corrupted. You can try the DB Repair Tool.

redcloud111 · October 19, 2017

dstillman. Thanks, if this is in reference to 1108440472, that problem is solved (I copied a working library).

Any ideas on the sync issue, which I reported from my two mac machines: 1312258684 and 1476699390, or does this corruption issue affect indexing as well?

Thanks!

dstillman · October 19, 2017

Yes, that error is from 1476699390.

redcloud111 · October 19, 2017

dstillman. I may be confused, but I can't seem to get my unindexed pdfs indexed. I have tried on several machines. Here is my latest debugging report from Zotero standalone on a PC:

D523954063

and the report from my primary iMac:

D1917122520

redcloud111 · October 20, 2017

Hey, guys and gals. At this point I have no idea what to try next. I assumed partial means that I have an OCRed PDF that hasn't been fully indexed. Is this correct? What bothers me is that I am seeing 1240 partial, and I would like them fully indexed. If I do them manually, my count drops. So, there appears to be a problem.

I have deleted and reinstalled pdftotext and pdfinfo on three machines, and still my library has these errant partially indexed PDFs.

Can someone advise steps?

dstillman · October 20, 2017

(Please do try to be a bit more patient. It hadn't even been 24 hours since you posted the Debug IDs.)

First, to clarify, "Partial" just refers to files that are bigger than your indexing settings allow, and the buttons in the preferences follow those same settings. If you reindex an item individually, that triggers a full reindex. Otherwise you'd have to increase the max pages/characters settings and rebuild the index, but we don't particularly recommend that, since it will slow down searching.
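For anyone following along, here is a rough sketch of the "Partial" rule described above. It is only an illustrative model, not Zotero's actual code: the names `IndexLimits` and `index_status` are hypothetical, and the limit values are placeholders rather than confirmed defaults.

```python
from dataclasses import dataclass


@dataclass
class IndexLimits:
    # Hypothetical stand-ins for the max pages/characters preferences.
    max_pages: int
    max_chars: int


def index_status(total_pages: int, total_chars: int, limits: IndexLimits) -> str:
    """Return 'partial' if the file exceeds either limit, else 'indexed'.

    Assumption: a file that is larger than the settings allow is indexed
    only up to the limit and is therefore reported as partial.
    """
    if total_pages > limits.max_pages or total_chars > limits.max_chars:
        return "partial"
    return "indexed"


# Illustrative values only, not necessarily Zotero's defaults.
limits = IndexLimits(max_pages=100, max_chars=500_000)
short_article = index_status(total_pages=20, total_chars=60_000, limits=limits)
long_book = index_status(total_pages=450, total_chars=900_000, limits=limits)
```

Under this model, a large "Partial" count mostly reflects long documents rather than a failure, and raising the limits only changes the count after the index is rebuilt.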

Other things:

For the PC, if you're still having trouble there, you can try the 5.0 Beta, which should fix some problems with background full-text indexing that were showing up in your debug output. If that doesn't help, provide another Debug ID for an index attempt from that.

For the iMac, you're getting this:

/Users/[…]/zotero/pdftotext-MacIntel returned exit status 3

This is what pdftotext returns when there's a permissions problem with the PDF. PDFs can disallow text extraction, though the degree to which different tools (and versions of tools) obey that varies. But that would explain unindexed files.
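If you want to check a specific PDF yourself, something like the following sketch may help. It assumes a `pdftotext` binary is on your PATH and uses the exit codes listed in the pdftotext man page; the helper names `describe_exit_status` and `try_extract` are made up for illustration, and this is not how Zotero itself invokes the tool.

```python
import subprocess

# Exit codes as documented in the pdftotext man page.
EXIT_MEANINGS = {
    0: "no error",
    1: "error opening the PDF file",
    2: "error opening the output file",
    3: "error related to PDF permissions",
    99: "other error",
}


def describe_exit_status(code: int) -> str:
    """Translate a pdftotext exit status into a readable message."""
    return EXIT_MEANINGS.get(code, "unknown exit status")


def try_extract(pdf_path: str, pdftotext: str = "pdftotext"):
    """Run pdftotext on one file and report how it exited.

    Assumption: passing '-' as the output file sends the text to stdout,
    so nothing is written to disk.
    """
    result = subprocess.run([pdftotext, pdf_path, "-"], capture_output=True)
    return result.returncode, describe_exit_status(result.returncode)
```

Calling `try_extract` with the path to one of the unindexed PDFs would show whether that file is one of those returning status 3.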

redcloud111 · October 20, 2017

dstillman. I am sorry for seeming impatient. I tend to get a bit manic when I don't understand what is happening in the background or have any real sense how long something should take. As of now, my PC seems to be maintaining the message "syncing full-text content" for an hour now, which is longer than any time previously. I hope that means all is well and the process will continue.

I have increased the size of the pages and characters to see if this works. So far, so good.

I will update if I have any questions. Thanks so much for your help!

redcloud111 · October 21, 2017

dstillman, this time I thought I was seeing progress. I was able to see the "syncing full-text content" message for several hours. It ran all day and into the night. When I woke up, though, I saw the red error icon:

764362703

dstillman · October 21, 2017

That's with 5.0.23. As I said, you should use the latest 5.0 Beta if you're still having trouble.

redcloud111 · October 21, 2017

Ah. Okay. I will try the beta and post results.

redcloud111 · October 21, 2017

dstillman. I installed the beta on my PC and got a cancellation error:

1745223485

dstillman · October 21, 2017

That's a Report ID — we need a Debug ID for this.

redcloud111 · October 21, 2017

Ah, sorry. I didn't realize they were different. Here you go.

D556429149

redcloud111 · October 24, 2017

Any progress with my debug report?

Thanks!

dstillman · October 24, 2017

What exactly is the issue at this point? Where are you seeing the cancellation error?

That debug output shows ongoing full-text indexing. There are still some earlier errors logged, but no problems are showing up in the debug output itself (other than a couple PDFs that don't allow text extraction).

Also, have you increased the max pages/characters settings, and if so to what?

redcloud111 · October 24, 2017

dstillman

I am sorry if I am misunderstanding how this works, but I assumed there is a problem because I still see 740 partially indexed items. If I do them manually, they index. So, I was hoping the tool would automate this for me. All would be well in my mind when I saw the partially indexed number at zero. I assumed the non-indexed items would be non-OCRed PDFs and the like. Is this not the case?

I increased the numbers to 750000 characters and 300 pages. Should I go higher?

Also, I saw the last cancellation error when I debugged and ran a partial index.

redcloud111 · October 31, 2017

You can close this. All is working now, and I think I have a better grasp on how indexing works. Thanks!