Indexing not working: D637792052

Hello, all. I have Zotero working on several machines. I can't get indexing to work on any of them, with either the full index rebuild or the unindexed-only option.

I am able to index individual PDFs by clicking the "Indexed: Partial" function.

I have pdftotext and pdfinfo 3.02a installed.

Thoughts?

Thanks!
  • What settings do you have set for indexing in Zotero preferences?
  • edited October 17, 2017
    Hello, bwiernik. I am not sure exactly what settings you are referring to, other than those on the Search Tab (which I haven't changed):

    https://1drv.ms/i/s!ApTaaiY2zZRmjgDoRS6sDueK2rhU

    Let me know if the screenshot is not sufficient.

    Thanks!
  • Okay, and what happens when you click Rebuild Index?

    And what version of Zotero are you running?
  • 4.0.29.15

    @redcloud111, you should upgrade to Zotero 5.0.
  • Hello, on my Mac version the pinwheel turns for a bit while the app is not responding, then it finishes but with no changes to the index count. The PC version just seems to activate a process for a few seconds, then nothing. The count remains at 1212 unindexed. I am able to index individual PDFs, but the overall count in the Index Statistics remains the same.

    Thanks!
  • dstillman. I didn't realize I wasn't updated. I will do that and check again. Thanks!
  • I updated and tried to reindex and received a time out error: 1312258684
  • edited October 18, 2017
    I just updated my Macbook Pro, synced, and got an error: 1476699390

    I removed the old RTF/ODF Scan add-ons and installed the new beta that works with 5.0, but I am still getting the error and am not able to sync: 1108440472

    I fear I now have two problems: 1) the original index issue and 2) this sync issue on my MBP
  • I have solved 1108440472 by manually copying my good library from my iMac to my MBP.

    But I still have the indexing problem at 1312258684 and 1476699390
  • database disk image is malformed
    Your database appears to be corrupted. You can try the DB Repair Tool.
  • dstillman. Thanks, if this is in reference to 1108440472, that problem is solved (I copied a working library).

    Any ideas on the sync issue, which I reported from my two mac machines: 1312258684 and 1476699390, or does this corruption issue affect indexing as well?

    Thanks!
  • Yes, that error is from 1476699390.
  • edited October 19, 2017
    dstillman. I may be confused, but I can't seem to get my unindexed pdfs indexed. I have tried on several machines. Here is my latest debugging report from Zotero standalone on a PC:

    D523954063

    and the report from my primary iMac:

    D1917122520
  • Hey, guys and gals. At this point I have no idea what to try next. I assumed partial means that I have an OCRed PDF that hasn't been fully indexed. Is this correct? What bothers me is that I am seeing 1240 partial, and I would like them fully indexed. If I do them manually, my count drops. So, there appears to be a problem.

    I have deleted and reinstalled pdftotext and pdfinfo on three machines, and my library still has these errant partially indexed PDFs.

    Can someone advise steps?
  • (Please do try to be a bit more patient. It hadn't even been 24 hours since you posted the Debug IDs.)

    First, to clarify, "Partial" just refers to files that are bigger than your indexing settings allow, and the buttons in the preferences follow those same settings. If you reindex an item individually, that triggers a full reindex. Otherwise you'd have to increase the max pages/characters settings and rebuild the index, but we don't particularly recommend that, since it will slow down searching.

    Other things:

    For the PC, if you're still having trouble there, you can try the 5.0 Beta, which should fix some problems with background full-text indexing that were showing up in your debug output. If that doesn't help, provide another Debug ID for an index attempt from that.

    For the iMac, you're getting this:
    /Users/[…]/zotero/pdftotext-MacIntel returned exit status 3
    This is what pdftotext returns when there's a permissions problem with the PDF. PDFs can disallow text extraction, though the degree to which different tools (and versions of tools) obey that varies. But that would explain unindexed files.
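To make the exit status above easier to interpret, here is a minimal Python sketch that maps pdftotext exit codes to the meanings given in the pdftotext man page. The helper names `describe_exit_status` and `try_extract` are hypothetical, and the sketch assumes a `pdftotext` binary on the PATH. It is not how Zotero itself invokes the tool.

```python
import subprocess

# Exit codes as documented in the pdftotext man page.
PDFTOTEXT_EXIT_CODES = {
    0: "no error",
    1: "error opening the PDF file",
    2: "error opening the output file",
    3: "error related to PDF permissions",
    99: "other error",
}


def describe_exit_status(code):
    """Return a readable explanation for a pdftotext exit status."""
    return PDFTOTEXT_EXIT_CODES.get(code, f"unknown exit status {code}")


def try_extract(pdf_path, pdftotext="pdftotext"):
    """Run pdftotext on one file (hypothetical helper).

    Returns (ok, explanation). Passing "-" as the output file makes
    pdftotext write the extracted text to stdout instead of a file.
    """
    result = subprocess.run([pdftotext, pdf_path, "-"], capture_output=True)
    return result.returncode == 0, describe_exit_status(result.returncode)
```

Running `try_extract` on a PDF that stays unindexed should show whether the file is refusing text extraction (status 3) or failing for some other reason.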
  • dstillman. I am sorry for seeming impatient. I tend to get a bit manic when I don't understand what is happening in the background or have any real sense how long something should take. As of now, my PC seems to be maintaining the message "syncing full-text content" for an hour now, which is longer than any time previously. I hope that means all is well and the process will continue.

    I have increased the size of the pages and characters to see if this works. So far, so good.

    I will update if I have any questions. Thanks so much for your help!
  • dstillman, this time I thought I was seeing progress. I was able to see the "syncing full-text content" message for several hours. All day it ran, and into the night. When I woke up, though, I saw the red error icon:

    764362703
  • That's with 5.0.23. As I said, you should use the latest 5.0 Beta if you're still having trouble.
  • Ah. Okay. I will try the beta and post results.
  • dstillman. I installed the beta on my PC and got a cancellation error:

    1745223485
  • That's a Report ID — we need a Debug ID for this.
  • Ah, sorry. I didn't realize they were different. Here you go.

    D556429149
  • Any progress with my debug report?

    Thanks!
  • What exactly is the issue at this point? Where are you seeing the cancellation error?

    That debug output shows ongoing full-text indexing. There are still some earlier errors logged, but no problems are showing up in the debug output itself (other than a couple PDFs that don't allow text extraction).

    Also, have you increased the max pages/characters settings, and if so to what?
  • edited October 24, 2017
    dstillman

    I am sorry if I am misunderstanding how this works, but I assumed there is a problem because I still see 740 partially indexed items. If I do them manually, they index. So, I was hoping the tool would automate this for me. All would be well in my mind when I saw the partially indexed number at zero. I assumed the non-indexed items would be non-OCRed PDFs and the like. Is this not the case?

    I increased the numbers to 750000 characters and 300 pages. Should I go higher?

    Also, I saw the last cancellation error when I debugged and ran a partial index.
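As a rough illustration of why items can stay "Partial" even after the limits are raised, here is a simplified Python model of the rule described earlier in the thread: a file counts as partial when it is bigger than the indexing settings allow. The names `IndexLimits` and `index_status` are hypothetical, and this is a sketch of the idea rather than Zotero's actual logic.

```python
from dataclasses import dataclass


@dataclass
class IndexLimits:
    # Values taken from the post above; they are not Zotero defaults.
    max_chars: int = 750_000
    max_pages: int = 300


def index_status(total_pages, total_chars, limits):
    """Classify a file as 'indexed' or 'partial' under the given limits.

    Simplified assumption: a file is partial if it exceeds either limit.
    """
    if total_pages > limits.max_pages or total_chars > limits.max_chars:
        return "partial"
    return "indexed"


limits = IndexLimits()
print(index_status(120, 200_000, limits))  # within both limits
print(index_status(400, 200_000, limits))  # exceeds the page limit
```

Under this model, a 400-page book would still show as partial with a 300-page limit, so a nonzero partial count does not by itself indicate a fault.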
  • You can close this. All is working now, and I think I have a better grasp on how indexing works. Thanks!