Problem with indexing PDF (debug id: D1427845864)


Debug ID: D1427845864
OS: Mac OS X 10.5.8
Browser: Firefox 3.6.3
Plug-ins: Zotero 2.0.2
PDF indexing: pdftotext 3.0.2 + pdfinfo 3.0.2


I have a problem with indexing a certain PDF document, a book with about 340 pages - it just doesn't work. I've already searched the forum for any help, but nothing helped so far.

Error message:

(3)(+0000010): Running pdftotext -enc UTF-8 -nopgbrk -l 100 "/Users/sh/Documents/Zotero/storage/8NXZDSV4/Name of my book.pdf" "/Users/sh/Documents/Zotero/storage/8NXZDSV4/.zotero-ft-cache"

(2)(+0000009): Name of my book.pdf was not indexed


Here are a few remarks:

  • Yes, the PDF contains real text and the text is selectable in Acrobat. Yes, you can also copy/paste the text from Acrobat to any other application. There are no security restrictions with the PDF.

  • I've already tried removing the blanks from the filename and indexing again. Same result - no indexed PDF.

  • I've already tried to increase max. characters/pages to 2000000/400. Same result - no indexed PDF.

  • I've executed the command on the terminal - that worked flawlessly! The corresponding file .zotero-ft-cache contains the desired parts of the text.

  • When I installed Zotero I used the installation routine within this add-on to install pdftotext/pdfinfo. Since this didn't work, I installed the most recent version from here. Might this be an issue?

Thanks a lot in advance for your help!
Sven
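
For anyone who wants to repeat the terminal check described above, here is a minimal Python sketch. It rebuilds the command line shown in the debug output (the flags are copied from that log line) and then checks whether a non-empty cache file was written. The function names and the assumption that a `pdftotext` executable is on the PATH are illustrative only; this is not how Zotero itself invokes the tool.

```python
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, cache_path, max_pages=100, executable="pdftotext"):
    """Return the argument list matching the command in the debug log."""
    return [
        executable,
        "-enc", "UTF-8",       # output encoding, as in the log
        "-nopgbrk",            # no page-break characters in the output
        "-l", str(max_pages),  # last page to convert
        str(pdf_path),
        str(cache_path),
    ]


def run_and_check(pdf_path, cache_path, max_pages=100):
    """Run pdftotext and report (exit code, whether a non-empty cache file exists)."""
    cmd = build_pdftotext_command(pdf_path, cache_path, max_pages)
    result = subprocess.run(cmd, capture_output=True, text=True)
    cache = Path(cache_path)
    wrote_text = cache.is_file() and cache.stat().st_size > 0
    return result.returncode, wrote_text
```

Passing the arguments as a list means blanks in the filename need no quoting, so renaming the file should not be necessary for this check.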
  • When I installed Zotero I used the installation routine within this add-on to install pdftotext/pdfinfo. Since this didn't work, I installed the most recent version from here. Might this be an issue?
    Zotero uses custom versions. The official versions won't work.
  • Dan, thanks for your reply!

    What do you think might help now? I don't know whether Zotero currently makes use of the custom version or the official one.
  • Delete the pdf* files in the Zotero data directory, restart Firefox, and reinstall through the prefs.
  • Thank you very much, Dan. That actually did the job!
    I guess that something must have gone wrong with the initial installation of pdftotext/pdfinfo.
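
Before deleting anything by hand, it can help to see which files the "pdf*" pattern actually matches. The following is a small sketch, not an official tool: `find_pdf_tool_files` is a hypothetical helper, and the data directory path has to be supplied by the reader. It only lists files and deletes nothing.

```python
from pathlib import Path


def find_pdf_tool_files(data_dir):
    """List regular files directly inside data_dir whose names start with 'pdf'."""
    # Non-recursive on purpose: PDFs inside storage/ subfolders are not touched.
    return sorted(p for p in Path(data_dir).glob("pdf*") if p.is_file())
```

Calling it on the Zotero data directory and printing the result shows the candidates to remove before restarting Firefox and reinstalling through the preferences.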
  • Hi,
    I have the same problem but reinstalling the pdf* files didn't help.
    Could the problem be that I use the most recent version of Ubuntu?
    (Ubuntu 10.04 RC and Firefox 3.6.3)
    I'm not sure when indexing stopped working. Maybe it has been broken since I installed the new Ubuntu.

    What can I do?

    Thanks,
    Tobi
  • Have you tried installing or updating the PDF plugin from the search tab of the Zotero preferences?
  • twk
    edited April 23, 2010
    Hi, that's what I did.
    First I tried to update, then I deleted the pdf* files and reinstalled them with the button in the search tab of the Zotero preferences.
    Manual indexing and indexing of unindexed items still don't work.
    A complete reindex starts, but it breaks at some point at 1333 indexed and 70 partially indexed files. The 433 files left are not indexed even when I choose to index the unindexed files.
  • and you're sure those 400 have a text layer?
  • I can see that there are PDFs with a text layer that are not indexed, and I cannot index them manually.
  • In fact I cannot index any pdf manually anymore.
  • A complete reindex starts, but it breaks at some point at 1333 indexed and 70 partially indexed files.
    Break meaning what?

    Provide a Debug ID for manual reindexing of a PDF that doesn't work.
  • Hi, my debug-id is 472125780
  • twk
    edited April 23, 2010
    The cache file /media/daten/zotero_bibliothek/storage/5INFECVQ/.zotero-ft-cache exists and contains the text of my PDF, but Zotero still doesn't show the item as indexed. It also doesn't find the PDF when I search Zotero for words that should be indexed.

    Breaking means that Firefox says that the script doesn't respond anymore.
    If I click "continue" it will break again in a few moments.
  • Debug ID, not Report ID. Follow my link.
  • And that's not a valid Report ID anyway. Copy and paste. Don't retype.
  • Breaking means that Firefox says that the script doesn't respond anymore.
    If I click "continue" it will break again in a few moments.
    OK, that's not breaking. Don't bother with the Debug ID.

    http://www.zotero.org/support/kb/unresponsive_script_warning
  • Hi,
    I see - thanks for the link.
    But still it should index my pdf when I manually click on the green arrows - at least, when I restart Firefox, right?
  • I did another one with just one indexing attempt. The following is copied from the message window:
    The Debug ID is D54287521.
  • Does nobody have a solution for my problem?
    Zotero still doesn't index any PDF.
    The text is read and written to a file in the same folder as the PDF, but Zotero doesn't recognize it as indexed.
    What's the problem? I really miss this feature!

    Tobi
  • Hi,
    I tried several approaches to solve the problem and tried indexing lots of PDFs, and it still doesn't work. Please help me - I've got several hundred PDFs and need the index badly.
    Thanks,
    twk
  • Is this a network drive? Try moving your data directory to your main local hard drive, as there's a chance this is due to a data flushing issue with your drive/OS. (Basically, the pdftotext process is completing, but when Zotero then goes to check for the cache file's existence, it's not there.)
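
One rough way to test for the situation described above (the process finishes, but the cache file is not yet visible) is to poll for the file for a short time after the command returns. The sketch below is a diagnostic aid only; `wait_for_file`, its timeout, and its polling interval are assumptions for illustration and are not part of Zotero.

```python
import time
from pathlib import Path


def wait_for_file(path, timeout=2.0, interval=0.1):
    """Poll until path exists and is non-empty, or the timeout expires."""
    deadline = time.monotonic() + timeout
    target = Path(path)
    while True:
        if target.is_file() and target.stat().st_size > 0:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

If the cache file only shows up after a delay on one drive but immediately on the main local hard drive, that would be consistent with a flushing or mounting problem on that drive.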
  • Hi,
    thanks for your hint.
    It was not a network drive, but there were problems with the way it was mounted.
    With your hint, I was able to find the problem very quickly.

    Thanks again,
    Tobi
  • It was not a network drive, but there were problems with the way it was mounted.
    If this is something others might encounter, it might be helpful to elaborate.

    Either way, glad to hear it's fixed.
  • Hello everyone.

    I was trawling through the forums to find the solution to this problem but still it is not clear to me how this is fixed.

    I use Ubuntu 10.10 and had a problem with indexing. I tried everything and almost gave up on the feature altogether.

    I made a last attempt.

    Following the hints here, I checked whether my Zotero would index once I reverted to the default location for its library. And it did.

    It turns out there must be something wrong with the location on my other partition where I normally keep my library. It's not a network drive but just a partition on my HDD.

    I am glad that the indexing is working again. However, I would like to keep the folder where it is, as it is more secure on a partition that is not the system one.

    I would be grateful for your help. Does anyone know a workaround for this?

    Thanks,

    Zym
  • edited October 15, 2011
    Hello. I haven't been using Zotero for some time, and as much as I like it for collecting data, I would love to be able to search my PDFs wherever they are stored. After all, this is a great feature, and I don't see why it shouldn't work flawlessly for folks like me who use Linux/Ubuntu.

    'twk', would you be able to tell me exactly what you did, so that I can follow your instructions and see what the issue might be here?

    Thanks
    Zym
  • This is due to a Mozilla bug, listed at the top here:
    https://www.zotero.org/trac/wiki/Mozilla_bugs_affecting_Zotero?version=33

    You get significantly fewer crashes using the new OOo/LO plugin (which doesn't rely on FF Java).
  • This isn't about crashes, and the bug that you referred me to does not apply here. I do have the latest LibreOffice plugin installed, and Firefox works without any problems (running Java or not).

    The indexing just doesn't work.

    I think that the issue here is the location of the Zotero library, but I couldn't find a workaround for this.

    If anyone could help I would be really grateful.

    cheers
  • @Dan Stillman:

    On Apr 4 2010, Dan said, "Zotero uses custom versions. The official versions won't work."

    I didn't realize a custom version of the PDF tools was being used, and for months I've been struggling with PDF indexing because I manually linked to the stock Xpdf tools in /usr/local/bin.

    I would like to suggest changing the reported versions of these tools to include "-zotero" in the scheme. For example, instead of saying "pdftotext 3.02 is installed" report as "pdftotext 3.02-zotero is installed".

    It might also be useful to mention this idiosyncrasy in the preferences, because it wasn't obvious that Zotero's pdftotext 3.02 was different from my pdftotext 3.02.

    At any rate, I now have nearly my entire library indexed, and I'm really happy with that.