PDF Indexing halted in FF6 (Report ID 1065217827)

I am running Ubuntu 10.10, and have installed the PPA for the stable channel of Firefox (currently FF6). However, I noticed that my PDF indexing stopped working after the last stable upgrade.

I am aware that the PDF indexing honors the security on PDFs -- as such, the PDF in the Report ID 1065217827 was double checked to be sure that it is completely open for content to be copied, etc. However, I also noticed that Zotero appears to be unable to get to a required file (the FF6 chrome manifest?).

I'm sure that this error has something to do with pointing Zotero to the correct file location, or perhaps with permissions for that same file. However, I'm relatively new to Linux -- so I would appreciate some specific instructions (e.g., command line) on how to reconnect everything to make PDF indexing operational again.

I am up against a deadline, so assistance will be extremely welcome!
  • Can't really tell you why it's not indexing that file, but the most common causes are security on the PDF and extended characters in the name. You can run Firefox from the console and see if there's an error from pdftotext and also try running the given command line manually.
    "However, I also noticed that Zotero appears to be unable to get to a required file (the FF6 chrome manifest?)."
    That message doesn't have anything to do with Zotero.
  • Also, are you sure this is for all files, or just some?
  • The reason I thought it was having trouble getting to a required file was that the initial error in Zotero reads:

    Could not read chrome manifest file '/usr/lib/firefox-6.0/extensions/{972ce4c6-7e08-4474-a285-3208198ce6fd}/chrome.manifest'.

    This is then followed by a number of PDF indexing errors, such as:

    [JavaScript Error: "Iivari et al. - 1998 - A Paradigmatic Analysis Contrasting Information Sy.pdf was not indexed" {file: "chrome://zotero/content/xpcom/fulltext.js" line: 486}]

    When I navigated to /usr/lib/, I found both a firefox and a firefox-6.0 directory. So I thought that perhaps Zotero was looking in /firefox when it should have been looking in /firefox-6.0.
  • As to your last post, indexing has not occurred for any files I added to Zotero since the FF6 upgrade. It just provides a similar error to the JavaScript error above.
  • Just ignore that message. It doesn't have anything to do with Zotero or your issue.
  • Thinking about the non-standard characters, I tried a different PDF that only used text and dashes, and it is experiencing the same Zotero PDF indexing error:

    [JavaScript Error: "Hirschheim and Klein - 1989 - Four paradigms of information systems development.pdf was not indexed" {file: "chrome://zotero/content/xpcom/fulltext.js" line: 486}]

    What would "Line: 486" be complaining about?

    I tried uninstalling Zotero and reinstalling, but with no joy.
  • It's just looking for the cache file generated by pdftotext, and the cache file is missing. Run the command from the debug output, as I suggested above.

    (I forgot that extended characters actually now result in a more specific error message, so it's not that.)
  • Debug ID is D1731738616.

    A cut from the file looks like it's finding the tools -- however, it's not doing anything with them:

    (5)(+0000001): SELECT charsetID FROM itemAttachments WHERE itemID=?

    (5)(+0000000): Binding parameter 1 of type int: 884

    (3)(+0000000): Running pdfinfo "/home/phd/.mozilla/firefox/f69cacri.default/zotero/storage/TV9GQT62/Hirschheim and Klein - 1989 - Four paradigms of information systems development.pdf" "/home/phd/.mozilla/firefox/f69cacri.default/zotero/storage/TV9GQT62/.zotero-ft-info"

    (3)(+0000012): Running pdftotext -enc UTF-8 -nopgbrk "/home/phd/.mozilla/firefox/f69cacri.default/zotero/storage/TV9GQT62/Hirschheim and Klein - 1989 - Four paradigms of information systems development.pdf" "/home/phd/.mozilla/firefox/f69cacri.default/zotero/storage/TV9GQT62/.zotero-ft-cache"

    (2)(+0000011): Hirschheim and Klein - 1989 - Four paradigms of information systems development.pdf was not indexed

    (5)(+0000001): Committing transaction

    (3)(+0000000): Resetting Notifier event queue

    (5)(+0000001): SELECT indexedPages, totalPages AS total FROM fulltextItems WHERE itemID=?

    (5)(+0000000): Binding parameter 1 of type int: 884

    (5)(+0000001): SELECT indexedPages, totalPages AS total FROM fulltextItems WHERE itemID=?

    (5)(+0000000): Binding parameter 1 of type int: 884
  • I had also transferred my complete Zotero library from an older laptop (running Ubuntu 10.04) to this newer one (running Ubuntu 10.10). I did this by just copying over the entire .mozilla directory (which included all of the Zotero files as well as the PDF index). The PDF content search is working fine for all of the older PDF files that were indexed on the old machine, but, as I said, since updating to FF6, no new PDFs have been indexed. Do you suppose there's a permissions problem in there somewhere -- e.g., a read-only index?
  • edited August 20, 2011
    "Run the command from the debug output, as I suggested above."
    I'm saying to run the pdftotext command manually to see if the index file gets created or if it displays an error.
  • Sorry, I wasn't sure what you meant.

    I copied the command from the debug output and ran it from the console:

    pdftotext -enc UTF-8 -nopgbrk "/home/phd/.mozilla/firefox/f69cacri.default/zotero/storage/TV9GQT62/Hirschheim and Klein - 1989 - Four paradigms of information systems development.pdf" "/home/phd/.mozilla/firefox/f69cacri.default/zotero/storage/TV9GQT62/.zotero-ft-cache"

    Interestingly, Zotero has now changed "Indexed: No" to "Indexed: Unknown".

    I then tested the search by using an unlikely chunk of text from the first page: "rhetorical vehicle used for explicating the paradigms" -- and the correct paper showed up as the only hit. Excellent! (and rather unsurprising, since that is terrible prose!)

    I did a similar check for a chunk of text near the end -- also a hit.

    So, it appears to be indexing...

    So, why do you suppose that this is working from the console, but not from within Zotero?
  • I see a pattern forming. Each properly indexed record appears to have two files associated with it in the zotero storage area.

    .zotero-ft-info
    .zotero-ft-cache

    These are missing in all my new PDFs. When I ran the pdftotext command, it created the .zotero-ft-cache.

    However, when I ran the pdfinfo command:

    pdfinfo "/home/phd/.mozilla/firefox/f69cacri.default/zotero/storage/TV9GQT62/Hirschheim and Klein - 1989 - Four paradigms of information systems development.pdf" "/home/phd/.mozilla/firefox/f69cacri.default/zotero/storage/TV9GQT62/.zotero-ft-info"

    ...the .zotero-ft-info file was NOT created.

    Could this be the cause of the Indexed: Unknown status?
  • Yes. I can't really tell you why it's not working from Firefox, though. If I had to guess, I'd say SELinux or some other security software on your system preventing Firefox from running other programs.
  • Also, when running the pdfinfo command, it appears to be missing an operator. From the console, this is the output after running the above command:

    pdfinfo version 0.14.3
    Copyright 2005-2010 The Poppler Developers - http://poppler.freedesktop.org
    Copyright 1996-2004 Glyph & Cog, LLC
    Usage: pdfinfo [options] <PDF-file>
    -f <int> : first page to convert
    -l <int> : last page to convert
    -box : print the page bounding boxes
    -meta : print the document metadata (XML)
    -enc <string> : output text encoding name
    -listenc : list available encodings
    -opw <string> : owner password (for encrypted files)
    -upw <string> : user password (for encrypted files)
    -v : print copyright and version info
    -h : print usage information
    -help : print usage information
    --help : print usage information
    -? : print usage information

    This would indicate to me that the zotero command is missing an option operator (e.g., -v). Thoughts?
  • Zotero runs it fine. You're just running the version from your path rather than the patched Zotero version.

    I'm afraid I can't help you beyond this. You'll have to figure out what on your system is preventing Firefox from running the executables. There's no reason to think this is an issue with Zotero itself.
  • Other than SELinux (which I am not currently aware of) -- do you have any other possible leads that you would suggest to explore?

    As I said, I'm getting more proficient, but I am basically a user, not a programmer or a Linux expert.

    And if I do find a solution, I will post it here in case others have a similar problem.
  • Sorry, I don't, but SELinux would be the first to look for. Check your system logs.
  • AppArmor is probably the relevant security software.
    https://help.ubuntu.com/community/AppArmor
  • In the end, I simply deleted the PDF-related files from the zotero folder within .mozilla. This included manually deleting the following (actually, I simply moved them into a different folder where Zotero wouldn't see them):

    pdfinfo-Linux-i686
    pdfinfo-Linux-i686.version
    pdftotext-Linux-i686
    pdftotext-Linux-i686.version

    Under Zotero Preferences --> Search, I then asked Zotero's PDF Indexing to check for updates for the above files. Unsurprisingly, Zotero found that they were not installed (as I had just deleted the files). Zotero kindly volunteered to install the correct files -- which it did -- and then happily caught up on its delayed indexing tasks.

    Problem solved.

    Be sure to try the obvious steps before venturing into the land of AppArmor and SELinux!!

    Thanks for everyone's efforts!
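
For anyone retracing the steps in this thread, the two manual checks can be sketched as a small shell script. This is only a rough sketch under stated assumptions, not anything Zotero ships. The function names list_unindexed and set_aside_pdf_tools are made up for illustration, and the paths in the usage comments are placeholders. The only details taken from the thread are the .zotero-ft-cache file name and the pdfinfo-*/pdftotext-* helper files in the zotero folder. The first function lists storage folders that hold a PDF but have no .zotero-ft-cache file. The second moves the helper files to a backup folder instead of deleting them, so that Zotero Preferences --> Search can offer to reinstall them.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every storage folder that contains
# a PDF but has no .zotero-ft-cache file next to it.
list_unindexed() {
    storage_dir=$1
    for dir in "$storage_dir"/*/; do
        [ -d "$dir" ] || continue
        # Check whether the folder holds at least one PDF.
        has_pdf=no
        for f in "$dir"*.pdf; do
            if [ -e "$f" ]; then
                has_pdf=yes
                break
            fi
        done
        if [ "$has_pdf" = yes ] && [ ! -e "${dir}.zotero-ft-cache" ]; then
            basename "$dir"
        fi
    done
}

# Hypothetical helper: move the pdfinfo-* and pdftotext-* files out of the
# zotero folder into a backup folder (nothing is deleted).
set_aside_pdf_tools() {
    zotero_dir=$1
    backup_dir=$2
    mkdir -p "$backup_dir" || return 1
    for f in "$zotero_dir"/pdfinfo-* "$zotero_dir"/pdftotext-*; do
        [ -e "$f" ] || continue
        mv "$f" "$backup_dir"/
    done
}

# Example usage (placeholder paths; substitute your own profile folder):
# list_unindexed "$HOME/.mozilla/firefox/<profile>/zotero/storage"
# set_aside_pdf_tools "$HOME/.mozilla/firefox/<profile>/zotero" "$HOME/zotero-pdf-tools-backup"
```

Close Firefox before moving the helper files, and keep the backup folder until indexing is confirmed to work again.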
This discussion has been closed.