"Could not read text from PDF" error with all PDFs

Hi folks,

I am trying to add references for a number of downloaded PDFs. In each case, I add the file to the library with "Link to File..." and then try "Retrieve Meta-Data for PDF." The pop-up dialog then tells me "Could not read text from pdf". The info window for the reference indicates that it is not indexed. My preferences tell me that I have pdftotext 3.02 installed (Mac Snow Leopard). Rebuilding the index does not help. If I open the PDF in Preview, I can select, copy, and paste the text, suggesting that the full text is in the PDF. There aren't any odd characters in the PDF file name (a cause suggested elsewhere in the forum).

Any advice? Here's a downloadable copy of the PDF from my Dropbox to help debug/diagnose this.

http://dl.dropbox.com/u/245354/Goff%202008.pdf

Any help would be appreciated tremendously!

Henry
  • Strange, I downloaded your file and retrieving metadata from the PDF worked just fine for me. It even got the right article...

    Goff, D. C., Lamberti, J. S., Leon, A. C., Green, M. F., Miller, A. L., Patel, J., Manschreck, T., et al. (2007). A placebo-controlled add-on trial of the ampakine, CX516, for cognitive deficits in schizophrenia. Neuropsychopharmacology, 33(3), 465–472.

    I'm not sure what's going on, but a little more info might help.

    On the linked PDF file, does Zotero tell you it is indexed in the right column? Does it have a page count?

    What do the index statistics on the Search pref pane look like (# indexed, # partial, # unindexed, and # words)?
  • @Tjowens - thanks for running that check on the file!

    On my side, the pdf is definitely not getting indexed - "Indexed: No" on the right. Clicking the reindex button doesn't cause it to be indexed; and in preferences if I clear the index and force a re-index, it still isn't indexed.

    Zotero error report notes:
    [JavaScript Error: "Goff 2008.pdf was not indexed" {file: "chrome://zotero/content/xpcom/fulltext.js" line: 476}]
    Report ID: 890701973

    So perhaps this is a pdftotext problem; however, I installed it directly from Zotero. I'm on Snow Leopard; I wonder if there is another pdftotext that is causing a conflict?
  • If you generate real-time debug output you may be able to see what's going on. I believe you can actually skip Step 1—turning on the debug pref in Zotero—and just start Firefox from Terminal, since pdftotext should generate its own output separately from Zotero. Let us know what appears when you try to manually reindex an unindexed file from the metadata pane (not the prefs).
  • But one possibility is that those files have copying—or whatever that flag is in PDFs—disabled. If so, pdftotext can't access the text.
  • OK, with debug logging on, I see that at startup the pdf utilities are detected:


    (3)(+0000003): pdftotext version 3.02 registered at /Users/henry/Library/Application Support/Firefox/Profiles/ct405b3k.default/zotero/pdftotext-MacIntel

    (3)(+0000000): pdfinfo version 3.02 registered at /Users/henry/Library/Application Support/Firefox/Profiles/ct405b3k.default/zotero/pdfinfo-MacIntel


    However, after I click to index that PDF, I get:


    (3)(+0000001): Running pdfinfo "/Users/henry/Library/Application Support/Firefox/Profiles/ct405b3k.default/zotero/storage/24572ZW5/Goff 2008.pdf" "/Users/henry/Library/Application Support/Firefox/Profiles/ct405b3k.default/zotero/storage/24572ZW5/.zotero-ft-info"

    (3)(+0000024): Running pdftotext -enc UTF-8 -nopgbrk "/Users/henry/Library/Application Support/Firefox/Profiles/ct405b3k.default/zotero/storage/24572ZW5/Goff 2008.pdf" "/Users/henry/Library/Application Support/Firefox/Profiles/ct405b3k.default/zotero/storage/24572ZW5/.zotero-ft-cache"

    (2)(+0000007): Goff 2008.pdf was not indexed

    Thoughts??
  • Is that Terminal output, or is that from the debug output logging in the Zotero prefs? I linked you to the Terminal output specifically.
  • OK, I deleted the four pdf* files from the zotero directory; then reinstalled from zotero; then cleared the index and rebuilt it.

    Now everything works as expected - I can add a pdf; index it; and retrieve the meta-data.

    Very odd - however, I'm happy now.
  • Hi Dan - our comments crossed in the ether, and I hadn't read yours yet as I was posting those last ones.

    The problem was resolved by reinstalling, as I mentioned above. If there's anything helpful I can do to document the problem, let me know; I'd be happy to do so. However, since it's working now, I probably can't replicate the error.

    The origin of it is puzzling, though, as I installed the pdf* tools through Zotero in the first place.
  • Also, Dan, I just want to say thanks for your work on Zotero. I'm incredibly impressed by it: it fits the way I find references perfectly, and my first experiences with the word processor integration were flawless.
  • Great. Glad to hear everything's working for you now.
  • I stumbled on this topic because I was experiencing the same problem.

    I am running Zotero on a Linux machine. I tried to follow your path to the solution, but the steps @hmahncke gave were not clear to me. What does 'reinstalled from zotero' mean?

    I would be really grateful if you could spell it out more precisely so I can try to solve the same problem. Thanks.

    Zym
  • Hi zygmurti

    On Mac OS X, I can click on the Zotero icon in Firefox, which brings up the Zotero window; then open Preferences from the gear menu; then go to the Search pane; then install PDF indexing from there. My understanding is that this installs the Zotero-specific versions of pdftotext and pdfinfo. I assume it works the same way on Linux, but I don't have the Linux version.

    Best regards,
    Henry
  • The menu items are identical on my Ubuntu system. And yet it does not index all the files, even when I try to reindex a perfectly well-OCRed file. I don't know how and whether reinstalling pdftotext will do the trick. How does one get rid of Zotero completely?

    I tried to uninstall zotero but to no avail. It seems that nothing changed after I installed it again.

    Thanks,

    Zym
  • I don't know about reinstalling zotero. And I haven't tried this with OCRd files - all of mine have been downloaded, not scanned.

    But did you delete the four pdf* files from the zotero directory; then reinstall the pdf tools from zotero; then clear the index and rebuild it as I described above? Your comment says you "don't know how and whether reinstalling pdftotext will do the trick." Have you tried it?

    Henry
  • Hello again

    I followed your instructions to a T and have to say that they did not yield any satisfactory results.

    I even tried reinstalling Firefox and Zotero completely, to no avail.

    When I enabled debugging mode in Zotero, it gave me some 18,000 lines of output.

    Everything else works fine, but this is the one Zotero feature I had been wanting for a long time.

    Hope someone here can help me.

    I can provide the output file if anyone knows how to make any sense of it.

    Thanks

    Zym
  • edited October 21, 2010
    Hi everybody,

    Sorry to bother you, but I have a similar problem and am at a loss trying to solve it. I can't index files or retrieve metadata.

    I tried to migrate from Windows (XP) to Linux (Mint KDE), so I copied my entire data directory, the same way I had successfully moved it between Windows versions in the past. I used FEBE and OPIE to restore my add-ons and preferences, so I thought they were the culprit. I deleted everything, including everything related to Mozilla in /usr/lib and /home, but to no avail.

    Next, I suspected that the duplicate copies of pdftotext and pdfinfo might be the cause, since the Win32 .exe versions were already in the Zotero directory and Zotero asked me to reinstall the Linux versions. To be sure, I deleted everything once again and started from scratch, removing the Windows files before reinstalling both Zotero and the PDF tools from the preferences pane. That didn't work either.

    But if I import files ("save copy of file") into the default folder (I forgot to mention that I had set up a custom one in "documents"), with the PDF tools installed there, everything works fine. So I figured there could be a problem with the path used. I deleted everything in the default folder and replaced it with my old database, except pdfinfo and pdftotext, which I reinstalled once again. That didn't work. In fact, only the default document (the startup guide) shows up, meaning that Zotero doesn't read the database (which it did when I was using the custom folder).

    Of course, I could re-add all my files with the "save copy" command, but that would take quite a while and I'd lose all the metadata already retrieved.

    If somebody could give me a hint, it would be really appreciated. Thanks in advance!

    I could probably sync it, but my bandwidth wouldn't really allow such a big upload and the subsequent download...
  • Hi zygmurti,
    I encountered the same problem as you and finally found this thread.
    I followed @hmahncke's instruction (thank you, hmahncke!) and everything just worked like a charm!!
    First, delete the four pdf* files (pdfinfo-Linux-i686, pdftotext-Linux-i686, pdfinfo-Linux-i686.version, pdftotext-Linux-i686.version) in the Zotero directory; for me it is /home/luzerno/.mozilla/firefox/fgovp8a2.default/zotero
    Restart Firefox, ta-da!

    Thank you again, hmahncke!
    Hope this will help.
  • BTW, I'm using Ubuntu Oneiric, Zotero 3.0.1 with Firefox 10.0.
  • And two and a half years later... deleting the pdftotext and pdfinfo programs and reinstalling them through the preferences worked for me as well (Zotero Standalone on Mac OS X 10.8.5).

    Thanks!
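Dan's suggestion that copying may be disabled in these PDFs can be checked with pdfinfo (one of the two helpers Zotero installs), which reports a PDF's permission flags on its "Encrypted:" line. The following is a minimal Python sketch that parses that line. It assumes the xpdf/Poppler output format (e.g. `Encrypted: yes (print:yes copy:no change:no addNotes:no)`); the function names and the `check_copy_flag.py` script name are hypothetical. A `copy:no` result would support the copy-protection explanation, but it is not proof on its own.

```python
from __future__ import annotations

import re
import sys

# Matches "name:yes" / "name:no" pairs. Other values (e.g. an algorithm
# name that some pdfinfo versions append) are ignored.
FLAG_RE = re.compile(r"(\w+):(yes|no)\b")


def parse_encryption(pdfinfo_output: str) -> dict[str, bool] | None:
    """Read the permission flags from pdfinfo's "Encrypted:" line.

    Returns None if no such line exists, {} if the PDF is not encrypted,
    otherwise a dict such as {"print": True, "copy": False, ...}.
    """
    for line in pdfinfo_output.splitlines():
        line = line.strip()
        if not line.startswith("Encrypted:"):
            continue
        value = line[len("Encrypted:"):].strip()
        if value.startswith("no"):
            return {}
        return {name: flag == "yes" for name, flag in FLAG_RE.findall(value)}
    return None


def copying_blocked(pdfinfo_output: str) -> bool:
    # True only when pdfinfo explicitly reports "copy:no".
    flags = parse_encryption(pdfinfo_output)
    if not flags:
        return False
    return flags.get("copy") is False


if __name__ == "__main__":
    # Hypothetical usage: pdfinfo "Goff 2008.pdf" | python3 check_copy_flag.py
    report = sys.stdin.read()
    print("encryption flags:", parse_encryption(report))
    print("copy disabled:", copying_blocked(report))
```

Returning None versus {} keeps "pdfinfo printed nothing useful" distinct from "the PDF is simply not encrypted", which matters when the helper itself is the suspect.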
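Henry's debug log shows the exact pdftotext call Zotero makes (`pdftotext -enc UTF-8 -nopgbrk "<pdf>" "<cache file>"`). In line with Dan's point that pdftotext produces its own output separately from Zotero, that same call can be rerun by hand to see its exit status and error messages. Below is a minimal Python sketch, assuming Python 3.8+. The function names and the `check_pdftotext.py` script name are hypothetical, and treating "no non-whitespace text in the output file" as a failure is an assumption about what leaves an item unindexed, not Zotero's documented rule.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path


def pdftotext_argv(binary: str, pdf: str, out_txt: str) -> list[str]:
    # Mirror the logged call: UTF-8 output encoding, no page-break characters.
    # Passing a list (not a shell string) keeps paths with spaces intact.
    return [binary, "-enc", "UTF-8", "-nopgbrk", pdf, out_txt]


def has_extracted_text(out_txt: str) -> bool:
    # True only if the output file exists and holds some non-whitespace text.
    path = Path(out_txt)
    if not path.is_file():
        return False
    return path.read_text(encoding="utf-8", errors="replace").strip() != ""


def run_pdftotext(binary: str, pdf: str, out_txt: str) -> tuple[int | None, str, bool]:
    """Run pdftotext once; return (exit status, stderr, text was produced).

    The exit status is None when the binary could not be started at all.
    """
    # Remove any stale output so an old file cannot look like a success.
    Path(out_txt).unlink(missing_ok=True)
    try:
        result = subprocess.run(
            pdftotext_argv(binary, pdf, out_txt),
            capture_output=True,
            text=True,
        )
    except OSError as exc:  # e.g. missing or non-executable binary
        return None, str(exc), False
    return result.returncode, result.stderr.strip(), has_extracted_text(out_txt)


if __name__ == "__main__":
    if len(sys.argv) != 4:
        sys.exit("usage: check_pdftotext.py BINARY PDF OUT_TXT")
    status, stderr, ok = run_pdftotext(*sys.argv[1:])
    print(f"exit status: {status}")
    print(f"stderr: {stderr or '(none)'}")
    print(f"text extracted: {ok}")
```

For example, point BINARY at the registered helper from the startup log (`/Users/henry/Library/Application Support/Firefox/Profiles/ct405b3k.default/zotero/pdftotext-MacIntel`), PDF at `Goff 2008.pdf`, and OUT_TXT at a scratch file. A non-zero status or a message on stderr would point at the helper rather than at Zotero.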
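The fix reported in this thread was to delete the pdftotext/pdfinfo helpers and their .version files from the Zotero data directory, reinstall the PDF tools from the Search preferences pane, and then clear and rebuild the index. Below is a minimal Python sketch of the deletion step only. It assumes the helpers are the top-level files whose names start with `pdftotext-` or `pdfinfo-` (as with `pdftotext-MacIntel` and `pdfinfo-Linux-i686` above); the function names, the `remove_pdf_helpers.py` script name, and the `--delete` switch are hypothetical. It defaults to a dry run, and closing Firefox before deleting anything is presumably wise.

```python
from __future__ import annotations

import sys
from pathlib import Path

# Name prefixes of the helpers seen in this thread (pdftotext-MacIntel,
# pdfinfo-Linux-i686, and their .version files). An assumption, not a spec.
HELPER_PREFIXES = ("pdftotext-", "pdfinfo-")


def find_pdf_helpers(zotero_dir: str | Path) -> list[Path]:
    """List the PDF helper files at the top level of the Zotero data dir.

    Subdirectories such as storage/ are not searched, so attachments
    are never matched.
    """
    root = Path(zotero_dir)
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.name.startswith(HELPER_PREFIXES)
    )


def remove_pdf_helpers(zotero_dir: str | Path, dry_run: bool = True) -> list[Path]:
    # Dry run by default: report what would be deleted without deleting it.
    helpers = find_pdf_helpers(zotero_dir)
    if not dry_run:
        for path in helpers:
            path.unlink()
    return helpers


if __name__ == "__main__":
    if len(sys.argv) not in (2, 3):
        sys.exit("usage: remove_pdf_helpers.py ZOTERO_DIR [--delete]")
    delete = len(sys.argv) == 3 and sys.argv[2] == "--delete"
    for path in remove_pdf_helpers(sys.argv[1], dry_run=not delete):
        print(("deleted " if delete else "would delete ") + str(path))
```

Run it first without `--delete` against a directory such as `/home/luzerno/.mozilla/firefox/fgovp8a2.default/zotero` to confirm it lists exactly the four files, then rerun with `--delete` and reinstall the tools from Zotero's Search pane.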