PDF indexing not working

Hello all,

I just installed Zotero Standalone 4.0.26.2 on Xubuntu 14.10 (64-bit) and have problems with PDF indexing.
pdfinfo and pdftotext were already installed on my system via poppler-utils, but as recommended everywhere I installed the files locally from the zotero preferences. The menu says that both files are installed with version 3.02.
However, indexing (and metadata retrieval) doesn't work. My library consists of 1 pdf (for testing).

The output of the debug log is:

[JavaScript Error: "Annu. Rev. Fluid Mech. 2015 Vella.pdf was not indexed" {file: "chrome://zotero/content/xpcom/fulltext.js" line: 515}]

I tried to reinstall zotero several times, don't use any addons, deleted and reinstalled the pdfinfo & pdftotext files in the correct path (with zotero preferences), checked that both files are executable.

I'd be glad about any hints, thanks for your help.

Cheers
  • we're talking about this one, right?
    http://www.annualreviews.org/doi/pdf/10.1146/annurev-fluid-010814-014627

    Could you submit and post a debug ID for the entire process?
    http://www.zotero.org/support/debug_output
  • Yes, that's the one.

    Debug ID: D1791558551

    What I did:
    - fresh install
    - install pdfinfo and pdftotext
    - enable debug logs
    - add the paper per drag&drop
  • I repeatedly restarted zotero and clicked "Reindex" or "Retrieve metadata".

    The debug output actually differs now. Upon reindexing I get
    "[JavaScript Error: "bad script XDR magic number"]"
    ID: D842235218

    whereas retrieving metadata produced
    ID: D311573746

    The "bad script XDR magic number" error seems to occur every time now, but it was not there in the beginning (I did nothing but restart and index/retrieve data from that one file repeatedly).
  • no need to keep re-installing, it pretty much never helps and just wastes your time.
    Don't worry about "bad script XDR magic number"; that's an irrelevant error.

    @Dan - anything helpful in the debug?
  • When you say you installed pdfinfo and pdftotext, do you mean via the Zotero search preferences, or some other way (e.g., by symlinking it to an existing binary on your system)?

    Nothing helpful in the debug output, but you can try running the pdftotext command listed there from the command line, being sure to use the path to the pdftotext binary in your Zotero data directory.

    (I didn't try the PDF myself, though.)
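
A minimal sketch of that check, assuming a POSIX shell. The helper name `run_from_dir` is made up for illustration; it runs a binary by its full path so the copy on your PATH is not picked up by accident. The data directory location is not given in this thread, so it appears only as a placeholder in the usage comment.

```shell
# Hypothetical helper: run a binary from a given directory by its full path,
# so a same-named tool on $PATH (e.g. /usr/bin/pdftotext) is never used instead.
run_from_dir() {
    dir="$1"; name="$2"; shift 2
    bin="$dir/$name"
    if [ ! -x "$bin" ]; then
        echo "not executable: $bin" >&2
        return 1
    fi
    "$bin" "$@"
}

# Usage (ZOTERO_DATA_DIR is a placeholder for your own data directory):
#   run_from_dir "$ZOTERO_DATA_DIR" pdftotext-Linux-x86_64 -v
```

If the binary itself is broken, the shell's own error (such as "Exec format error") shows up here instead of output from the system-wide tool.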
  • "I installed the files locally from the zotero preferences."
    So that looks fine.
    I did test the version of the file that I link to and that worked for me.
  • I tried to run the commands from the log.

    pdfinfo and pdftotext are called, but these are in fact the ones I had installed from poppler-utils in /usr/bin/.

    pdfinfo "file.pdf" ".zotero-ft-info" (shortened) fails:

    pdfinfo version 0.26.5
    Copyright 2005-2014 The Poppler Developers - http://poppler.freedesktop.org
    Copyright 1996-2011 Glyph & Cog, LLC
    Usage: pdfinfo [options] <PDF-file>
    -f <int> : first page to convert
    -l <int> : last page to convert
    -box : print the page bounding boxes
    -meta : print the document metadata (XML)
    -js : print all JavaScript in the PDF
    -rawdates : print the undecoded date strings directly from the PDF file
    -enc <string> : output text encoding name
    -listenc : list available encodings
    -opw <string> : owner password (for encrypted files)
    -upw <string> : user password (for encrypted files)
    -v : print copyright and version info
    -h : print usage information
    -help : print usage information
    --help : print usage information
    -? : print usage information

    pdftotext seems to work and creates a plain text file.

    How do I get zotero to use the files it installed? In the zotero folder I have
    pdfinfo-Linux-x86_64
    pdfinfo-Linux-x86_64.version
    pdftotext-Linux-x86_64
    pdftotext-Linux-x86_64.version

    If I try to execute those (if that's what should happen?) I get:
    $ ./pdftotext-Linux-x86_64
    bash: ./pdftotext-Linux-x86_64: cannot execute binary file: Exec format error
  • Just running pdfinfo/pdftotext will call the ones in your path. That's irrelevant, and not what Zotero is doing. (That's why I said to use the full path to the binaries in the data directory.)
    $ ./pdftotext-Linux-x86_64
    bash: ./pdftotext-Linux-x86_64: cannot execute binary file: Exec format error
    That would be the relevant error. What's the output of uname -a?
  • And I guess, while we're at it, what's the output of 'file pdftotext-Linux-x86_64'?
  • ah of course, sorry.

    $ uname -a
    Linux hactar 3.16.0-33-generic #44-Ubuntu SMP Thu Mar 12 12:19:35 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
  • pdftotext-Linux-x86_64: gzip compressed data, from Unix
  • ... which explains it, of course.

    renaming the files to have a .gz suffix and calling gunzip did it!
    now it seems to work.

    thanks a lot! :)
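
For anyone who hits the same "Exec format error", the workaround above can be sketched roughly as follows. This assumes a shell with `head`, `od`, and `gunzip` available. The function name `fix_gzipped_binary` is hypothetical, and the check relies on gzip files starting with the bytes 1f 8b.

```shell
# Hypothetical helper: if a downloaded binary is still gzip-compressed,
# rename it to *.gz, decompress it in place, and make it executable.
fix_gzipped_binary() {
    f="$1"
    # Read the first two bytes as hex; gzip streams begin with 1f 8b.
    magic=$(head -c 2 "$f" | od -An -tx1 | tr -d ' \n')
    if [ "$magic" = "1f8b" ]; then
        # gunzip writes the output to the name without the .gz suffix.
        mv "$f" "$f.gz" && gunzip "$f.gz" && chmod +x "$f"
        echo "decompressed: $f"
    else
        echo "not gzip, left as-is: $f"
    fi
}

# Usage, run from the folder that holds the files listed earlier:
#   fix_gzipped_binary pdfinfo-Linux-x86_64
#   fix_gzipped_binary pdftotext-Linux-x86_64
```

Running it a second time on the same file is harmless, since an already decompressed binary no longer starts with the gzip magic bytes.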
  • Ah, looks like Zotero wasn't properly handling gzip encoding for those downloads, which we recently enabled. We'll fix that for a future version, but for now I've disabled gzip encoding for those. Thanks for helping to debug this.