PDF full-text indexing

urschrei · July 10, 2007

I've installed Xpdf, and created the symlink to pdftotext-Macintel in my Zotero data directory.

Now what? I.e., how can I tell if it's working or not?

dstillman · July 10, 2007

Aside from adding a PDF and searching for some text in it, you can also try starting Firefox via the command line with the Zotero debug pref enabled and looking at the first few lines of the debug output.

The next version of Zotero will add an interface for full-text indexing to make installation and usage much easier and clearer.

urschrei · July 10, 2007

Starting from the command line gives me the following:

loaded md5.js
*** CLB *** Initializing Google Browser Sync...
*** CLB *** Instanciating core objects...
*** CLB *** Registering with XPCOM...
*** CLB *** Adding categories...
*** CLB *** Google Browser Sync initialized succesfully!

No sign of anything Zotero- or Xpdf-related. Hm.

dstillman · July 10, 2007

Did you enable the Zotero debug pref?

urschrei · July 10, 2007

I did this time...

zotero(3): pdftotext registered at /Users/sth/Library/Application Support/Firefox/Profiles/kpyuc3gw.default/zotero/pdftotext-MacIntel

I take it that's a Good Thing?
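
For anyone checking the same thing, the pdftotext messages quoted in this thread can be picked out of captured debug output with a short script. This is only a sketch: `pdftotext_status` is a made-up name for illustration, the message wording is copied from the posts here, and other Zotero versions may phrase things differently.

```python
import re

# Message formats copied from the debug lines quoted in this thread;
# other Zotero versions may word them differently (assumption).
REGISTERED = re.compile(r"pdftotext registered at (?P<path>.+)$")
DISABLED = re.compile(r"PDF indexing disabled")


def pdftotext_status(debug_lines):
    """Return ("registered", path), ("disabled", line) or ("unknown", None)."""
    for line in debug_lines:
        line = line.strip()
        match = REGISTERED.search(line)
        if match:
            # The path may contain spaces, so keep everything up to end of line.
            return ("registered", match.group("path"))
        if DISABLED.search(line):
            return ("disabled", line)
    # Neither message seen, e.g. because the debug pref was not enabled.
    return ("unknown", None)
```

An "unknown" result most likely means no Zotero debug lines were captured at all, which matches the first attempt above.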

dstillman · July 10, 2007

Yup—should be working. Note that in the current version there's no way to reindex existing documents, but the next version will allow you to do so.

urschrei · July 10, 2007

Yep, still, it's nice to have. Thanks very much.

CB · July 10, 2007

While on the topic of pdftotext, there's an odd bit of behaviour with Zotero on Linux for me.

If I symlink to the pdftotext binary (in my /usr/bin), it doesn't work, and gets the error: "zotero(3): pdftotext-Linux-i686 symlink target not found -- PDF indexing disabled"

However if I copy the binary directly to my zotero data directory (and rename it to pdftotext-Linux-i686), the error goes away, and indexing happens. Strange.
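
One way to narrow this kind of thing down is to check what the link actually points at. A minimal sketch, assuming the link sits in the Zotero data directory under the name from the error message; `check_pdftotext_link` is a hypothetical helper, not part of Zotero, and it does not reproduce Zotero's own lookup.

```python
import os


def check_pdftotext_link(path):
    """Describe the state of a pdftotext file or symlink at `path`."""
    # lexists() is true even for a symlink whose target is gone.
    if not os.path.lexists(path):
        return "missing"
    if os.path.islink(path):
        # Follow the link (and any chained links) to its final target.
        target = os.path.realpath(path)
        if not os.path.exists(target):
            return "broken symlink"
        kind = "symlink"
    else:
        target = path
        kind = "regular file"
    if not os.access(target, os.X_OK):
        return kind + ", target not executable"
    return kind + ", ok"
```

If this reports a healthy symlink while Zotero still logs "symlink target not found", the problem is more likely in how Zotero resolves the link than in the link itself.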

dstillman · July 17, 2007

CB wrote:
If I symlink to the pdftotext binary (in my /usr/bin), it doesn't work, and gets the error: "zotero(3): pdftotext-Linux-i686 symlink target not found -- PDF indexing disabled"

However if I copy the binary directly to my zotero data directory (and rename it to pdftotext-Linux-i686), the error goes away, and indexing happens.

Fixed on the dev branch.

mark · January 23, 2008

The FAQ says: "It does not currently search PDFs or Word documents."
But it does search PDFs if you index them. The FAQ should say that.

Tjowens · January 23, 2008

Thanks, Mark. I have updated that text in the FAQ.