Why are some non-image PDF files not indexed?

wonblee · October 20, 2012

1. Why are some non-image PDF files not indexed?

I was surprised to find some of the non-image PDF files not indexed. (My Zotero has up-to-date PDFTOTEXT and PDFTOINFO installed.)

2. Does it matter whether the PDF was created with embedded fonts or was originally an image file later OCRed, when Zotero decides whether to index or not?

3. Is there a batch function to index all non-image PDF files that have not been indexed for one reason or another? Or do you have to go through each PDF file, check its index status, and index it manually?

wonblee · October 30, 2012

Nobody on this?

adamsmith · October 30, 2012

2. No.
3. You can re-index all files (under Search in the preferences), but there is no option to just re-index unindexed files. That said, if you have indexing turned on, a file that fails to index automatically but does index manually shouldn't, in principle, exist.

dstillman · October 30, 2012

but there is no option to just re-index unindexed files

Actually if you click Reindex Items (which should have an ellipsis) it asks if you want to reindex all items or just unindexed ones, but as adamsmith says, that's unlikely to change much unless you've changed the underlying files. If Zotero doesn't index something, it's generally because it can't.
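
To see which PDFs fall into the "can't" category, one option is to run pdftotext on them yourself and check whether any text comes out. The following is a minimal sketch, not Zotero's own logic: it assumes a pdftotext binary (Xpdf or Poppler) is on your PATH, and the helper names and the 20-character threshold are arbitrary choices made for illustration.

```python
import subprocess
from pathlib import Path


def looks_indexable(text: str, min_chars: int = 20) -> bool:
    """True if the text has at least min_chars non-whitespace characters.

    The threshold is an arbitrary assumption, not something Zotero uses.
    """
    return sum(1 for ch in text if not ch.isspace()) >= min_chars


def extract_text(pdf_path: Path) -> str:
    """Run `pdftotext <file> -` and return whatever it writes to stdout.

    Assumes pdftotext is on PATH; a failed run simply yields empty text.
    """
    result = subprocess.run(
        ["pdftotext", str(pdf_path), "-"],
        capture_output=True,
        check=False,
    )
    return result.stdout.decode("utf-8", errors="replace")


def report(folder: Path) -> None:
    """Print one line per PDF under folder, flagging files with no text."""
    for pdf in sorted(folder.rglob("*.pdf")):
        status = "text found" if looks_indexable(extract_text(pdf)) else "NO TEXT"
        print(f"{status}\t{pdf}")
```

Calling report() with a Path to the folder that holds your attachments lists each PDF with its status. Files marked NO TEXT are likely ones that pdftotext cannot extract anything from, for example scans without an OCR layer, so reindexing them would probably not help.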

dstillman · October 30, 2012

(Well, that option could help if you're using syncing and some items were indexed on another computer and some were indexed on this computer. Right now the full-text index isn't synced and indexing isn't triggered automatically for synced items.)

wonblee · October 31, 2012

Right now the full-text index isn't synced and indexing isn't triggered automatically for synced items.

I'm a little surprised that the index isn't synced. That means I'll have to duplicate my effort to index unindexed items for each database. Granted, I have only two: work and home.

dstillman · October 31, 2012

Syncing of the full-text index is planned.