Better way to handle PDF indexing in a large database

cadudesun · March 9, 2014

Hi,

I intend to index around 5,000 PDFs. Last night I started a "rebuild index" from scratch.

Zotero for Firefox started the process but stopped responding after that (the window title indicated this).

After the night (8 hours in which the computer was dedicated to this task alone), I had to force-end the task in Windows Task Manager. When I reopened Zotero, 565 items had been indexed. So it seems that even while Zotero isn't responding, it is still processing in the background.

I am wondering about a better way to handle PDF indexing in a large database. Should I incrementally "rebuild the index for just the unindexed items" every night? Is there a better way?

Thanks for any advice,
Cadu

adamsmith · March 9, 2014

Install the beta version. Indexing is many times faster.
http://www.zotero.org/support/dev_builds#zotero_40_beta
(or wait for Zotero 4.0.18)

cadudesun · March 10, 2014

Thanks for the information. The beta version is already installed and being tested! By the way, I would appreciate some clarification:

a) Is there a list/history of the improvements in the latest beta version? Where can I find it?

b) When indexing PDFs, should I always rebuild the index for unindexed items, or is indexing a background process? That is, if I just keep my computer on, does indexing keep working? Or does rebuilding the index for unindexed items create a dedicated task with better performance?

Best!

adamsmith · March 10, 2014

a) No, new beta features aren't documented beyond the GitHub commit log.
https://github.com/zotero/zotero
Once the regular release comes out, the changelog will be here: http://www.zotero.org/support/4.0_changelog

b) Just dragging PDFs to Zotero (or linking/storing them otherwise) is all you need to do. There isn't really a reason to use the "rebuild index" option under normal conditions.