Indexing in Zotero standalone -- limit characters or pages?
I've been using Zotero for several years now. I have traditionally not used the indexing feature -- I have many PDFs, and many of them are books, and I was under the impression that setting Firefox to work indexing them would slow down Firefox.
Now that I'm using the standalone version, I'd like to ask a couple of questions about indexing and performance.
1. Is there a reason why the indexing defaults are set to a limit of 100 pages per file and 500,000 characters per file? If Zotero standalone is doing the indexing, should Firefox be able to work relatively speedily? Put another way, is there a performance drawback to setting Zotero to index as high as 350 pages, and perhaps 1 or 2 million characters per item?
2. I store all of my PDFs in one folder in My Documents. My Zotero database has links for each entry, rather than the actual PDF. Will indexing from Zotero follow those links and actually index material that is stored in a separate folder, rather than within Zotero's own directory/program location?
3. Along the same lines, all of those PDFs are synced between desktop and laptop using an external syncing agent (Sugarsync), while Zotero itself is synced across computers using Zotero's own sync feature. The directory structures of both computers are identical. Will indexing work in this situation? Or will indexing take place independently on the two Zotero installations, and then cause a conflict when syncing occurs?
Thanks for your guidance.
2. Yes.
3. Indexing will be independent, will work on both computers, and won't cause a sync issue (the index isn't synced).
Indexed: 3921
Partially: 1267
Unindexed: 1618
The statistics have stayed like that for at least the past hour, so it seems that Zotero has stopped indexing at this stage. I am quite sure that 99% of my PDFs have text layers (i.e., they are not image-only). Is there anything that can be done to nudge Zotero to continue indexing? It would be nice if Zotero had some sort of indexing status indicator, letting us know that it is still on the job, or that it thinks it has done all it can...
I just realized I could "only index unindexed items" so I've done that. Still not sure why Zotero stopped indexing, though.
EDIT:
OK, Zotero has definitely stopped indexing, with half of my items either incompletely indexed or unindexed. What to do? Thanks.
"Index Unindexed Items" doesn't currently reindex partially indexed items, so if you changed your settings you'd have to clear the index and rebuild.
The indexing process currently freezes the UI, so there shouldn't be any mystery as to when it's done.
Those items are all (or at least 99%) indexable via Acrobat's own convoluted indexing features, and also are indexable via Windows after I've installed the PDF ifilter. So it seems odd that so many of them would be unindexable by Zotero.
I'd love to provide examples of things that should be indexed but aren't, but it isn't clear to me how to figure out which items have been included in Zotero's index and which items have not. Is it possible to do that?
Finally, regarding "if you changed your settings you'd have to clear the index and rebuild" -- are there settings that I might be able to change in order to index items more thoroughly? Or were you referring to the limits on characters and pages? Currently, I am indexing 1.5 million characters max per document, and 300 pages max per document. I doubt that of my 5000 or so items there are as many as 1267 that exceed those limits. Are there other settings I might change?
Thanks.
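On the question of how to tell which attachments made it into the index: one possible approach, sketched here under stated assumptions, is to query a copy of the Zotero database with SQLite. The table and column names used below (`itemAttachments`, `fulltextItems`, `indexedPages`, `totalPages`, `indexedChars`, `totalChars`) are assumptions about Zotero's schema and should be checked against your own file before relying on the result. `index_status` and `demo_connection` are hypothetical helpers written for this illustration, and the demo rows are made up.

```python
import sqlite3

# ASSUMPTION: these table and column names reflect Zotero's schema as best
# understood here; they are not confirmed anywhere in this thread.
QUERY = """
SELECT a.itemID,
       a.path,
       CASE
           WHEN f.itemID IS NULL THEN 'unindexed'
           WHEN f.indexedPages < f.totalPages
             OR f.indexedChars < f.totalChars THEN 'partial'
           ELSE 'indexed'
       END AS status
FROM itemAttachments AS a
LEFT JOIN fulltextItems AS f ON f.itemID = a.itemID
ORDER BY a.itemID
"""


def index_status(conn):
    """Return (itemID, path, status) for every attachment row."""
    return [tuple(row) for row in conn.execute(QUERY)]


def demo_connection():
    """Build a tiny in-memory stand-in for the assumed schema (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE itemAttachments (itemID INTEGER PRIMARY KEY, path TEXT);
        CREATE TABLE fulltextItems (
            itemID INTEGER PRIMARY KEY,
            indexedPages INT, totalPages INT,
            indexedChars INT, totalChars INT
        );
        INSERT INTO itemAttachments VALUES
            (1, 'article.pdf'), (2, 'long-book.pdf'),
            (3, 'never-indexed.pdf'), (4, 'big-text.txt');
        INSERT INTO fulltextItems VALUES
            (1, 10, 10, NULL, NULL),          -- all pages indexed
            (2, 100, 350, NULL, NULL),        -- stopped at the page limit
            (4, NULL, NULL, 500000, 1200000); -- stopped at the character limit
        """
    )
    return conn


if __name__ == "__main__":
    for item_id, path, status in index_status(demo_connection()):
        print(item_id, path, status)
```

To try this against real data, close Zotero first, copy `zotero.sqlite` somewhere else, and pass `sqlite3.connect("path/to/the/copy")` to `index_status` rather than opening the live database. If the assumed tables do not exist in your version, the query will fail with an error rather than return misleading rows.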
Note that manual indexing in the right-hand pane does index a partially indexed attachment completely.
I am finding that perhaps Zotero is not up to the task of indexing a large library...?
We're not aware of any issues that would cause Zotero to hang during a large indexing attempt, but unfortunately you're not on a platform that allows us to debug this.