(auto-) index
Hi,
I am confused about Zotero's indexing features and about how the bwiernik plugin called "auto-index" relates to them.
I recently noticed that in the "Search" tab of my settings, around half of my items appear as not indexed. I searched the Zotero site and found a) the plugin mentioned above, and b) a forum post https://forums.zotero.org/discussion/comment/239221#Comment_239221 in which a user is advised to uninstall that plugin to make indexing work.
Otherwise, I found no documentation of what this plugin is needed for. So I am confused as to:
a) why my items are only partially indexed, and
b) whether the plugin would solve the issue or rather do the opposite.
Thanks for any help.
Also confusing: the search support page mentioned above says that items can only be indexed if they contain searchable text, which obviously makes sense. But the settings do not make clear whether the "unindexed" items are, or include, those that cannot be indexed because they simply contain no searchable text.
In my specific case, I thought I might figure this out from the counts, but they do not add up: my library contains 11000ish items, of which the settings say 3700ish are indexed, 4600 unindexed, and 234 partially indexed. That means around 2000 are missing from the count altogether. Is the answer that the 2000 missing items are the ones that contain no searchable text, that 3700 could be indexed, and that indexing failed for 4600? Or is it that 3700 could be indexed, 4600 contain no searchable text, and 2000ish failed? Or something else?
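To spell out the arithmetic with the rounded figures above (a quick sketch; the variable names are only for illustration and the inputs are my approximate counts, not exact values):

```python
# Approximate counts as reported in this post (rounded, not exact).
library_items = 11_000
indexed = 3_700
unindexed = 4_600
partial = 234

# Items that appear in the index statistics at all.
counted = indexed + unindexed + partial

# Items in the library that none of the three categories account for.
unaccounted = library_items - counted

print(counted, unaccounted)
```

With these rounded inputs, a couple of thousand items do not show up in any of the three categories, which is the gap I am asking about.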
Another issue: It is not clear to me:
a) Why are the max characters per item and the max pages set so low? Is there any reason for this? Clearly 100 pages is less than most books, so for people who keep books as PDFs this does not make sense. Given that most people will never see these settings, this seems problematic.
b) The defaults for max characters per item (500 000, roughly 200 pages) and max pages (100) seem to be very different. Why is this? And which one takes precedence? I.e., if a book has, say, 450 000 characters on 160 pages, will indexing stop at 100 pages? Or will the book not be indexed at all? Or is this the reason why items appear as "partially indexed"?
Given the above, would it not make sense, first, to explain in more detail how this works and, second and more importantly, to raise the max values so that they roughly match each other, say max characters to 1 million and max pages to 400? Or is there a reason for these low numbers?
Finally, it is unclear what happens if I increase the max character/page numbers. I thought that if I increased them, Zotero might index the unindexed items that had been too long. But nothing seems to happen. Maybe it's a bug, or maybe this is how it is intended to work, and Zotero simply indexes the rest of the text of already indexed items in the background? Or do I need to rebuild the entire index from scratch to have the rest of the already indexed items indexed?
Again, some help/explanation would be good. And apologies for the long posts, but I thought others might be similarly confused.
I should also add that my library index has now moved to 57 partial and 1100 not indexed, and I have no idea why the number of partially indexed items has gone down.
Files added via the mobile apps or the web library are not currently indexed automatically, so those would remain unindexed on other computers unless you index them. That's obviously something we need to fix.
"Partial" indexing means that the content was longer than the max page/character settings. A manual reindex of the file would cause it to be fully indexed. The default settings were set many years ago and haven't been adjusted since — we could probably do so now given computer performance increases, but it would ideally be done as part of a larger technical overhaul of the indexing system, which isn't on the immediate horizon.
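As a rough illustration of that definition (a sketch only, not Zotero's actual code): the function name and constants below are hypothetical, the default values are the ones quoted earlier in this thread, and the rule that exceeding either limit yields "partial" is an assumption made for the example.

```python
# Defaults as quoted earlier in the thread; treat them as assumptions.
DEFAULT_MAX_CHARS = 500_000
DEFAULT_MAX_PAGES = 100


def index_state(chars: int, pages: int,
                max_chars: int = DEFAULT_MAX_CHARS,
                max_pages: int = DEFAULT_MAX_PAGES) -> str:
    """Hypothetical classifier for an attachment that has searchable text.

    Assumption: content longer than either limit is cut off at that
    limit, so the item is reported as "partial".
    """
    if chars > max_chars or pages > max_pages:
        return "partial"
    return "indexed"


# The book from the question: 450 000 characters on 160 pages.
print(index_state(450_000, 160))                 # over the page limit
print(index_state(450_000, 160, max_pages=400))  # within both limits
```

Under these assumptions, the 160-page book would show up as "partial" with the default settings and as fully indexed once the page limit is raised and the file is manually reindexed.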
If you're experiencing some problem that you don't think is described by all that, you should report it in a new thread.
As to why that plugin exists, you'd have to ask @emilianoeheyns what problem it was trying to solve. For now, I've removed it from the plugins page to avoid confusion (and because it hasn't been updated in years and wouldn't be compatible with Zotero 7 anyway).
And no, it isn't Zotero 7 compatible.
Thanks for the heads up :-)
Just wondering what the consequences would be for performance if I pushed the defaults high enough to automatically index everything. Am I only going to see increased resource usage when I actually use advanced search to search the contents of all attachments, or will it also affect general performance even when the indexes aren't being queried?