Excessive indexing wordlist
Hi! I am trying to use Zotero for my article library. I was surprised by the database size, which is bigger than I expected. Browsing the Zotero database with sqlitebrowser, I found that it contains a full-text word index, and I think its implementation is not optimal.
For example, the list of indexed words contains most single letters (a, b, c...), digits (0-9), short words that many search engines ignore (or, and, for, of, we, on, no...), special characters (the degree Celsius sign, the copyright symbol), etc. This may seem harmless, but many of these appear in almost every article. So for each such nearly useless search term, the database stores a very long list of the articles that contain it. In most cases this information is of no use.
If this noise were kept out of the index, the Zotero database would shrink considerably and searches would be correspondingly faster.
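To make the size argument concrete, here is a minimal sketch of a toy inverted index. It is not Zotero's actual schema or code; the function names and the three sample documents are made up for illustration. It counts the (word, document) rows with and without two very common words.

```python
from collections import defaultdict


def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def row_count(index, skip=frozenset()):
    """Count (word, document) pairs, leaving out any words in `skip`."""
    return sum(len(ids) for word, ids in index.items() if word not in skip)


# Hypothetical sample data: three short "articles".
docs = {
    1: "a study of protein folding",
    2: "a model of galaxy formation",
    3: "a survey of soil bacteria",
}

index = build_index(docs)
total = row_count(index)
without_common = row_count(index, skip={"a", "of"})
print(total, without_common)  # prints "15 9"
```

In this toy case two words out of eleven account for 6 of the 15 rows, because each of them is listed once per document. The same effect should grow with library size, since a word found in every article costs one row per article.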
I therefore suggest adding a feature to Zotero that lets users exclude specified search terms, and categories of terms, from the index.
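As a rough illustration of what I mean, here is a minimal sketch of such a filter applied before words reach the index. It is only an assumption of how this could look, not how Zotero tokenizes text; `STOPWORDS`, `MIN_LENGTH` and `index_terms` are hypothetical names, and the stoplist itself is just a sample.

```python
import re

# Hypothetical stoplist; a real one would be longer and user-editable.
STOPWORDS = frozenset(
    {"an", "and", "at", "for", "no", "of", "on", "or", "the", "we"}
)

# Tokens shorter than this are dropped (single letters and single digits).
MIN_LENGTH = 2


def index_terms(text, stopwords=STOPWORDS, min_length=MIN_LENGTH):
    """Return the set of terms from `text` that are worth indexing."""
    terms = set()
    # Runs of letters and digits only, so symbols such as the degree
    # sign or the copyright sign never become tokens.
    for token in re.findall(r"[^\W_]+", text.lower()):
        if len(token) < min_length:
            continue
        if token in stopwords:
            continue
        terms.add(token)
    return terms


sample = "We measured 5 samples of water at 25 °C for a study"
print(sorted(index_terms(sample)))
```

Making the stoplist and the minimum length parameters would let each user decide what counts as noise for their own library.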
-
dstillman
The current implementation is certainly not optimal. It'll be replaced when we switch to SQLite's built-in FTS (which will probably happen after 2.1), but we might be able to put a basic stoplist in in the meantime. (There's a very old ticket for this.)