How many items can I safely use in a Zotero library?
Hello all,
I'm thinking of using a Zotero library to store records that we're getting from automated literature searching. This might be several hundred thousand records (say, 0.5 million or so). In brief: Is Zotero sufficiently efficient to handle a library that large? How large are your libraries?
In more detail, I am wondering about this: Given that the Zotero client uses an SQLite database, would I be more or less correct to assume that the Zotero database is as efficient as any alternative SQL (or SQLite) database we'd build to hold those records?
Many thanks!
Björn
It's _possible_ with Zotero 5.0 now to have libraries of that size (e.g. syncing, which used to be a bottleneck, will work) but I don't think it'll be much fun.
Not sure on the 2nd part of the question. dstillman might have a better
For 2, the SQL part is mostly irrelevant; it really just depends on the specific implementation. Something that just displayed paged results from a DB would absolutely be faster and use less memory, because it'd just be making SQL queries with a LIMIT. If something was actually trying to display 500K rows at once, it would run into some of the challenges we face.
Performance would depend on questions like whether the program was frozen while retrieving rows (faster, but jarring) or tried to keep the UI responsive (as we do), or whether data was available asynchronously (such that you might see blank sections while scrolling quickly) or immediately (as is the case in Zotero, but only thanks to slow and memory-intensive preloading).
With something that was a more lightweight wrapper around a DB, you might also need to do more work upfront to preformat data in various ways you might want to use it, whereas Zotero does some of that on-demand (at the cost of performance, though there are ways that could be improved).
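To make the "paged results with a LIMIT" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not how Zotero is implemented; the `records` table, its columns, and the `build_demo_db` and `fetch_page` helpers are all hypothetical names made up for illustration. The point is only that each query touches one page of rows, so memory use stays flat no matter how large the table is.

```python
import sqlite3


def build_demo_db(n_rows: int) -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical `records` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records (id INTEGER PRIMARY KEY, title TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO records (id, title) VALUES (?, ?)",
        ((i, f"Record {i}") for i in range(1, n_rows + 1)),
    )
    conn.commit()
    return conn


def fetch_page(conn: sqlite3.Connection, page: int, page_size: int = 50):
    """Return one page of rows (zero-based page index), ordered by id."""
    offset = page * page_size
    cur = conn.execute(
        "SELECT id, title FROM records ORDER BY id LIMIT ? OFFSET ?",
        (page_size, offset),
    )
    return cur.fetchall()


if __name__ == "__main__":
    # Assumed size, matching the ~0.5 million records from the question.
    conn = build_demo_db(500_000)
    first_page = fetch_page(conn, 0)
    print(len(first_page), first_page[0])
```

A viewer built this way only ever holds `page_size` rows at a time, which is the trade-off described above: it is cheap, but anything that needs all rows at once (sorting by a computed field, instant scrolling to any position) has to be handled separately.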
There are definitely some more things we could optimize further for large libraries, and switching to Electron will likely also help, but those things are still a ways off.
Björn
Does that make a performance difference for very large libraries (>100k items)?
My bibliography is an important part of my work so I naturally worry about future-proofing my workflow.
I have experienced performance issues when indexing is activated, so at some point I turned it off (but I have never tried turning it back on). That's why I'm asking about this option in particular.
Thanks.
The main issue is 'jitter'; let me explain. Searching or copying between libraries can be slow. While this is happening (or while other 'background' processes are running), items don't stay selected and I cannot right-click. E.g., when a search starts and items come up, and I select one, it will not stay selected.
It would be great if the search were faster, but the main issue for me is the 'jitter', i.e., having to wait for processes to finish before I can use the app again.
Given the many improvements in 7, this may well be better there, but I haven't tested yet.