How many items can I safely use in a Zotero library?

bjohas · October 27, 2019

Hello all,

I'm thinking of using a Zotero library to store records that we're getting from automated literature searching. This might be several 100,000 records (say 0.5m or so). In brief: Is Zotero sufficiently efficient to handle a library that large? How large are your libraries?

In more detail, I am wondering about this: Given that the Zotero client uses an SQLite database, would I be more or less correct to assume that the Zotero database is as efficient as any other (alternative) SQL/SQLite database we'd build to hold those records?

Many thanks!
Björn

adamsmith · October 27, 2019

From what I'm hearing, libraries much over 100k aren't performing particularly well.
It's _possible_ with Zotero 5.0 now to have libraries of that size (e.g. syncing, which used to be a bottleneck, will work) but I don't think it'll be much fun.

Not sure on the 2nd part of the question. dstillman might have a better idea.

dstillman · October 27, 2019

We've heard from a handful of people with 200K or 300K libraries who seemed to be getting by, though I wouldn't expect it to be that much fun. For now, if you were going to attempt it, you would want to run on macOS or Linux, and you'd want the data directory on an SSD. (On Windows the Zotero executable is still 32-bit, which limits memory usage. We're working on a 64-bit version that should be available soon. Based on reports over the years, I'd still be somewhat skeptical of Windows performance, but it's possible that's just because people on Windows have been more likely to be using lower-powered and HDD-based systems than people with Macs.)

For 2, the SQL part is mostly irrelevant — it really just depends on the specific implementation. Something that just displayed paged results from a DB would absolutely be faster and use less memory, because it'd just be making SQL queries with a LIMIT. If something was actually trying to display 500K rows at once, it would run into some of the challenges we face. Performance would depend on questions like whether the program was frozen while retrieving rows (faster, but jarring) or tried to keep the UI responsive (as we do), or whether data was available asynchronously (such that you might see blank sections while scrolling quickly) or immediately (as is the case in Zotero, but only thanks to slow and memory-intensive preloading). With something that was a more lightweight wrapper around a DB, you might also need to do more work upfront to preformat data in various ways you might want to use it, whereas Zotero does some of that on-demand (at the cost of performance, though there are ways that could be improved).
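To illustrate the "paged results with a LIMIT" point, here is a minimal sketch in Python using the standard `sqlite3` module. The `records` table, its columns, and the helper names `build_demo_db` and `fetch_page` are hypothetical and made up for this example; this is not Zotero's schema or how Zotero queries its database.

```python
import sqlite3


def build_demo_db(n_rows: int) -> sqlite3.Connection:
    """Create an in-memory SQLite DB with a hypothetical `records` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records (id INTEGER PRIMARY KEY, title TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO records (id, title) VALUES (?, ?)",
        ((i, f"Record {i:06d}") for i in range(1, n_rows + 1)),
    )
    conn.commit()
    return conn


def fetch_page(conn: sqlite3.Connection, page: int, page_size: int = 50):
    """Return one page of rows; only `page_size` rows are ever held in memory."""
    offset = page * page_size  # pages are zero-based
    cur = conn.execute(
        "SELECT id, title FROM records ORDER BY id LIMIT ? OFFSET ?",
        (page_size, offset),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = build_demo_db(500_000)
    rows = fetch_page(conn, page=0)
    print(len(rows), rows[0])
```

A viewer built this way only ever materializes one page of rows, which is why it can stay light on memory. The trade-off is the one described above: nothing is preloaded, so sorting, formatting, and jumping around the list each cost another query.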

There are definitely some more things we could optimize further for large libraries, and switching to Electron will likely also help, but those things are still a ways off.

bjohas · October 27, 2019

Dear both - that's very helpful, many thanks!
Björn

bwiernik · October 27, 2019

@dstillman If you want to do some performance testing, I have access to some higher end Windows systems.

b0c5 · November 17, 2023

Sorry to revive an old topic, but was curious if there have been updates on this with more recent Zotero versions?

adamsmith · November 17, 2023

Zotero is 64-bit in version 7 and things are a bit faster overall, but the order of magnitude is still in line with the above.

b0c5 · November 18, 2023

I have disabled indexing (I set "Maximum characters to index per file" and "Maximum pages to index per file" settings to 0). I just usually search by author, title.

Does that make a performance difference for very large libraries (>100k items)?

adamsmith · November 18, 2023

Rather than talking in generalities, could you just say what problem you are experiencing or what use case you are interested in?

b0c5 · November 18, 2023

Ok. I am using the latest stable version of Zotero 6. I have a MacBook Air M2. My library has about 20k items (mostly PDF articles, but also some books, theses, ...) and is growing fast, so my question is whether I should worry if I keep adding items.

My bibliography is an important part of my work so I naturally worry about future-proofing my workflow.

I have experienced performance issues when indexing is activated, so at some point I turned it off (but I have never tried turning it back on). That's why I'm asking about this option in particular.

Thanks.

dstillman · November 18, 2023

You should use Zotero normally and report a problem if you actually encounter one.

adamsmith · November 19, 2023

(But Zotero 7 is going to be faster because it runs natively on Apple Silicon. And yes, indexing does slow down some operations, but you should really be able to run a 20k library without issues even in 6.)

b0c5 · November 19, 2023

Alright thanks. I just wanted to have some idea of what numbers to expect. Looking forward to Zotero 7!

bjohas · November 30, 2023

Just to give another data point. I prob have ~ 100k items across 20 libraries that I load into the desktop. Recent higher-end Dell, running Ubuntu.

The main issue is 'jitter'. Let me explain. Searching or copying between libraries can be slow. When this is happening (or when there are other 'background' processes running), items don't stay selected and I cannot right-click. E.g., when a search starts, items come up, and I select one, it will not stay selected.

It would be great if the search was faster, but the main issue for me is the 'jitter', i.e., having to wait for processes to finish before I can use the app again.

b0c5 · December 2, 2023

@bjohas, are you on Zotero 6 or Zotero 7 beta?

bjohas · December 2, 2023

Version 6

Given the many improvements in 7, this may well be better there, but I haven't tested yet.

b0c5 · December 3, 2023

Okay, please do post here if you eventually give Zotero 7 beta a try.