How many items can I safely use in a Zotero library?

Hello all,

I'm thinking of using a Zotero library to store records that we're getting from automated literature searching. This might be several hundred thousand records (say 0.5 million or so). In brief: Is Zotero sufficiently efficient to handle a library that large? How large are your libraries?

In more detail, I am wondering about this: Given that the Zotero client uses an SQLite database, would I be more or less correct to assume that the Zotero database is as efficient as any other (alternative) SQL(ite) database we'd build to hold those records?

Many thanks!
Björn

  • From what I'm hearing, libraries much over 100k aren't performing particularly well.
    It's _possible_ with Zotero 5.0 now to have libraries of that size (e.g. syncing, which used to be a bottleneck, will work) but I don't think it'll be much fun.

    Not sure on the 2nd part of the question. dstillman might have a better
  • We've heard from a handful of people with 200K or 300K libraries who seemed to be getting by, though I wouldn't expect it to be that much fun. For now, if you were going to attempt it, you would want to run on macOS or Linux, and you'd want the data directory on an SSD. (On Windows the Zotero executable is still 32-bit, which limits memory usage. We're working on a 64-bit version that should be available soon. Based on reports over the years, I'd still be somewhat skeptical of Windows performance, but it's possible that's just because people on Windows have been more likely to be using lower-powered and HDD-based systems than people with Macs.)

    For 2, the SQL part is mostly irrelevant — it really just depends on the specific implementation. Something that just displayed paged results from a DB would absolutely be faster and use less memory, because it'd just be making SQL queries with a LIMIT. If something was actually trying to display 500K rows at once, it would run into some of the challenges we face. Performance would depend on questions like whether the program was frozen while retrieving rows (faster, but jarring) or tried to keep the UI responsive (as we do), or whether data was available asynchronously (such that you might see blank sections while scrolling quickly) or immediately (as is the case in Zotero, but only thanks to slow and memory-intensive preloading). With something that was a more lightweight wrapper around a DB, you might also need to do more work upfront to preformat data in various ways you might want to use it, whereas Zotero does some of that on-demand (at the cost of performance, though there are ways that could be improved).

    There are definitely some more things we could optimize further for large libraries, and switching to Electron will likely also help, but those things are still a ways off.
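
To illustrate the "paged results from a DB" point above, here is a minimal sketch of a lightweight wrapper that only ever loads one page of rows via `LIMIT`. It is not how Zotero itself works; the `records` table, its columns, and the helper names `make_demo_db` and `fetch_page` are all hypothetical and chosen for the example.

```python
import sqlite3


def make_demo_db(n_rows):
    """Build an in-memory SQLite DB with a hypothetical `records` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO records (title) VALUES (?)",
        ((f"Record {i}",) for i in range(1, n_rows + 1)),
    )
    conn.commit()
    return conn


def fetch_page(conn, page, page_size=50):
    """Return one page of (id, title) rows; `page` is zero-based.

    Only `page_size` rows are materialised in memory, regardless of
    how many rows the table holds.
    """
    offset = page * page_size
    cur = conn.execute(
        "SELECT id, title FROM records ORDER BY id LIMIT ? OFFSET ?",
        (page_size, offset),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = make_demo_db(1000)
    for row in fetch_page(conn, page=2, page_size=5):
        print(row)
```

A wrapper like this keeps memory use flat, but it gives up what Zotero offers by holding every row ready: scrolling to an arbitrary position means issuing another query, and with a large `OFFSET` SQLite still has to step over the skipped rows.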
  • Dear both - that's very helpful, many thanks!
    Björn
  • @dstillman If you want to do some performance testing, I have access to some higher end Windows systems.
  • Sorry to revive an old topic, but was curious if there have been updates on this with more recent Zotero versions?
  • Zotero is 64-bit in version 7 and things are a bit faster overall, but the order of magnitude is still in line with the above.
  • I have disabled indexing (I set "Maximum characters to index per file" and "Maximum pages to index per file" settings to 0). I just usually search by author, title.

    Does that make a performance difference for very large libraries (>100k items)?
  • Rather than talking in generalities, could you just say what either the problem you are experiencing or the use case you are interested in is?
  • Ok. I am using the latest stable version of Zotero 6. I have a MacBook Air M2. My library has about 20k items (mostly PDF articles, but also some books, theses, ...) and is growing fast, so my question is whether I should worry if I keep adding items.

    My bibliography is an important part of my work so I naturally worry about future-proofing my workflow.

    I have experienced performance issues when indexing is activated, so at some point I turned it off (but I have never tried turning it back on). That's why I'm asking about this option in particular.

    Thanks.
  • You should use Zotero normally and report a problem if you actually encounter one.
  • (But Zotero 7 is going to be faster because it runs natively on Apple Silicon, and yes, indexing does slow down some operations, but you should really be able to run a 20k library without issues even in 6.)
  • Alright thanks. I just wanted to have some idea of what numbers to expect. Looking forward to Zotero 7!
  • Just to give another data point. I prob have ~ 100k items across 20 libraries that I load into the desktop. Recent higher-end Dell, running Ubuntu.

    The main issue is 'jitter' - let me explain. Searching or copying between libraries can be slow. When this is happening (or when there's other 'background' processes running), items don't stay selected and I cannot right-click. E.g., when searching starts, and items come up, and I select one, it will not stay selected.

    It would be great if the search was faster, but the main issue for me is the 'jitter' - i.e., having to wait for processes to finish before I can use the app again.
  • @bjohas, are you on Zotero 6 or Zotero 7 beta?
  • Version 6

    Given the many improvements in 7, this may well be better there, but I haven't tested yet.
  • Okay, please do post here if you eventually give Zotero 7 beta a try.