Better BibTeX 5.2.x for large libraries

emilianoeheyns · January 10, 2020

I won't make a habit of advertising here, but if you have a large library and have experienced Zotero lockups during BBT (auto-)exports, I'd love to hear about your experiences with BBT 5.2.x.

Exports now happen on a background thread ("worker"). They won't be any faster (in fact the workers currently don't use the cache, so they are probably slower), but they shouldn't interfere with the Zotero UI anymore. In principle this should make auto-exports feasible even for the packrats among us with 30k+ items (you know who you are). I've tested with 24k items set up for auto-export and it's insane -- auto-exports spike one core for me for a minute and a half, but Zotero itself behaves as if it were idle.
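The point of the worker is that the UI thread only hands off a job description and gets the finished text back, while the CPU-bound rendering loop runs elsewhere. A minimal sketch of such a message protocol follows; all names (`ExportJob`, `ExportResult`, `runExportJob`) are hypothetical and not BBT's actual internals, and `renderItem` is a toy stand-in for real BibTeX generation. In a Web Worker-style setup, `runExportJob` would be called from the worker's `onmessage` handler and its result returned with `postMessage`.

```typescript
// Hypothetical message shapes exchanged between the UI thread and an export worker.
interface ExportItem {
  key: string;   // citation/item key
  title: string;
}

interface ExportJob {
  jobId: number;
  items: ExportItem[];
}

interface ExportResult {
  jobId: number;
  output: string;   // rendered export text
  itemCount: number;
}

// Toy stand-in for the CPU-bound per-item rendering.
function renderItem(item: ExportItem): string {
  return `@misc{${item.key},\n  title = {${item.title}}\n}\n`;
}

// Pure job handler. Inside a worker it would be wired up roughly as
//   self.onmessage = (e) => self.postMessage(runExportJob(e.data));
// so this loop never runs on (or blocks) the UI thread.
function runExportJob(job: ExportJob): ExportResult {
  const parts: string[] = [];
  for (const item of job.items) {
    parts.push(renderItem(item));
  }
  return { jobId: job.jobId, output: parts.join("\n"), itemCount: job.items.length };
}
```

Because the handler is a pure function of the job, the same code can run in the foreground (as drag-and-drop still does) or in a worker; only the wiring differs.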

Pretty happy with this one. I'm pondering whether to add caching to the worker (which should make it faster at the cost of more memory use) or just scrap the caching system entirely (which would make BBT more robust; as the saying goes, "there are two hard things in computer science: cache invalidation, naming things, and off-by-one errors").
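The hard part of caching here is knowing when a cached entry is stale. A minimal sketch of one common scheme, assuming each entry is keyed by item key and tagged with a version number that changes on every edit (a hypothetical design, not necessarily how BBT's cache works): a lookup only counts as a hit when the stored version matches the current one. The map grows with the library, which is where the extra memory goes.

```typescript
// Hypothetical per-item cache: an entry is only valid for the item version it was rendered from.
interface CacheEntry {
  version: number;  // item version at render time (assumed to change on every edit)
  text: string;     // rendered export fragment
}

class ExportCache {
  private entries = new Map<string, CacheEntry>();
  hits = 0;
  misses = 0;

  // Return cached text, or undefined if absent or rendered from a different version.
  get(key: string, version: number): string | undefined {
    const entry = this.entries.get(key);
    if (entry !== undefined && entry.version === version) {
      this.hits++;
      return entry.text;
    }
    this.misses++;
    return undefined;
  }

  set(key: string, version: number, text: string): void {
    this.entries.set(key, { version, text });
  }

  // Explicit invalidation, e.g. when an item is deleted or export settings change.
  invalidate(key: string): void {
    this.entries.delete(key);
  }

  get size(): number {
    return this.entries.size;
  }
}
```

Version tagging makes staleness checks cheap, but it only covers edits to the item itself; anything else that affects the output (settings, related items) still needs explicit invalidation, which is exactly where such caches tend to go wrong.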

This does *not* work for drag-and-drop, so for those the pre-5.2 behavior still applies. Don't drag-and-drop 24k items as BibTeX (maniacs).

dstillman · January 10, 2020

If there's no caching, does that mean that it could (to use your example) peg a core for 1.5 minutes every time an item is edited? Even if that was happening in the background, that would be highly undesirable.

emilianoeheyns · January 10, 2020

That's if you have on-change auto-exports for 24k items. I wouldn't actively recommend that setup. For 500 items I can't see the spike in Activity Monitor and it takes 2.4 seconds; to actually spike a core with that I need to start some 40 parallel exports.

emilianoeheyns · January 10, 2020

If I do that same 24k export normally, on the first export (cold cache) Zotero goes "not responding" and shows 106% CPU (which I take to mean it is spiking one core). More than anything, the 24k sample was meant to show that an extreme case is now possible.

With a hot cache though that takes 15 seconds. Damn.

emilianoeheyns · January 11, 2020

Wait, I must be reading this wrong. The screenshot at https://postimg.cc/f31PrZjv/ddaed15d was taken with 4 simultaneous 24k exports running. How should I read the number for Zotero CPU usage vs the System/User/Idle numbers? I'm running this on a 2018 MacBook Pro (not the one with the escape key -- pretty salty about that one) with a 2.2 GHz i7.

5.2.7 adds caching for workers, and the 24k export now takes 11-15 secs with a hot cache, up to 35-40 secs if I start 4 of those in parallel. Warming up the cache is also tolerable now at these numbers. I used to warm it by first exporting the library in small slices, so the UI would get some time in between, and then repeating the full export from the hot cache. That is no longer necessary: a worker on a cold cache costs about the same (AFAICT) as a foreground export, the warm-up only needs to happen once, and Zotero remains usable while it runs.
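The old warm-up workaround (export in small slices so the UI gets a turn in between, then run the full export from the hot cache) can be sketched roughly as follows. The names are hypothetical, the cache is a plain `Map`, and `betweenSlices` marks the spot where a real UI-thread implementation would yield to the event loop (for example by awaiting a short timer); it is kept synchronous here so the sketch is easy to run.

```typescript
// Split a list into consecutive slices of at most `size` elements.
function slices<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error("slice size must be at least 1");
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Warm the cache slice by slice, rendering only items not cached yet.
// Returns the number of items that had to be rendered.
function warmCacheInSlices(
  keys: string[],
  sliceSize: number,
  render: (key: string) => string,
  cache: Map<string, string>,
  betweenSlices: (sliceIndex: number) => void,
): number {
  let rendered = 0;
  slices(keys, sliceSize).forEach((slice, index) => {
    for (const key of slice) {
      if (!cache.has(key)) {
        cache.set(key, render(key));
        rendered++;
      }
    }
    // A real UI-thread version would give the event loop a turn here.
    betweenSlices(index);
  });
  return rendered;
}

// The full export then reads every item from the (now hot) cache.
function fullExportFromCache(keys: string[], cache: Map<string, string>): string {
  return keys
    .map((key) => {
      const text = cache.get(key);
      return text === undefined ? "" : text;
    })
    .join("\n");
}
```

The total work is the same as one cold export; slicing only spreads it out so the UI stays responsive, which a worker achieves without the bookkeeping.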

dstillman · January 11, 2020

> How should I read the number for Zotero CPU usage vs the System/User/Idle numbers? I'm running this on a 2018 MacBook Pro […] with a 2.2 i7

100% / (6 cores x 2 for hyper-threading) = 8.33% ≈ 8.79%
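Spelled out, assuming Activity Monitor's per-process %CPU is relative to a single logical core while the System/User/Idle summary is relative to all of them, and taking 8.79% to be the summary figure in the screenshot:

```latex
\begin{aligned}
\text{one logical core} &= \frac{100\%}{6 \times 2} = \frac{100\%}{12} \approx 8.33\% \\
\text{observed } 8.79\% \times 12 &\approx 105.5\% \approx \text{one logical core busy} \\
\text{expected for 4 exports pegging one core each} &= \frac{4 \times 100\%}{12} \approx 33.3\%
\end{aligned}
```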

emilianoeheyns · January 11, 2020

OK, so my interpretation was sort of right, but then I had expected the load to be much higher. Zotero exports are CPU-bound (right?) and synchronous, so I would expect each of them to hog a core until done. But this screenshot was with 4 running in parallel, so I had expected more cores to be busy. Do you happen to know how I can tell which process uses which core/hyperthread/whatever useful concept applies here in macOS?

emilianoeheyns · January 11, 2020

The workers now both read from the cache and contribute back to it.
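One way a worker can both read from and contribute back to a cache, assuming it receives a read-only snapshot of cached entries with the job and returns newly rendered entries with its result (hypothetical names; not necessarily how BBT implements it): the worker renders only cache misses, and the main thread merges what came back so the next export starts hotter.

```typescript
// Hypothetical result of a worker export: the output plus entries the snapshot lacked.
interface WorkerResult {
  output: string;
  newEntries: Array<[string, string]>; // [item key, rendered text]
}

// Worker side: read hits from the snapshot, render only the misses.
function exportWithCacheSnapshot(
  keys: string[],
  snapshot: ReadonlyMap<string, string>,
  render: (key: string) => string,
): WorkerResult {
  const parts: string[] = [];
  const newEntries: Array<[string, string]> = [];
  for (const key of keys) {
    let text = snapshot.get(key);
    if (text === undefined) {
      text = render(key);           // cache miss: do the expensive work
      newEntries.push([key, text]); // report it back for the main thread to store
    }
    parts.push(text);
  }
  return { output: parts.join("\n"), newEntries };
}

// Main-thread side: merge what the worker rendered into the shared cache.
function mergeIntoCache(cache: Map<string, string>, result: WorkerResult): void {
  for (const [key, text] of result.newEntries) {
    cache.set(key, text);
  }
}
```

Under this assumption, copying the snapshot into the worker is where the extra memory use of a caching worker would likely come from.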