[Zotero 7 Beta] High CPU usage

Since the last update, I have noticed high CPU usage (around 15%). It occurs without any specific action being taken. I have the following plugins installed: Better BibTeX; Translate for Zotero; Zotero Attanger.
I'm running Ubuntu 24.04 and Zotero 7.0.0-beta.91+3c6625f3c (64-bit).
Debug ID: D272073557.
Thank you for your impressive work.
  • You can disable all plugins, or restart in troubleshooting mode, and see whether the problem persists; if it doesn't, re-enable the plugins one by one to locate the source.

    For issues with plugins, please report to their owners.
  • But from the debug output this is pretty clearly BBT — there are lines like "+45 Better BibLaTeX inserting" over and over.
  • Thanks. I'll open an issue on the GitHub repository of BBT then.
  • Auto-exports were running; the 'inserting' statements are the cache being filled (turn off the cache and they will stop). The cache is being overhauled and these statements will go away entirely, but auto-exports will of course always incur some load.
  • Thank you for your feedback.
  • @emilianoeheyns: We get reports of this pretty regularly. Can there be some indicator when an auto-export is running? A spinner or something, that identifies BBT if you hover over it? Or at least if the process is taking more than a second or so?

    Is the idea that this was slow because it was the first export being generated (or maybe there were many new/modified items, or the citekey had changed, or…)? How long does a regular auto-export take for a large library, once entries have been cached?
  • Can there be some indicator when an auto-export is running? A spinner
    or something, that identifies BBT if you hover over it?

    Sure, that shouldn’t be hard. Is there something in the UI
    (preferably available in Zotero 6 and 7) that I can reuse for this
    purpose?

    Is the idea that this was slow because it was the first export being
    generated (or maybe there were many new/modified items, or the citekey
    had changed, or…)?

    That’s part of it; part is also that the serialized items in the
    current release always need to be transported to the worker for each
    export, which takes a surprising amount of time. This I have solved
    already on a development build; the cached entries also need to be
    similarly transported to and from the worker; I'm working on removing main
    thread interaction there entirely. This will improve both performance
    and memory use.

    How long does a regular auto-export take for a large library, once
    entries have been cached?

    I’ll be running new performance tests when I have the cache overhaul
    done, the last measured state of affairs is at
    https://retorque.re/zotero-better-bibtex/support/performance/index.html.
    I have a 24k-item test in my test suite that's currently broken; I'll
    revive that for the tests.

  • Is there something in the UI (preferably available in Zotero 6 and 7)
    that I can reuse for this purpose?

    There's the bottom-right popup, but there's nothing built-in that's more
    subtle than that.

    the last measured state of affairs

    OK, so that was still taking 3–4 seconds for every change (maybe debounced)? Maybe your planned optimizations are sufficient, but if not, one option you could explore would be caching the start/end byte positions of each entry and, assuming the file's mtime/size hadn't changed, just copying those across to the new file with readBytesInto().
  • The latter would require entries to take up a fixed or maximum size, no? Otherwise what would happen if for example the title becomes longer? There wouldn't be enough space to write the new entry into.
  • Ah and yes, auto-exports are debounced.
  • edited June 26, 2024
    The latter would require entries to take up a fixed or maximum size, no?
    No, you'd still write to a new file, so the new entry could be any size. But if, say, one entry in the middle of the file changed, you would read from the beginning of the old file to the end position of the entry before the new one, generate the new entry, and then read from the end position of the previous version of the entry to the end of the file, and write those three chunks to a new file. For multiple updates, you'd read smaller sections. But you wouldn't need your cache at all as long as the file hadn't changed.

    Depending on where you're planning to keep your cache in your new system and how long it takes to shuttle data around, just doing a couple straight reads of the existing file might be faster.
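The three-chunk rewrite described above can be sketched in a few lines (hypothetical helper; a real implementation would stream the chunks rather than hold the whole file in memory):

```python
def splice_entry(old_bytes, span, new_entry):
    """Rebuild the export with one changed entry.

    old_bytes is the previous output file; span is the (start, end) byte
    range the old version of the entry occupied. The head and tail of the
    old file are reused verbatim, so only the changed entry is generated.
    """
    start, end = span
    return old_bytes[:start] + new_entry + old_bytes[end:]
```

Because the output is written to a new file, the replacement entry can be any size; the cached offsets only need to be valid for the old file.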
  • First implementation is done, and it's looking pretty good. I exported the 24k items library, and it looks to be:

    native: 5.7s, raises load from idle to ~96% (on the foreground thread)
    bbt, uncached: 43s, raises load from idle to ~112% (in a worker thread)
    bbt, cached: 5.6s, raises load from idle to ~101% (in a worker thread)

    This is just from a few runs, not scientific tests, but it looks promising to me; and if you're exporting 24k items on the regular, you can't really complain about these numbers.

    A more sensible 86-item export takes 0.3s uncached, 0.06s cached.

    Still needs more testing, and I'm pretty sure I can shave time off the cached export.
  • Oh and in Z7 the times are 2.9s/15.8s/3s, respectively.
  • edited July 8, 2024
    That's using your planned design, not my suggested optimization above?

    Do people tend to turn this on for entire libraries or for specific collections? I'm trying to understand whether 86 items or 24k is more representative, or whether something in between, like the size of a typical library, is closer to the norm.

    My concern is entirely about auto-exports creating load (with battery usage, heat, etc.) on every item change and then blaming Zotero for being slow.
  • edited August 2, 2024
    This is mostly the planned design; your suggestion inspired some simplifications that made the cache less complex, leaving less work for the main thread. I'll give the file-reading a go when I come back from holidays in a few weeks, but the boost would have to be substantial; the approach with a split administration of output and indexes seems a bit fragile to me. Never mind the fact that I have too many users that manually edit auto-exported files despite my recommendation to use postscripts.

    I don't have telemetry to see how many/how large auto-exports users have set up, but the 24k library is way out of the ordinary, and for me is only useful as a stress-test. The 24k uncached library export went from 400 seconds to 43 seconds BTW, mostly because the source items no longer need to be transferred to the worker for every export. The new design will also use substantially less memory for large libraries.

    I'm open to warning users about large auto-exports, even persistent warnings, or curtailing auto-export behavior; I could see enforcing on-idle exports after a certain threshold on number of items or export time, or dynamically pushing back the default debounce delay to something like double the longest export to date. But with the work happening on a worker thread, with the main thread now only communicating what item IDs to export, Zotero wouldn't really slow down, although of course the battery usage/heat concern stands.
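The "dynamically pushing back the debounce delay" idea could look like this, a sketch under the stated assumption of doubling the longest export observed so far (names are hypothetical):

```python
class AdaptiveDelay:
    """Debounce delay that grows with the cost of past exports."""

    def __init__(self, default=1.0):
        self.default = default  # floor for the delay, in seconds
        self.longest = 0.0      # longest export duration seen so far

    def record(self, duration):
        # Called after each export with its wall-clock duration.
        self.longest = max(self.longest, duration)

    def current(self):
        # Delay is double the longest export to date, never below the floor,
        # so expensive exports automatically run less often.
        return max(self.default, 2 * self.longest)
```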

    I don't see the point of auto-exporting a 24k-item bib file, really, but there is at least one user who does; that is how I got the sample library. But this is extreme, and I had to make changes to my test infrastructure to even be able to load the database (before starting the actual tests).

  • BTW loading the cache for the 24k items takes 90 milliseconds.
  • I've got the cached 24k export down to 1s, with stock Zotero BibTeX taking 3s (which is still pretty impressive I think).

    I'm still looking at the CSL JSON export -- that takes 2.5s (against 3s stock), but I had expected it to come out roughly the same as the TeX exports on a warm cache -- the cache handling is uniform over them.

    Zotero 7 is much faster than 6; there it works out to 1.8s vs 6s for TeX, and 3.9s vs 6.3s for CSL.
  • edited August 3, 2024
    I was including the overhead of my test framework. The actual numbers for the 24k library, which is rare but not unheard of (see e.g. https://www.zotero.org/groups/340666/ccnlab/library), are below. Almost all work has been shifted off the main thread. Times in milliseconds:
    System               MBPro M2 Pro     i5-3470
    Zotero               7      6         7      6
    Stock BibTeX         2761   7053      8497   18566
    Better BibTeX, cold  11456  48425     38675  75516
    Better BibTeX, warm  606    1809      2062   3406
    Stock CSL            2692   5837      7395   15948
    Better CSL, cold     6738   32120     21546  36373
    Better CSL, warm     784    1508      2097   3469