Window becomes unresponsive after large import is at 100%

thinkround · August 12, 2020

I have collected some debug data on a persistent problem I am facing: When importing a large batch of 2000 references, it will take a while to finish (which is expected) but when the progress bar reaches 100% the window will freeze and Zotero will become totally unresponsive. This means that it is not possible anymore to put any Zotero window into focus, instead the mouse pointer spins rainbow (macOS). I have found that if I Force quit Zotero and restart, I can continue my work, the import seems to be done 100% and everything works as far as I can tell. So I am not really sure what Zotero is doing during the time that the window is frozen after the import is done.

I tried to collect debug data through the built-in debug logging functionality of Zotero, and let the unresponsive window do its thing over night. The next morning it was finished and I could stop logging, but I made the mistake of clicking "View output" out of curiosity, which froze my Zotero again, except this time I didn't have the patience to let it run for a few hours again and killed the unresponsive Zotero process before submitting the data...

So I instead started Zotero with the command `/Applications/Zotero.app/Contents/MacOS/zotero -ZoteroDebugText > Zotero/debug` and did the import of 2000 articles again in a new empty group library. Just before reaching 100% I did `cat debug >> debug_ && echo "" > debug` so only the last few lines of the debug output were kept in the file debug. When the window froze I observed `cat debug` but nothing was happening there. I left it running again overnight and the next morning I ended up with a debug file the size of about 180MB which did not respond well to cat and less and nano, so I truncated lines to a maximum of 2000 characters by piping through `cut -c 2000`.

Linked are two files: the truncated version (lines are cut at 2000 characters) and the full debug output of the first 10 lines after the unresponsive episode, which starts off with a huge binary blob.
https://lindenlion.net/zoterodebug/debug_truncated_to_2000_characters_per_line
https://lindenlion.net/zoterodebug/debug_10_large_lines.zip (more than 100MB unzipped)

For completeness here is also a Debug ID just for metadata about my system, with no significant action performed: D1260655804

I hope that this is helpful in finding out what is happening when import reaches 100% and the window becomes unresponsive.

dstillman · August 13, 2020

The binary blob is just many megabytes of null bytes — I'm not sure how you ended up with that, but I don't think it's from Zotero. Certainly nothing we've ever seen with normal debug logging.

The first actual line shows a 20-minute delay, which obviously isn't ideal but is fairly normal for certain large imports. Don't force-quit it — it's doing things, even if frozen, and you should let it finish.

What file format were you importing? The freeze might only happen with certain formats that haven't yet been updated to use a newer approach that allows the UI to remain responsive.

In any case:

Imported 2231 item(s) in 2691477.6041789996 ms

The whole import appears to have taken 44 minutes. Again, far from ideal, and it's something we hope to improve, but hopefully not something you're doing regularly.

thinkround · August 13, 2020

I suspect the binary blob has something to do with running `echo "" > debug` while Zotero is using the file for stdout. At least, that's how I can recreate those bytes during any Zotero activity. Clearly not the ideal way for logging debug activity...

Just now I did another import of ~1500 articles which I exported as Zotero RDF from another group library. It took 23 minutes. Please note though that my laptop is not the fastest (mid-2010 MacBook with 2.4 GHz Intel Core 2 Duo) so most users should probably see something faster, at least on a Mac.

For the previously reported import I used a personal fork of the BibTex translator, which I fine-tuned for more accurate data mapping from Scopus exports, doing some guesswork about the order of keywords (EMTREE and MeSH)... I'm pretty sure, that the unresponsive window happened with the original bibtex translator as well.

Cool to see it only took 44 minutes in total, I only saw that it had finished the next morning. That would mean, that the window was frozen for about 20 minutes after reaching 100%, so really 100% should have been ~50%, assuming there is still something useful going on in the background.

So what did that first line after a 20 minute delay mean? I figured now it says `Committing notifier event queues [add-item: 8689] [modify-item: 2231] [add-collection-item: 2231] [add-item-tag: 110078] [queues: m3LkmOpX]`
Is it actually this thing that is taking so long or is something else happening before, that is not logged?

When you say certain formats haven't yet been been updated, do you mean the translator file? Does this apply to the current bibtex translator? Or is it something more low-level? Can you give me an example of a format that uses the newer approach?

I fear I'll be doing a few more imports like this in the near future, for a systematic literature review, but I appreciate your input, at least I know now that the unresponsive window is not something totally unexpected.

thinkround · August 26, 2020

I'm trying to learn more about how zotero translators work. Can anyone point me to an example of a translator that uses the newer approach that allows the UI to be responsive? And is it correct that the bibtex import translator uses the older approach?

emilianoeheyns · August 26, 2020

It's the translators, yes, and the ones that implement this are BibTeX, CSL JSON, Endnote XML, RDF, and RIS.

May I know what you changed about the bibtex import? I might want to add some of those changes to my own bibtex importer.

emilianoeheyns · August 26, 2020

These all declare "async: true" in their header, they return a promise from doImport, and they await the result of item.complete. This has the result that they allow the UI to be active (ao things) between items being imported.

thinkround · August 26, 2020

What I did is highly specific to Scopus Bibtex exports and cannot be applied to bibtex in general: I added some mapping to extract MeSH and EMTREE keywords based on the observation, that anything after the occurence of a capital letter following a comma is MeSH, while terms before that are EMTREE.
I also added some logic to distinguish books from book sections in Scopus bibtex exports.

Have a look on my github fork if you are interested: https://github.com/michi-zuri/translators/blob/scopusbib/Scopus BibTeX.js

thinkround · August 26, 2020

I'll make sure to push any unsynced changes when I get home, it could be that I haven't yet...

thinkround · August 26, 2020

So if bibtex actually does the newer approach, what could explain the behaviour I am experiencing, that is: progress bar reaches 100% after 20 minutes, import finishes after 40 minutes. Zotero is completely unresponsive (frozen it seems, cannot even minimize or move the window) for the last 20 minutes from the moment 100% is reached.

emilianoeheyns · August 26, 2020

The standard bibtex importer from which you split off should save each item as it finds it, so I'd say that a large chunk of work during which Zotero remains responsive, followed by a large chunk where it is not seems strange to me.

thinkround · August 26, 2020

ok, I just synced my latest changes to github. Yeah, it's very strange indeed... I could try to capture another debug report and hope it is of any help. The thing is though, during the unresponsive period, Zotero doesn't log anything either. Could it have something to do with garbage collection (just a random guess)?

emilianoeheyns · August 26, 2020

I've not seen it before myself. What are you importing? I could see if I can replicate it.

thinkround · August 26, 2020

This is the file I was importing: https://gist.githubusercontent.com/michi-zuri/9790fbbf23c785ed335512de528d7177/raw/936454d9e021ae4663bb5c7087f569cda6df29f7/scopus.bib

After the freezing, everything is fine, and the import has always been succesful so far, even if I killed Zotero while being unresponsive (lost patience after 10 minutes...)

emilianoeheyns · August 27, 2020

I see the same behavior -- not as long, but this is a reasonably fast system I'm running it on. It might be that the items are saved in a transaction, and that the actual commit of the transaction is the stall. Which would still be weird I think? The transaction ought to happen in the background, with the foreground thread being notified when it's done.

thinkround · August 27, 2020

Thanks for checking. I've got some major exams coming up next week, so won't be reading here in the next two weeks.