Zotero 5.0 beta: Fail on large import, bibtex format

I am trying to import thousands of items into Zotero 5.0 (5.0-beta.111+aa78387) and am running into a consistent problem. I have disabled ZotFile, and the problem persists.

I am importing from a UTF-8 BibTeX file. If the file is larger than a certain size (somewhere around 1,000 items or 20,000 lines), I get the message:

[JavaScript Application]
An error occurred while trying to import the selected file. Please ensure that the file is valid and try again.

At least once, after getting this error I could not import any files without quitting Zotero and relaunching it.

I can work around this error by breaking the file into blocks of about 15,000 lines (taking care to break between BibTeX entries).
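
For anyone else who needs to do this split, here is a minimal Node.js sketch (not part of Zotero; the 15,000-line target and file names are just illustrations, and it naively assumes every entry starts with "@" at the beginning of a line):

    // split-bib.js -- naive BibTeX splitter. Assumes each entry starts with
    // "@" at the start of a line and that the file fits in memory once.
    const fs = require('fs');

    const input = process.argv[2];                       // e.g. library.bib
    const maxLines = parseInt(process.argv[3], 10) || 15000;
    const lines = fs.readFileSync(input, 'utf8').split('\n');

    let chunk = [];
    let part = 1;
    for (const line of lines) {
        // Only start a new file at an entry boundary, never mid-entry.
        if (chunk.length >= maxLines && line.startsWith('@')) {
            fs.writeFileSync(`${input}.part${part++}.bib`, chunk.join('\n'));
            chunk = [];
        }
        chunk.push(line);
    }
    if (chunk.length) {
        fs.writeFileSync(`${input}.part${part}.bib`, chunk.join('\n'));
    }

Running node split-bib.js library.bib 15000 would produce library.bib.part1.bib, library.bib.part2.bib, and so on, each file breaking only between entries.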

Two issues: first, this seems to be a bug. Second, if the error message gave some context, it would be easier to figure out how to resolve it.
  • If this works for the entire library when cut into pieces, I'd guess an out-of-memory error, though 1,000 items seems quite small for that. (You can look at the error yourself using the Report Errors function, FWIW.)
  • Also, you should actually report the error and post the Report ID; otherwise we don't know anything beyond "JavaScript error" either: https://www.zotero.org/support/reporting_problems
  • This appears to be the same error I reported a little earlier today.

    If you try to import too many items from a UTF-8 BibTeX file, you get a JavaScript error, which raises the message pasted below. When this has happened before, I have found that breaking the BibTeX file into blocks of about 15,000 lines, or fewer than 1,000 items, lets each piece import, even though the whole file will not. I'll update this post if that workaround doesn't work in this case.

    It has been suggested that this might be an out-of-memory error. I didn't monitor memory pressure while the import was running, but now that it has terminated I see no unusually large memory use by Zotero or by my system: I have 32 GB of memory and am using about 13 GB.

    I can supply the .bibtex file if desired.

    Zotero 5.0-beta.111+aa78387, macOS Sierra 10.12.2.

    From the error log:

    [JavaScript Error: "Discarding invalid field 'publisher' for type 4 for item 1/null"] (×17)

    [JavaScript Error: "Discarding unknown JSON field 'backupPublisher' for item 1/null"] (×2)

    [JavaScript Error: "Discarding invalid field 'series' for type 7 for item 1/null"] (×2)

    [JavaScript Error: "Item 13488 not loaded" {file: "chrome://zotero/content/xpcom/data/items.js" line: 525}] (×4)

  • After a failed import, subsequent imports will often fail until Zotero is quit and relaunched.
  • (You only need to post the Report ID here, not the report content itself. We'll excerpt any relevant errors, and the Report ID gives us additional info.)

    Particularly with 32 GB of RAM, it's much more likely that this is a problem with a specific item — or combination of items — than that it's an issue with importing a large file.

    If you don't mind sending the BibTeX file to support@zotero.org with a link to this thread, that's probably the easiest. If you prefer, a Debug ID (different from a Report ID) for an import attempt that fails might be enough.
  • I mailed it to support. BTW, I am pretty sure that no single item is causing the problem: if I break the file up into "bite-sized" chunks, I can import all the items, but I cannot when they are in one big file.
  • This is helpful. I'm experiencing a very similar problem and have also resorted to breaking the BibTeX file into smaller chunks, which seems to work. I was trying to import close to 9,000 items, and it crashed twice, each time after getting through about 3,600 items.
  • It crashed, i.e., shut down, or it threw an error? If the former, that actually does suggest a memory issue.
  • Thanks for your reply. It stopped importing documents but seemed to continue doing something, judging from what I saw with "View Log" in the debug menu. It didn't shut down, and the interface stayed responsive. I found an error report using the debug menu and submitted it; the related post is "Error Report ID 1791536310".
  • out of memory
    Yeah, it's running out of memory. We'll look into why this is happening. For now, splitting up the import file is your best bet.
  • edited September 9, 2017
    @alex.mitrani: Have you used the BBT extension? If so, reset translators from the Advanced → Files and Folders pane of the Zotero preferences and try again. Import translators in 5.0 were rewritten and should use less memory, but it's likely the BBT translator hasn't been. (It doesn't look like you have that installed now, but if you did before the translator could still be around.)
  • @dstillman: thanks for your comments. No, I haven't used the BBT extension.
  • edited September 11, 2017
    I haven't tested it, but my bet would be that the BBT translators use significant amounts of memory. They're not optimized for speed or size: the import translator, for example, parses everything into memory before starting reference creation, as sketched below. It would really surprise me if the stock BibTeX translators ran out of memory but mine didn't.

    @dstillman Is there something specific I should target for rewrite? Should I split off the import translator so it doesn't carry the weight of the export translator?

    @alex.mitrani could I get a copy of that file for testing? It should be possible to attach it to an issue at https://github.com/retorquere/zotero-better-bibtex/issues
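
    To be concrete, the shape I mean is roughly this; parseAll() is just a stand-in for the real grammar, not actual BBT code:

        // Parse-everything-first import: the whole source plus the complete
        // array of parsed entries are held in memory before the first item
        // is created, and all items are saved together at the end.
        function doImport() {
            let input = '';
            let chunk;
            while ((chunk = Zotero.read(65536)) !== false) {
                input += chunk;
            }
            const entries = parseAll(input); // hypothetical full-parse step
            for (const entry of entries) {
                const item = new Zotero.Item(entry.itemType);
                item.title = entry.title;
                item.complete(); // queued; saved in one big transaction
            }
        }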
  • @emilianoheyns Dan asyncified most import translators, see this commit:
    https://github.com/zotero/translators/pull/1354/
  • Interesting. If I look at the EndNote XML importer, it looks like it still pulls the whole source into memory first (not much else can be done with XML), but then imports the references one by one, asynchronously, correct? I should in principle be able to do this for my importers too. Does this have memory benefits? I thought async was mostly a matter of cooperative multitasking.
  • Explained here: https://github.com/zotero/translators/issues/1353

    Without this, the items have to be processed and queued synchronously (which can hang the UI) and then saved to disk together in a single (potentially huge, memory-intensive) transaction.
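
    Schematically, the asyncified pattern saves each item as it is parsed. This is only a sketch: parseNextEntry() is a hypothetical placeholder, and it assumes item.complete() is awaitable when the translator sets the async flag:

        // Streaming import: one entry is parsed, turned into an item, and
        // saved before the next is touched, so nothing accumulates in memory.
        async function doImport() {
            let entry;
            while ((entry = parseNextEntry()) !== null) { // hypothetical parser
                const item = new Zotero.Item(entry.itemType);
                item.title = entry.title;
                await item.complete(); // assumed awaitable under async: true
            }
        }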
  • edited September 12, 2017
    @alex.mitrani wasn't using BBT, so you shouldn't need to debug anything for this problem (beyond asyncifying the BBT import translators in general). I'm going to investigate memory usage with the built-in asyncified translators and see if I can reproduce this. Obviously, if @alex.mitrani wants to share the file publicly, we can test using that, but it's probably not necessary.
  • @dstillman just always on the lookout for live test data to add to my test suite.
  • @emilianoheyns: I have submitted the .bib file on GitHub, as requested. The issue has the same title as this discussion. Hope it helps.
  • edited November 29, 2017
    @dstillman:

    * Does async: true relate only to import? If a translator does both import and export, export is still sync, right?

    * When configOptions.async is set to true, item.complete() always returns a promise, correct?

    * Promise.coroutine does not seem to be available in the translators. Can it be made available?

    * If Promise.coroutine were available, would it be largely sufficient to yield on item.complete() in doImport, instead of just calling it, to make a translator async?

    * How do async translators import collection info? The collection info is tied to the imported references using itemIDs that are only meaningful for the duration of the import session. Will this mechanism stay unchanged, even when items are no longer saved in bulk at the end of the session?
  • Sorry, I don't remember the details here, and some of this is in flux.
    Promise.coroutine does not seem to be available in the translators. Can it be made available?
    No, but you might be able to use async/await.
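
    The rewrite is mostly mechanical: a yield on item.complete() inside a coroutine becomes an await inside an async function. A sketch, with parseNextEntry() again standing in for real parsing:

        // Coroutine style -- would need Bluebird's Promise.coroutine,
        // which isn't exposed to translators:
        var doImport = Promise.coroutine(function* () {
            let entry;
            while ((entry = parseNextEntry()) !== null) {
                const item = new Zotero.Item(entry.itemType);
                yield item.complete();
            }
        });

        // Native async/await equivalent -- same control flow, no library:
        async function doImport() {
            let entry;
            while ((entry = parseNextEntry()) !== null) {
                const item = new Zotero.Item(entry.itemType);
                await item.complete();
            }
        }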
  • edited July 20, 2018
    I had forgotten about async/await. Which is odd, because I use it pretty much exclusively over coroutine.

    In flux meaning "don't bother for a while"? I'm fine with that.
    In flux meaning "don't bother for a while"? I'm fine with that.
    Yeah, not sure when I'll get to it, but I'm planning to update some of the existing import translators to use async/await, so those should be able to serve as examples (and possibly prompt additional modernization of the translator architecture).
  • I have it working, and the large library imports now (if slowly).