Feature Suggestion: Delay updating citations in documents
So I've been digging through the code and thinking a lot about how we could improve the performance of word processor integration with huge documents, which is exactly when Zotero working well really matters. The biggest bottleneck right now is that we must check every citation on every insert for ibid updates and renumbering. There is no avoiding that if we want to keep the citations up to date. With big documents, the processing time in citeproc for each inserted citation seems to grow more than linearly by my measurements (they weren't very thorough and it could be just linear, but other people's reports seem to support this).
We could, however, add a checkbox in the preferences to delay updating citations. Citation inserts would then only insert the field codes and citation text, without triggering a full document crawl. We could highlight the bibliography and unprocessed citations in red or something, to make it extra clear that they aren't being updated. Updates would only be triggered by pressing Refresh.
We could also prompt users to switch on delaying once an insert takes 10s+ or so. I can't think of any big disadvantages to this, aside from possible user confusion. But having them split documents into chapters is probably even less straightforward and more confusing anyway. And the amount of time people waste by waiting 5 minutes every time they insert a citation is insane. Combined, this could literally save days of productive people's time every day.
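To make the idea concrete, here is a minimal sketch of the two insert paths. It is only an illustration: CitationSession, insert, refresh, and the placeholder format are all made-up names, not anything that exists in Zotero or citeproc-js, and the "full update" is just a renumbering pass standing in for the real document crawl.

```typescript
// Hypothetical sketch; none of these names come from Zotero or citeproc-js.
interface Citation {
  key: string;    // item identifier
  text: string;   // text currently shown in the document
  stale: boolean; // true until a full update has processed it
}

class CitationSession {
  private citations: Citation[] = [];
  delayUpdates: boolean;
  fullUpdates = 0; // counts expensive whole-document passes

  constructor(delayUpdates: boolean) {
    this.delayUpdates = delayUpdates;
  }

  insert(key: string): void {
    if (this.delayUpdates) {
      // Cheap path: write a placeholder and mark it as unprocessed.
      this.citations.push({ key, text: `{${key}}`, stale: true });
    } else {
      // Current behaviour: every insert triggers a full pass.
      this.citations.push({ key, text: "", stale: true });
      this.refresh();
    }
  }

  // Expensive path: visit every citation once (here, simple renumbering).
  refresh(): void {
    this.fullUpdates += 1;
    this.citations.forEach((c, i) => {
      c.text = `[${i + 1}]`;
      c.stale = false;
    });
  }

  // Number of citations that would be highlighted as unprocessed.
  staleCount(): number {
    return this.citations.filter((c) => c.stale).length;
  }

  texts(): string[] {
    return this.citations.map((c) => c.text);
  }
}
```

With delaying switched on, any number of inserts costs zero full passes and a single Refresh brings everything up to date; with it off, each insert triggers its own pass. staleCount() is the kind of thing the highlighting could key off.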
If I recall correctly, @fbennett may actually have code in citeproc-js that facilitates this already. I remember us talking about this at some point.
When updating is disabled, it inserts placeholder cites {Author et al. 1998; Author2/Author2 2006} that are clearly not formatted. This is much faster. When the document is done, you enable updating and click Refresh to do all of the formatting at once.
For the Word plugin, the RibbonX interface allows toggle switches, so the "Disable Formatting Update" button could be readily visible to users.
Edit: There isn't much that can be done to speed up citeproc-js itself. I changed things a few months ago to avoid redundant use of retrieveItem() when processing an item, but the overhead of that was probably pretty small anyway. The burden of disambiguation does expand non-linearly, but that's unavoidable, since ambiguous partners need to be reevaluated in a many-to-many comparison to get their correct form in the new context. (The other performance bottleneck at refresh time would be the extraction of citation indices to confirm document state, but I assume that bit must be linear.)
If I recall correctly, the main hurdle for static citations is that the word processor plugins will need a method for inserting a single field at cursor location.
Edit: never mind, @adomasven already proposes something like this in the original post ("Citation inserts would only insert the field codes and citation text").
Even with such a mode, I think that some sort of difference in formatting (e.g., curly brackets) should be there to prevent confusion by users.
And the most important thing is showing the user that the special mode is active. I don't think it really matters which citations have been added before or after activating such a special mode, since any new citation can make the existing citations out of date.
Has anyone profiled citeproc-js in order to find out where it actually spends its time? Could memoizing function wrappers be applied? Could a compiler writer look it over and find ways to improve it?
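By a memoizing wrapper I mean something along these lines. This is a generic sketch, not citeproc-js code: lookupItem is a hypothetical stand-in for whatever expensive call a profile might turn up, and I'm assuming the result depends only on the id and doesn't change during a run.

```typescript
// Generic memoizing wrapper for a function of one string argument.
function memoize<R>(fn: (id: string) => R): (id: string) => R {
  const cache = new Map<string, R>();
  return (id: string): R => {
    if (cache.has(id)) {
      return cache.get(id) as R; // cache hit: skip the expensive call
    }
    const result = fn(id);
    cache.set(id, result);
    return result;
  };
}

// Hypothetical expensive lookup; the counter records how often it really runs.
const counter = { calls: 0 };
function lookupItem(id: string): { id: string; title: string } {
  counter.calls += 1;
  return { id, title: `Title of ${id}` };
}

const cachedLookup = memoize(lookupItem);
cachedLookup("ITEM-1");
cachedLookup("ITEM-1"); // served from the cache
cachedLookup("ITEM-2");
```

Here the underlying lookup runs twice for three calls. The catch is invalidation: if an item is edited in the library, its cached entry has to be dropped, so this only helps where the inputs really are stable.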
If it's in the name disambiguation, are there CSL styles that do not perform that, so that during production the CSL style can be set to the {scannable cite} style or something?
Could it benefit from "pipelining" similar to what Q promises allow in JavaScript, but between processes... I wonder if the word processor can run a separate thread to talk with Zotero's integration.js, then fit its edits in between the ones being made by the typist, with the initial insertion being in the {scannable cite} style, so you can see it and move on, and have the WP hop back and fix it up during pauses in typing? (Ever had an Emacs window on another X desktop, and have two people edit in the same file? If Emacs can do it...)
Any updates/timeline on this? I only have a couple dozen citations in my paper and I have to wait 10+ seconds for the insert citation dialog box to come up every time I want to add a citation.
Thank you very much!
One little thing is the wording of "You will need to click Refresh in the Zotero plugin when you are done inserting citations." and "Automatic citation updates are disabled. To see the bibliography, click Refresh in the Zotero plugin".
I don't find the reference to the "Zotero plugin" clear enough (even if Zotero itself is not a plugin anymore): it refers to the Refresh icon in the Zotero toolbar or tab, right? Couldn't we simply say "[…] click Refresh in the Zotero tab or toolbar […]"?
(The relevant strings in zotero.properties are integration.delayCitationUpdates.bibliography and integration.delayCitationUpdates.alert.text2.)
[Unrelated, but while I'm at it: "Choose File Handler" sounds very technical to me, and quite general. What about "Choose a PDF Viewer" (or reader, or handler, if that's common for English speakers)? The string is zotero.preferences.chooseFileHandler in zotero.properties.]