Feature Suggestion: Delay updating citations in documents

I've been digging through the code and thinking a lot about how we could improve the performance of word processor integration with huge documents, which is exactly when Zotero working well really matters. The biggest bottleneck now is that we must check every citation upon every insert for ibid updates and renumbering. There is no avoiding it if we want to keep the citations up to date. With big documents, the citeproc processing time for each inserted citation grows more than linearly by my measurements (they weren't very thorough, so it could just be linear, but other people's stories seem to support this).

We could, however, add a checkbox in the preferences to delay updating citations. Citation inserts would only insert the field codes and citation text, but wouldn't cause a full document crawl. We could highlight the bibliography and unprocessed citations in red or something similar, to make it extra clear that they aren't being updated. Updates would only be triggered upon pressing Refresh.

We could also prompt users to switch on delaying once an insert takes 10s+ or so. I can't think of any big disadvantages to this, aside from possible user confusion. But having them split documents into chapters is probably even less straightforward and more confusing anyway. The amount of time people waste waiting 5 minutes every time they insert a citation is insane. Combined across users, this could literally save days of productive time every day.
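
To make the idea concrete, here is a rough Python sketch of the proposed flow. It is not Zotero's code: `Document`, `Citation`, `insert_citation`, `refresh`, and `should_prompt` are all made-up names, the simple renumbering stands in for whatever citeproc actually does, and prompting only once per document is an assumption on top of the proposal.

```python
from dataclasses import dataclass, field


@dataclass
class Citation:
    key: str
    text: str = ""
    processed: bool = False  # unprocessed citations are the ones to highlight


@dataclass
class Document:
    citations: list = field(default_factory=list)
    delay_updates: bool = False  # the proposed preference checkbox
    prompted: bool = False       # hypothetical "already asked" flag
    full_updates: int = 0        # counts full document crawls, for illustration


PROMPT_THRESHOLD_SECONDS = 10.0  # "10s+ or so"; the exact value is a guess


def refresh(doc):
    """Crawl every citation and bring it up to date.

    Plain renumbering is a stand-in for real ibid/numbering/disambiguation work.
    """
    doc.full_updates += 1
    for number, citation in enumerate(doc.citations, start=1):
        citation.text = f"[{number}]"
        citation.processed = True


def insert_citation(doc, key, position):
    """Insert one citation field; crawl the document only if delaying is off."""
    citation = Citation(key=key, text="{" + key + "}")  # placeholder text
    doc.citations.insert(position, citation)
    if not doc.delay_updates:
        refresh(doc)
    return citation


def should_prompt(doc, insert_seconds):
    """Return True at most once per document, after a slow non-delayed insert."""
    if doc.delay_updates or doc.prompted:
        return False
    if insert_seconds >= PROMPT_THRESHOLD_SECONDS:
        doc.prompted = True
        return True
    return False
```

With `delay_updates=True`, any number of `insert_citation` calls leave `full_updates` at 0 and the citations as unprocessed placeholders; a single `refresh(doc)` then processes everything in one pass.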
  • I think this is exactly the way to go. This has been suggested by a couple of users (e.g. https://forums.zotero.org/discussion/35406/can-i-stop-zotero-from-automatically-updating-citations ) and I think some competitors allow some version of this. I think EndNote, for example, has a mode where it just inserts citation placeholders.

    If I recall correctly, @fbennett may actually have code in citeproc-js that facilitates this already. I remember us talking about this at some point.
  • edited March 23, 2017
    Yes, EndNote has a great system for this. It has a toggle to enable or disable document updating.

    When updating is disabled, it inserts placeholder cites {Author et al. 1998; Author2/Author2 2006} that are clearly not formatted. This is much faster. When the document is done, you enable updating and click Refresh to do all of the formatting at once.

    For the Word plugin, the RibbonX interface allows toggle switches, so the "Disable Formatting Update" button could be readily visible to users.
  • edited March 23, 2017
    "When the document is done, you enable updating and click Refresh to do all of the formatting at once."
    Shouldn't Refresh always do a proper update, regardless of the mode?
  • It would also be nice to keep this out of the initial doc prefs window and only show it when you open the doc prefs in an existing document, since it's unlikely to be relevant for new documents, and possibly/hopefully for most people. And then we could also prompt to switch (once per document, with a persistent flag) when an update took a long time. I think we'd want to do that after a shorter time, though, maybe 5 seconds (particularly if we weren't showing the checkbox in the initial window).
  • +1 for this feature
  • edited March 23, 2017
    A couple of the earlier threads where this was raised are here and here. It would be a great feature to add.

    Edit: There isn't much that can be done to speed up citeproc-js itself. I changed things a few months ago to avoid redundant use of retrieveItem() when processing an item, but the overhead of that was probably pretty small anyway. The burden of disambiguation does expand non-linearly, but that's unavoidable, since ambiguous partners need to be reevaluated in a many-to-many comparison to get their correct form in the new context. (The other performance bottleneck at refresh time would be the extraction of citation indices to confirm document state, but I assume that bit must be linear.)

    If I recall correctly, the main hurdle for static citations is that the word processor plugins will need a method for inserting a single field at cursor location.
  • Can you elaborate on that last point?
  • edited March 24, 2017
    I take it back. The plugins are above my skill level, and I shouldn't have commented on them. (Quite possibly I was wrong, but I should generally keep my nose out of that end of things. :)
  • edited March 24, 2017
    "The biggest bottleneck now is that we must check every citation upon every insert for ibid updates and renumbering."
    Apart from using placeholders, how about a semi-functional mode that still uses normal Zotero fields but eliminates anything that slows down citeproc-js? E.g. disable disambiguation, disable ibid updates, and don't renumber (not sure how the latter would work; one option is to just append any newly cited items to the end of the bibliography and stop sorting the bibliography).

    Edit: never mind, @adomasven already proposes something like this in the original post ("Citation inserts would only insert the field codes and citation text").
  • @Rintze, there's also the handling of first versus subsequent appearances.

    Even with such a mode, I think that some sort of difference in formatting (e.g., curly brackets) should be there to prevent confusion by users.
  • @bwiernik, with "disable ibid updates", I meant treating each cite as having position="first", so that would take care of subsequent appearances too.

    And the most important thing is showing the user that the special mode is active. I don't think it really matters which citations have been added before or after activating such a special mode, since any new citation can make the existing citations out of date.
  • I think we are agreeing on all counts--I just want to be sure that there is some very readily available indication that formatting updates are disabled.
  • @fbennett ? When I insert a new citation in a large document in my zotero-texmacs-integration, it does take a long time. In a smaller document when I use the "classic" insert citation dialog, then check the box for multiple citations, I notice that the more of them I select, the longer it takes. The same thing for the "editBibliography" entry. The larger the bibliography, the longer it takes, and I am sure that it's O(n^2), without checking, just from the time it takes when I add more of them...

    Has anyone profiled citeproc-js in order to find out where it actually spends its time? Could memoizing function wrappers be applied? Could a compiler writer look it over and find ways to improve it?

    If it's in the name disambiguation, are there CSL styles that do not perform that, so that during production the CSL style can be set to the {scannable cite} style or something?

    Could it benefit from something like the "pipelining" that Q promises allow in JavaScript, but between processes? I wonder if the word processor could run a separate thread to talk with Zotero's integration.js, then fit its edits in between the ones being made by the typist, with the initial insertion being in the {scannable cite} style, so you can see it and move on, and have the WP hop back and fix it up during pauses in typing. (Ever had an Emacs window on another X desktop and had two people edit the same file? If Emacs can do it...)
  • +1
    Any updates/timeline on this? I only have a couple dozen citations in my paper and I have to wait 10+ seconds for the insert citation dialog box to come up every time I want to add a citation.
  • It's actually just been accepted and should be available in the beta within the next day or so (if it isn't already). I assume Zotero will want this beta tested pretty thoroughly, though, so it may still be a bit (guessing ~1 month) before it lands in the regular release.