Feature suggestion: “Quick refresh”

Short summary: A proposal for a new button that refreshes only new citations and citations that have been changed since the last refresh (analogous to an “incremental” backup instead of a “full” backup, but for citations and bibliography).

I want to preface my suggestion by saying that Zotero is an absolutely brilliant program. I use it virtually every day and couldn’t imagine life without it. The Microsoft Word integration is a lifesaver in academic writing. The one thing that remains a challenge for me is the processing time: the time to edit the first citation of the day, but mainly the time to perform a full refresh. For obstinate people like me who like to keep everything in one document and who have disabled automatic citation updates, refreshing a thesis or dissertation can in some cases take hours. This seems to be because the program processes all the citations, even those that haven’t been altered since the last refresh (at least, that is what it looks like in the Zotero log output).

This reminded me of how backup programs typically have options to perform “incremental” or “differential” backups, which only back up files that have been changed since the last backup, saving valuable time and processing power. This made me think that it would be helpful to have an in-between option in Zotero, between “Automatically update citations” and the daunting full “Refresh”. Perhaps it would be possible to have a button which only refreshes new and recently changed citations (i.e., those with dashed underlines), and re-generates the bibliography? It could be called something like “Quick refresh”. I have searched the forums but haven’t found this suggestion, so I thought I’d humbly submit it myself. I’m imagining something like the function of Word’s “Save” button, which only initiates the saving process if something has actually been changed in the document. If no changes have been made, the “Save” button does nothing (and is even greyed out in some programs).

Not being that knowledgeable in programming, I wouldn’t know whether this idea is feasible, advisable or even possible. My (entirely inexpert and probably wrongheaded) idea is that each “Full refresh” could perhaps create some kind of cache file containing the information needed to quickly generate a bibliography for the document in question. Each subsequent change made by Zotero to the document would then be written to this cache file. When “Quick refresh” is invoked, most of the processing would already have been done and saved in the cache file, so Zotero would only have to consolidate the changes, update the relevant citations and re-generate the bibliography to reflect these changes. (Perhaps the program could even do some of this consolidation or preparation in the background, when the system is idle?) Clicking the “Quick refresh” button immediately after a refresh would do nothing, since there is nothing to update.
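
A minimal Python sketch of the dirty-tracking part of this idea, assuming a hypothetical `CitationCache` that keeps the text produced by the last refresh plus the set of citation IDs added or edited since then. Nothing here is a Zotero API; the class, its methods and the `render` callback are made up for illustration. As with the “Save” button analogy, a second quick refresh with no pending changes does nothing. The sketch deliberately ignores that, in real citation styles, one citation's text can depend on other citations.

```python
from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass
class CitationCache:
    """Hypothetical store for a "Quick refresh"; not part of Zotero."""
    rendered: dict[str, str] = field(default_factory=dict)  # citation ID -> text from last refresh
    dirty: set[str] = field(default_factory=set)            # IDs added or edited since then

    def record_change(self, citation_id: str) -> None:
        """Call whenever a citation is inserted or edited."""
        self.dirty.add(citation_id)

    def quick_refresh(self, render: Callable[[str], str]) -> list[str]:
        """Re-render only the dirty citations and return their IDs."""
        processed = sorted(self.dirty)
        for cid in processed:
            self.rendered[cid] = render(cid)
        self.dirty.clear()
        return processed


cache = CitationCache()
cache.record_change("cite-2")
cache.record_change("cite-1")
first = cache.quick_refresh(lambda cid: f"<text of {cid}>")
second = cache.quick_refresh(lambda cid: "unused")
print(first)   # ['cite-1', 'cite-2']
print(second)  # [] (nothing changed, so nothing to do)
```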

Were it possible to implement such a “Quick refresh” function, I think it would make it less daunting to perform the habitual refresh, e.g., at the end of the work day. I imagine this would make it easier to keep the document clean and up-to-date, without the hassle of having to initiate a full refresh. Even if the results won’t always be 100 % accurate (e.g., I imagine a “Quick refresh” wouldn’t reflect changes made to references centrally in “My Library”), it would at least give users a preliminary bibliography, page count, up-to-date citations, etc. This would be sufficient for day-to-day purposes. A “Full refresh” would of course be performed when finalising the document, to ensure that everything is correct and up-to-date.

I made a rough mock-up to give an idea of what I mean: https://i.imgur.com/nsALSwt.png
If this idea is unfeasible or impractical, feel free to ignore it. I’m just a user looking forward to future improvements in this already excellent program.

Best regards,
G.P.
  • Unfortunately that's just not how citation styles work. When you add or remove citations, Zotero has to scan all citations in the document to update any citations and the bibliography, because style rules often make citations and references dependent on other citations and references — think ibid., given name disambiguation, sorting, etc.
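
    A toy illustration of that dependency, assuming a simplified ibid. rule (a footnote reads “Ibid.” when it cites the same source as the footnote directly before it; real styles add further conditions). Inserting one new footnote changes the text of a footnote that was never edited, which is why all citations have to be re-scanned:

```python
def render_footnotes(sources: list[str]) -> list[str]:
    """Render one citation per footnote using a simplified ibid. rule."""
    rendered = []
    previous = None
    for source in sources:
        # Same source as the immediately preceding footnote -> "Ibid."
        rendered.append("Ibid." if source == previous else source)
        previous = source
    return rendered


before = render_footnotes(["Smith 2001", "Smith 2001", "Jones 1999"])
# Insert a new footnote citing Jones between the two Smith footnotes.
after = render_footnotes(["Smith 2001", "Jones 1999", "Smith 2001", "Jones 1999"])
print(before)  # ['Smith 2001', 'Ibid.', 'Jones 1999']
print(after)   # ['Smith 2001', 'Jones 1999', 'Smith 2001', 'Jones 1999']
```

    The second Smith footnote was never touched, yet its text has to change from “Ibid.” back to “Smith 2001”.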

    You don't say what platform you're on, but if this is macOS, Mac Word integration is unfortunately just slow at the moment. We have some planned changes that will hopefully speed things up in the not-too-distant future.
  • That's a shame. Thank you, and I'm looking forward to future changes that may improve the situation.

    Best regards,
    G.P.
  • Hello again dear Zotero community,

    I have played around with a test document (an old master's dissertation of roughly 90 pages with around 250 footnotes) in Microsoft Word (Windows/Office 2019) and Zotero 6.0.26, to assess the viability of my idea of a cache-based “Quick refresh” function. It was mostly an intellectual exercise for my own edification to day-dream a little about how the processing time could be reduced, so it can safely be ignored if the results are obvious or obviously mistaken.

    Zotero’s logs tell me that the processing time required to run a full refresh (without having added or changed any citations) remains the same every time (around 130 seconds in my test document). Comparing the output from Zotero’s logs in WinMerge (after removing the timestamps) gives the result: “The selected files are identical”. I take this to mean that repeatedly invoking “Refresh” makes Zotero perform exactly the same processing over and over again. This is basically the part of the processing I was thinking could be saved in a cache file.

    Adding new citations increases the total processing time by some 5–30 seconds. I assumed that this is the time it takes for Zotero to scan the document and compare the new citations against all the previous citations in the document, but maybe that was where I went wrong? It is only when I add more than 40 new citations per trial that the refresh time starts to increase rapidly (exponentially?). My results can be found in this chart: https://i.imgur.com/69eKfdX.png Blue columns represent the “baseline” time to refresh when no changes have been made since the last refresh. Red columns represent the time added (over and above the baseline) by n new citations, from no changes (+0) to 200 new citations (+200). My results suggest that, unless the user has added hundreds of new citations since the last refresh, the bulk of the processing time during a refresh is consistently spent repeating the same processing as in the last refresh, although I may have interpreted the logs incorrectly. The first half of the logs are consistently near-duplicates, indicating that the same processing seems to be performed each time. My hypothesis is that the increase in time reflects the processing required to consolidate the new and changed citations (this processing seems to be concentrated in the second half of the Zotero logs, after the line “Integration: style.updateUncitedItems”).

    If this is the case, perhaps it should be possible to use a cache file to replace at least the “ZoteroWinWordIntegration: getDocumentData” phase of the refresh process? My idea of a cache-based “Quick refresh” function is that Zotero would save the results from the last refresh in some kind of cache file (I’m sure there is a better technical term or solution for this). When “Quick refresh” is invoked, the cache file is fetched to get the old citations and the bibliography. If Zotero has no record of any changes since the last refresh, the bibliography would be re-generated from the cache file. If there are records of new or changed citations, these are consolidated as usual (shouldn’t this cover cases like “Smith 2001a”, “Smith 2001b”, “Ibid.”, “Op. cit.”, etc.?). The cache file would then be updated and the bibliography would be generated. This way, at least some of the processing time would be avoided, and users would get the option to consolidate or integrate new citations without having to go through the hassle of performing a full refresh every time. Each new “Quick refresh” would fetch and update the cache file, until a “Full refresh” is performed and the cache is flushed and created anew. The results of a “Quick refresh” might not always be 100 % correct, but achieving perfection is what “Full refresh” would be for! The key question is whether it would be possible to use cache files in this way, at least to get the old citations. That, I couldn't say, and I have probably already wasted too much time thinking about this problem. It’s probably time to go outside instead!
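
    One caveat with “consolidated as usual”: with year-suffix disambiguation, a new citation can change the text of older, unchanged citations that a quick refresh would serve from the cache. A minimal sketch, assuming a simplified rule where suffixes are assigned in order of first appearance (real styles usually follow the bibliography's sort order); the function name is hypothetical:

```python
from string import ascii_lowercase


def render_author_date(cites: list[tuple[str, int, str]]) -> list[str]:
    """Render (author, year, title) citations, adding a/b/c suffixes when one
    author has several different works from the same year (up to 26)."""
    works: dict[tuple[str, int], list[str]] = {}
    for author, year, title in cites:
        titles = works.setdefault((author, year), [])
        if title not in titles:
            titles.append(title)  # suffix order = order of first appearance
    out = []
    for author, year, title in cites:
        titles = works[(author, year)]
        suffix = ascii_lowercase[titles.index(title)] if len(titles) > 1 else ""
        out.append(f"{author} {year}{suffix}")
    return out


cached = render_author_date([("Smith", 2001, "Cells")])
fresh = render_author_date([("Smith", 2001, "Cells"), ("Smith", 2001, "Genes")])
print(cached)  # ['Smith 2001']
print(fresh)   # ['Smith 2001a', 'Smith 2001b']
```

    The first citation was not edited, but its cached text “Smith 2001” is now wrong, so finding and fixing it still requires looking at the other citations in the document.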

    Best regards,
    G.P.
  • I'm sorry but I really don't see this happening -- I also just don't see much of a demand for "Update citations so they're still not quite correct but a little less incorrect than with automated update disabled".
    I understand it's something that you want, but it's not actually something many other people have been asking for (as opposed to the ability to turn off automatic updates, which had been very frequently requested), and you're suggesting a huge set of completely new functionality that Zotero would have to develop, maintain, and troubleshoot (including storing and managing caches for every Word document you've ever used with Zotero). That's in addition to the possible confusion caused by citations now existing in three different states.
  • Another option for avoiding slow processing with some word processors/platforms/big documents is using unformatted citations as you write. That is, plain text citations like {Smith, 2014}, which you type in directly (or via Quick Copy). You only format the citations in your desired style at the end, during final document prep, using Zotero's RTF Scan (or the RTF/ODF-Scan addon).
    https://www.zotero.org/support/rtf_scan

    Using unformatted citations involves a few more steps and caveats in Zotero than it does in Endnote (where unformatted citations were what I mostly used; it was about the only thing I liked about Endnote). But they might be the difference between workable and unworkable in some cases. And you could use the normal inserted-formatted citation process for other cases. So it's relatively easy to give it a try.
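
    If you try this, a rough way to list the plain-text markers in a draft before running the scan is a regular expression. The pattern below only covers the simplest `{Author, Year}` form and is an assumption for illustration, not the syntax RTF Scan actually accepts (page numbers, prefixes and multiple citations per marker are ignored):

```python
import re

# Simplest {Author, Year} form only; an illustrative assumption, not RTF Scan's syntax.
MARKER = re.compile(r"\{([^{},]+),\s*(\d{4})\}")


def find_markers(text: str) -> list[tuple[str, str]]:
    """Return (author, year) for each {Author, Year} marker in the text."""
    return [(m.group(1).strip(), m.group(2)) for m in MARKER.finditer(text)]


draft = "As argued {Smith, 2014}, and later disputed {Doe, 2020}."
print(find_markers(draft))  # [('Smith', '2014'), ('Doe', '2020')]
```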
    It's hard to comment on your investigation, because there are many factors in the equation of refresh speed. To give some context, I have probably spent hundreds of hours optimizing Zotero integration speed, specifically because the Word for Mac integration is very slow when it comes to communication between Zotero and Word.

    Generally the refresh operation consists of 3 phases:
    1. Read the document for Zotero fields. This also gets "active links" to fields for Zotero to use, so Zotero can update citations. Without these links, Zotero cannot edit citations in the document.
    2. Process citations with a citation processor based on the user chosen citation style and generate the text of each citation and bibliography.
    3. Write changes to the document.
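
    A toy model of these three phases, counting each field read or write as one round-trip between Zotero and Word. The `Field` and `FakeDocument` classes and the simplified `ibid` processor are hypothetical stand-ins, not Zotero's integration API. It shows why a refresh with nothing to change still takes time: phase 1 reads every field regardless.

```python
from dataclasses import dataclass, field


@dataclass
class Field:
    code: str  # citation data stored in the field (which source is cited)
    text: str  # formatted text currently shown in the document


@dataclass
class FakeDocument:
    """Stand-in for a Word document; each field read/write is one round-trip."""
    fields: list[Field] = field(default_factory=list)
    round_trips: int = 0

    def read_fields(self) -> list[Field]:
        self.round_trips += len(self.fields)
        return list(self.fields)

    def write_text(self, index: int, text: str) -> None:
        self.round_trips += 1
        self.fields[index].text = text


def refresh(doc: FakeDocument, process) -> None:
    fields = doc.read_fields()                     # 1. read every Zotero field
    new_texts = process([f.code for f in fields])  # 2. run the citation processor
    for i, (f, new) in enumerate(zip(fields, new_texts)):
        if f.text != new:                          # 3. write only what changed
            doc.write_text(i, new)


def ibid(codes: list[str]) -> list[str]:
    """Simplified processor: repeat of the previous source becomes "Ibid."."""
    return ["Ibid." if i and c == codes[i - 1] else c for i, c in enumerate(codes)]


doc = FakeDocument([Field("Smith 2001", "Smith 2001"), Field("Smith 2001", "Smith 2001")])
refresh(doc, ibid)
print(doc.round_trips)  # 3: two reads + one write
refresh(doc, ibid)
print(doc.round_trips)  # 5: two more reads, nothing to write
```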

    (2) used to be slow some years ago, but we got that ironed out; it's generally fast and scales mostly linearly with the size of the document. There may be some edge cases we have not discovered that scale exponentially, and we'd definitely be interested in investigating them. In the debug log you'll find this phase logged as style.processCitationCluster lines.

    (1) and (3) we have limited control over. Since we cannot avoid those steps and cannot speed them up, we allow you to disable automatic citation updates (in the Document Preferences). This is essentially what you ask for in your last comment. Adding/editing citations in that mode skips step (1), only processes the current citation in (2) and only writes that citation in (3), so it should be reasonably fast even on large documents with many citations. The intended workflow with automatic citation updates disabled is that you insert and edit citations all over your document, which is fast, and only use Refresh before you need to share or submit the document, so you rarely have to wait for a slow refresh.
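
    In the same kind of toy cost model (hypothetical classes and function names, one round-trip per field read or write), inserting a citation with automatic updates disabled costs the same whether the document has 5 citations or 250, because no other field is read:

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Stand-in for a Word document; each field read/write is one round-trip."""
    texts: list[str] = field(default_factory=list)
    round_trips: int = 0

    def read_all(self) -> list[str]:
        self.round_trips += len(self.texts)
        return list(self.texts)

    def insert_text(self, index: int, text: str) -> None:
        self.round_trips += 1
        self.texts.insert(index, text)


def insert_without_update(doc: FakeDocument, index: int, source: str, render_one) -> None:
    # Phase 1 is skipped: no other field is read.
    text = render_one(source)     # Phase 2 runs for this citation only.
    doc.insert_text(index, text)  # Phase 3 writes this one field.


doc = FakeDocument(texts=["(existing citation)"] * 250)
insert_without_update(doc, 10, "Smith 2001", lambda s: f"({s})")
print(doc.round_trips)  # 1
doc.read_all()          # what phase 1 of a later Refresh would cost
print(doc.round_trips)  # 252
```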

    When you first posted we assumed you were working with a Mac. However, it seems that you are on Windows. Windows Word integration is generally very fast and refreshing even large documents with hundreds of citations shouldn't take more than a minute or two, which doesn't seem like what you're reporting.

    So first, could you submit a Report ID from Zotero? Also, which OS version and full version of Word are you running? Is this a relatively new computer with good specs? Have you tried restarting your machine? And are you running any other performance-heavy software while refreshing the document in Word?