Why is Abstract embedded in Word field metadata?

edited November 21, 2018
I just noticed that Zotero inserts the full text of the "Abstract" field into the hidden JSON data in a cite field in Word.

My understanding is that Abstract is never used in cites. (I'm not even sure it's available in CSL?)

I understand the whole Zotero entry is generally copied into the field, especially to allow continued usage for already-cited items if not found in the library (for example, while sharing a document), but if Abstract isn't ever used, then it doesn't need to be there.

This isn't particularly harmful, but my guess is that it might contribute to why Word gets slow after many Zotero cites are inserted. At the very least, it will increase the filesize unnecessarily. Note that for many dissertations, for example, the abstract may be several paragraphs, perhaps 500 words or more, and that would be duplicated every time that entry is cited.
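
To put a rough number on that, here is a minimal sketch that measures how much of one cite field's hidden JSON is abstract text. It rests on assumptions: that the field code is a text prefix followed by a single JSON object, and that each entry under `citationItems` carries its metadata in `itemData` with an `abstract` key. The helper name `abstract_share` and the sample data are made up for illustration.

```python
import json


def abstract_share(field_code: str) -> tuple[int, int]:
    """Return (json_bytes, abstract_bytes) for one citation field code.

    Assumption: the JSON payload starts at the first "{" in the field code
    and follows the citationItems -> itemData -> abstract layout.
    """
    start = field_code.index("{")
    # raw_decode parses one JSON value and reports where it ended,
    # so any text trailing the JSON object is ignored.
    citation, end = json.JSONDecoder().raw_decode(field_code, start)
    json_bytes = len(field_code[start:end].encode("utf-8"))

    abstract_bytes = 0
    for item in citation.get("citationItems", []):
        abstract = item.get("itemData", {}).get("abstract", "")
        abstract_bytes += len(abstract.encode("utf-8"))
    return json_bytes, abstract_bytes


# Hypothetical field contents: one cited item with a 2000-character abstract.
sample_field = "ADDIN ZOTERO_ITEM CSL_CITATION " + json.dumps(
    {"citationItems": [{"itemData": {"title": "A thesis", "abstract": "x" * 2000}}]}
)

json_bytes, abstract_bytes = abstract_share(sample_field)
print(f"{abstract_bytes} of {json_bytes} bytes are abstract text")
```

Running this over the field codes copied out of a real document would show whether abstracts dominate the embedded data or are only a small part of it.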

Is there any advantage to this feature? Should it be removed?

This isn't an urgent bug or anything, but maybe worth thinking about.

Personally, I'm not sure I understand the idea of "Abstract" in general in Zotero. I occasionally find it useful when searching items in my library, or to quickly check what an article is about without opening the article itself. But I rarely use it, and I have thought about deleting all abstracts from my library to save space. That was previously a minor reason, but if abstracts are always embedded in Word cites, it's not quite so minor.
  • Abstracts are a standard piece of item metadata, and they are cited in some styles (e.g., annotated bibliographies). Abstracts are a small amount of text data, so deleting the abstracts from your Zotero library won’t substantially impact the size of the database or its speed. That abstracts (along with all of the rest of the item data) are included in the embedded Word fields really doesn’t have any impact on the speed of the plugin.
  • OK, that makes sense regarding annotated bibliographies!


    As for size, it's not huge. For example, an average abstract in my library is around 2000 characters long, or about 2 KB. If there are 100 cites like that in a document (whether repeating the same reference or citing many different ones), that would add about 200 KB to the document. Still not a huge problem, but not insignificant. More relevantly, I would worry that Word would struggle with that much embedded metadata, although I'm not sure how to test it. Word is clunky and sometimes buggy with fields anyway, so that can't help.

    But yes, if it's used then I understand why it's there, and more importantly how complex it would be to attempt to 'fix' this, so no problem :)
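
The size estimate above can be written out as a quick calculation. This is only a sketch: it assumes roughly one byte per character (plain ASCII text) and counts the uncompressed abstract text only. Since .docx files are zip-compressed, the growth on disk may be smaller. The function name `added_size_kb` is made up for illustration.

```python
def added_size_kb(avg_abstract_chars: int, n_citations: int,
                  bytes_per_char: int = 1) -> float:
    """Estimate the extra uncompressed data from embedded abstracts, in KB.

    Assumption: every citation embeds one abstract of average length, and
    each character takes bytes_per_char bytes (1 for plain ASCII text).
    """
    return avg_abstract_chars * n_citations * bytes_per_char / 1000


# The figures from the post: 2000-character abstracts, 100 cites.
print(added_size_kb(2000, 100))  # 200.0
```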
  • edited November 21, 2018
    The size of fields doesn’t really impact Word much, more just the number of individual fields themselves (e.g., you get the same difficulties if you use a lot of Word-native captions, cross-references, or bookmarks).
  • OK, thanks for clarifying. I'm currently dealing with a 300-page document that's sluggish for various reasons, so I'm just looking for ways to cut corners. (I'm editing in another document, but it still takes a while to put everything back together and refresh the citations.)
  • My main recommendation would be to just not refresh citations until you are done writing.
  • I don't know if @adomasven has done performance tests with and without abstracts for large documents, but 1) abstract is standard item metadata, as bwiernik says, 2) embedded metadata can be used for citing, as discussed above, and 3) it can be used to get metadata back into Zotero from the document with Reference Extractor or a planned future "document collections" feature in Zotero. So if we're embedding metadata at all, the abstract should be there.

    That said, we used to offer the option of not embedding metadata. We now always do, and it might be worth revisiting that to confirm that including metadata doesn't have a significant performance effect in large documents.
  • edited November 21, 2018
    @bwiernik, yes, I agree. I'm testing my formatting (layout of the document, as well as a customized style sheet), but don't plan to do this often. Still it takes about an hour to refresh the whole thing, so even once a month or something is a lot. But I know that's not on the Zotero end, just Word being slow.

    @dstillman, yes, those are good reasons I wasn't thinking about, and the planned collections option sounds useful.
    As for not embedding the data, I'd be curious just to see if there's any way to speed up Word. Not a Zotero issue at all, really, but still a usability challenge.

    You've both answered my questions here, thanks. I was just wondering when I saw all that text hidden there.
  • I haven't tested, but there was a report that embedded abstracts trigger a Windows search bug. Not our bug, and it doesn't really change the above arguments, but it's another possible consequence of the additional data. (It's not clear that the same bug wouldn't happen with enough regular citations without abstracts.)

    I'd still be curious to see some performance figures with and without abstracts included in a long document. If it turned out that abstracts were dramatically slowing down Word or the plugin, that might change the cost/benefit analysis, particularly if Zotero gets better at updating item metadata and pulling down abstracts (which never exist in data from Crossref anyway).
  • "which never exist in data from Crossref anyway"
    As an aside, FYI: Crossref is hoping and working to change that: https://i4oa.org/

    I agree that abstracts are used rarely enough in citations and bibliographies that there's a good case for excluding them from the embedded metadata if they cause significant issues (performance or otherwise).
  • If someone changed to a style that included abstract, could that be pulled into the embedded data at that point?
  • I don't know the answer for that, but yes, that'd obviously need to work. Dan's answer on the other thread suggests he thinks it'd be possible, but also raises a fair number of other issues.