"Abstract" field in citations conflicts with (Windows) file search

Hi all,

ReportID is 1532933208
Though I don't believe any relevant info to the issue would be visible from the report.

I've come across a very particular issue that has taken me quite a while to track down. I'm going to have to go into detail quite a bit, but I'm hoping someone might have a suggestion here.

In short: I've noticed that including long abstracts in the field codes of citations in a Word file prevents Windows Search from finding the file when the search is based on any content of the file that appears after those citations.

In full:
I noticed earlier today that particular Word files were not found by Windows Search when searching for strings of words, even though I was certain these files contained those words. Because I am currently working on a PhD project and have over 2000 files that I'm working with, I needed to be sure I'm not actually overlooking scores of files every time I do a search like this. So, after spending most of my day exploring all kinds of issues with searching for file content in Windows, etc., I have now finally tracked down the problem to a conflict with Zotero citations.
To be more specific, all text in a .docx file that appears after particular series of Zotero-formatted citations cannot be found via Windows Search. Note that this text can be found by searching within the .docx file using Word, but searching for these terms via Windows Search will never find the files that contain them.

- I followed the 10 steps provided here: https://www.zotero.org/support/kb/debugging_broken_documents and after 1-9 came upon the following results during step 10.

- The issue only occurs when there are numerous citations together, and is not related to particular individual citations. Removing enough of them, however (until there are only 2 or 3 in a single string), fixed the issue. Strangely enough, out of a series of citations (ABCDEF) I could remove either ABC or DEF (in most combinations) and either would fix the issue. Unlinking citations also fixed the issue.

- Checking on this while field codes were made visible, it struck me that the field codes contained very lengthy abstracts. (As I understand it, this is part of the useful metadata embedded in a citation, which can for example be retrieved when a file contains references but the bibliographic database that created them is missing?) Removing enough of these abstracts from the code (not necessarily the "abstract" field as a whole; removing several sentences can be enough) also fixes the issue, with both old and new files. The citations whose removal fixed the issue in the previous step are the ones that have a lengthy abstract field in the code.

- This leads me to believe that the length of the field code creates a conflict with Windows Search (or the Office search filter). In any case, removing abstracts from the field fixed the issue, only for it to pop up again at the next place in the file where numerous citations are placed together. Making the document findable in Windows based on content now effectively requires manually removing all abstract data.

- This issue can also be reproduced by creating a new Word file, and entering enough citations (3 or 4) as long as they have lengthy abstracts in the field codes. (This is testable by typing a word after the string of citations and seeing whether search picks up the file when searching for that word, and under what conditions).
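
For anyone who wants to check their own files, here is a rough Python sketch of how the field-code lengths could be measured without opening Word. It assumes the citations are stored as complex fields (`w:fldChar` / `w:instrText`) in `word/document.xml` and that Zotero fields contain the text `ZOTERO_ITEM`. The function names and the 5000-character threshold are my own placeholders; I don't know what length actually triggers the problem.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, in ElementTree's {uri}tag notation.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def field_codes(document_xml):
    """Return the instruction text of every complex field, in document order."""
    root = ET.fromstring(document_xml)
    codes = []
    stack = []  # one [parts, still_collecting] entry per currently open field
    for el in root.iter():
        if el.tag == W + "fldChar":
            kind = el.get(W + "fldCharType")
            if kind == "begin":
                stack.append([[], True])
            elif kind == "separate" and stack:
                # The instruction part is over; the visible result follows.
                stack[-1][1] = False
            elif kind == "end" and stack:
                parts, _ = stack.pop()
                codes.append("".join(parts))
        elif el.tag == W + "instrText" and stack and stack[-1][1]:
            # Word may split one field code over several instrText runs.
            stack[-1][0].append(el.text or "")
    return codes


def report(docx_path, threshold=5000):
    """Print the length of each Zotero field code in a .docx file.

    `threshold` is an arbitrary placeholder, not a known limit.
    """
    with zipfile.ZipFile(docx_path) as z:
        xml = z.read("word/document.xml")
    for i, code in enumerate(field_codes(xml)):
        if "ZOTERO_ITEM" in code:
            flag = "  <-- long" if len(code) > threshold else ""
            print(f"field {i}: {len(code)} characters{flag}")
```

Calling `report("thesis.docx")` (the filename is hypothetical) lists the Zotero fields with their lengths, which should make it easier to see whether the files that search misses are the ones with the longest field codes. It only looks at the main document body, not footnotes or endnotes.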

Additional notes:
- The issue is the same when using other third-party search programs. This might be because they make use of the same filter as Windows Search...
- ...which is the standard 'Office Open XML Format Word Filter' provided by Office.
- Before coming across the Zotero issue, I had already done a very thorough job of reinstalling Windows Search, repairing Office, etc. As noted, the issue is very particular and can be recreated only under these very specific conditions.

I have an endless list of more details I could give you, but I'm gonna guess this is already quite a lot. If you have any idea how I might be able to deal with this, I'd of course be happy to provide more info. From searching this forum I understand that turning off the inclusion of abstracts in the field is not really an option. However, I'm hoping to find a way to make content-based file search work for all my research files without having to manually check and remove all abstracts individually or unlink all citations. So any potential solution that works for all files or citations at once, or for batches of them, would really help me out.
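
To make concrete what such a batch fix would have to do per citation, here is a minimal Python sketch that removes the "abstract" entry from a single field code string. It assumes the field code has the form `ADDIN ZOTERO_ITEM CSL_CITATION` followed by a JSON object with a `citationItems` list whose entries carry the metadata under `itemData`, which is how the codes look when made visible in Word. The function name is hypothetical, and writing the result back into the .docx is deliberately not shown.

```python
import json

# Assumed prefix of a Zotero citation field code, as seen with field codes visible.
MARKER = "ADDIN ZOTERO_ITEM CSL_CITATION"


def strip_abstracts(field_code):
    """Return the field code with every itemData "abstract" entry removed.

    Field codes that do not contain the assumed Zotero marker are
    returned unchanged.
    """
    start = field_code.find(MARKER)
    if start == -1:
        return field_code
    json_start = start + len(MARKER)
    # json.loads tolerates the whitespace around the JSON payload.
    citation = json.loads(field_code[json_start:])
    for item in citation.get("citationItems", []):
        item.get("itemData", {}).pop("abstract", None)
    return field_code[:json_start] + " " + json.dumps(citation, ensure_ascii=False)
```

This is only the string-level step. Abstracts removed this way would no longer be available if the metadata were later extracted from the document, a citation style that uses the abstract would not work, and Zotero may well write the abstracts back the next time it updates the citations.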

Kind regards, J.

  • This would be a Windows/Word issue, not a Zotero issue. Zotero inserts standard Word fields. You'd have to report it to Microsoft.
  • We could consider adding a hidden preference not to include abstracts in embedded metadata, but that's not really something people should have to understand or think about. It would be hard to explain the implications (if you or someone you shared the document with later tried to extract metadata, the abstracts would be missing; citation styles that explicitly used the abstract wouldn't work), and it'd be easy to forget that you had it on.

    It sounds like you may have seen it already, but https://forums.zotero.org/discussion/74699/why-is-abstract-embedded-in-word-field-metadata/p1 has some further discussion on this.
  • Thanks for the response! I'll try to see if contacting Microsoft gets me anywhere.

    Yeah the reasons for including the metadata are perfectly sound of course, and I definitely get trying to keep it as user-friendly as possible given that the program's ease of use is exactly why many people use it.
    Anyway, I'll see if I get any response from Microsoft, and in case they point to any potentially relevant issue rather than just a fluke I'll post their response.