Citation item with SMP Unicode characters not restored after Word→LibreOffice processor switch
Citation items containing characters from Unicode's Supplementary Multilingual Plane [1] (U+10000–U+1FFFF) are not restored when switching word processors from Word to LibreOffice.
Steps:
1. Open the .docx file attached in the next post
2. In Word → Zotero add-on "Document Preferences" → Switch to different word processor
3. Save file as .odt
4. Open in LibreOffice Writer → Zotero add-on "Refresh"
Result:
The item containing SMP characters (Casiraghi et al., 2005) is not restored. Other items without such characters (e.g., Osada et al., 2024) are restored correctly.
Expected:
All citations should be rebuilt regardless of Unicode characters in item metadata.
Environment:
Zotero 8.0.3
Zotero Word for Windows Integration (bundled with Zotero 8.0.3)
Zotero LibreOffice Integration 7.0.6
Microsoft® Word for Microsoft 365 MSO (Version 2601 Build 16.0.19628.20214) 64-bit
LibreOffice 26.2.1.2 (X86_64)
The problematic characters originate from the abstract field of a Physical Review B article (Casiraghi et al., DOI: https://doi.org/10.1103/PhysRevB.72.085401), where the publisher uses Mathematical Italic Unicode characters for variable names.
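For anyone who wants to check whether an item's metadata contains such characters, here is a minimal Python sketch. The helper name `find_smp_chars` and the sample string are my own illustrations (the sample is not the actual abstract text), and this only locates the characters; it does not claim to show what goes wrong inside the integration plugins.

```python
import unicodedata

def find_smp_chars(text):
    """Return (index, char, code point, name) for each character in U+10000-U+1FFFF."""
    hits = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        if 0x10000 <= cp <= 0x1FFFF:
            hits.append((i, ch, f"U+{cp:04X}", unicodedata.name(ch, "UNKNOWN")))
    return hits

# Hypothetical sample containing U+1D44E (MATHEMATICAL ITALIC SMALL A)
sample = "the parameter \U0001D44E varies"

for hit in find_smp_chars(sample):
    print(hit)

# An SMP character is one code point but two UTF-16 code units (a surrogate pair),
# so the two length measures below differ by one for each SMP character.
code_points = len(sample)
utf16_units = len(sample.encode("utf-16-le")) // 2
print(code_points, utf16_units)
```

Running the same function over an exported abstract field should list every character that falls outside the Basic Multilingual Plane.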
[1]
https://www.unicode.org/roadmaps/bmp/
https://www.unicode.org/roadmaps/smp/
This report is similar to:
LibreOffice integration BUG - treating 4 preceding characters as part of citation breaks ITEM
That behavior is recorded in this video:
https://filedn.com/l1AJpXWiF2LRTsLNsykIFE4/112785 Zotero citation eats 4 characters before ITEM/Steps to reproduce Zotero forum bug 112785 2024-03-18 16-35-06.mp4
Casiraghi (odt to doc).docx
View in Word after "switch to different word processor" action (before save as ODT):
https://s3.amazonaws.com/zotero.org/images/forums/u4845974/cuizd29s6m9zdcmf9ss7.png
View in Writer after restoring citations:
https://s3.amazonaws.com/zotero.org/images/forums/u4845974/4gri4fq5d7qorrjomzrc.png
All report files (including citation items exported as .rdf):
Citation item with SMP Unicode char fail to switch Word to LibreOffice.zip
Citations are restored.
Fix:
ce64ef3 (compare 7.0.6→7.0.7)
Thank you @dstillman and @adomasven for the prompt fix!
When will this patch be released?