[MLZ] Extended Unicode range characters

I'm using MLZ on two machines (a laptop and a desktop) with identical OS, system fonts, Firefox, etc.

A CJK Extension B Unicode character (U+25B07) that I wrote into one of my notes on my laptop turned into an empty box on my desktop after sync. Any ideas why? On top of that, it keeps causing sync errors.

I tried to post the character in the forums, but that produced a forum error, which leads me to believe that it's a server-side problem with Zotero. The forum error message is below.

Technical information (for support personel):

Error Message
An error occurred while creating a new discussion comment.
Affected Elements
CommentManager.SaveComment();

The error occurred on or near: Incorrect string value: '\xF0\xA5\xAC\x87' for column 'Body' at row 1

For additional support documentation, visit the Lussumo Documentation website at: lussumo.com/docs
  • edited June 10, 2013
    How did you enter the character? Is this character on your keyboard, or did you enter it with an Alt+ code? What is the default language setting on both computers? Are they the same? Possibly of less importance, but are you _using_ the same font on both computers? Zotero "wants" characters encoded in UTF-8. An empty box suggests that the character is not available in the font you are using.

    See:

    http://www.fileformat.info/info/unicode/char/25b07/index.htm
  • Your forum post error is related to the way your system interacted with this Vanilla forum system. Lussumo is the parent company of Vanilla.
  • edited June 10, 2013
    The character is part of multiple fonts on both systems (MingLiU, PMingLiU). I used my system's (OS X 10.8) CJK input (Zh-traditional). All language settings and fonts on both computers are identical. Encoding is UTF-8. When I add the character (among others) to my Zotero note, it displays fine, even after a reboot. After sync, however, it no longer displays on the second machine and turns into the unknown-character square.

    When entering the character into the forums, I just copied and pasted it from my note.

    Thanks for the quick responses.
  • @duncdrum: I'll try to run a test with this character in a small mainstream Zotero test instance soon. If we can reproduce the error with mainstream Zotero, Dan can take a look.
  • I can reproduce this. I suspect we just don't properly support 4-byte UTF-8 characters at sync time. They weren't supported until recently in MySQL (which is why the error occurs for the forum), and some other parts of the sync pipeline may not handle them properly either.

    I'll take a look. Ticket created.
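
The bytes in the forum error message can be checked against the character itself. The following is a minimal Python sketch (not part of Zotero or the forum software) showing that U+25B07 encodes to the four UTF-8 bytes reported in the error, which is more than MySQL's older three-byte `utf8` character set can store:

```python
# The character from the original post: U+25B07 (CJK Extension B).
ch = chr(0x25B07)

# Encode it as UTF-8 and inspect the resulting bytes.
encoded = ch.encode("utf-8")

print(encoded)       # b'\xf0\xa5\xac\x87'
print(len(encoded))  # 4

# Any code point above U+FFFF needs four bytes in UTF-8.
print(ord(ch) > 0xFFFF)  # True
```
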
  • Thank you so much, Dan. I'm sure that this will benefit more users venturing beyond Latin characters in their work.
  • edited June 11, 2013
    Just to be clear, this isn't about non-Latin characters. Most extended characters work just fine. This is only about 4-byte characters, which include, according to Wikipedia, "less common CJK characters and various historic scripts and mathematical symbols".
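
To see which characters in a note fall into this 4-byte category, one could scan the text for code points above U+FFFF. This is only an illustrative sketch; the function name `find_four_byte_chars` is hypothetical and not part of Zotero:

```python
def find_four_byte_chars(text):
    """Return (index, character, code point label) for every character
    that needs four bytes in UTF-8, i.e. every code point above U+FFFF."""
    return [
        (i, ch, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if ord(ch) > 0xFFFF
    ]


# Example: a Latin letter, the character from this thread, and a common
# CJK character (U+4E2D), which only needs three bytes in UTF-8.
sample = "a" + chr(0x25B07) + chr(0x4E2D)
print(find_four_byte_chars(sample))
```

Only the second character is reported; ordinary Latin and common CJK characters pass through, which matches the point that most extended characters are unaffected.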
  • Dan: Thanks for looking into this and opening the ticket!
  • I'm aware that it's not all CJK, but the "historic" and "less common" characters come up more often than one might assume, depending on the research area.

    Zotero's multilingual features are still the best in any bibliography software I know of. Hence I assume that there are more potential and current users with an interest in this.

    Thanks again, it's truly great that Zotero tries to accommodate those quirks, especially when compared to the look on the faces of some of our department's sysadmins when confronted with Unicode support and Chinese historical documents :)
  • I should mention that said item has recently come up again in a chain of other items that have crashed sync functionality on one of my machines. It could be a coincidence; as I said, there are more items involved. I just thought I'd let everyone know.