[MLZ] Sorting problem in Chinese transliteration
Hello,
I am using Multilingual Zotero very successfully for my papers. However, I ran into an issue related to sorting Chinese entries in the bibliography.
The bibliography entries are sorted according to the transliteration (pinyin) of the authors' names or, in case the name is only known in alphabetic characters, by the normal author field. The pinyin transliteration uses tone marks that look like "accents", e.g. ōǒóò, on some characters (aeiouü) to specify their tone. Unfortunately, ǔ gets sorted before a and e, and è gets sorted after e.
With Chinese pinyin, I would like to sort independently of the tones used. So èéēě should all be sorted as if they were a plain "e".
What I got automatically:
Chǔ Jīnqiáo 楚金桥.
Chang, Hui-Ching,
Chen, Sylvia Xiaohua
Cheng, Simone C. L.
Cheng, Y. H.
Chén Xiàngmíng 陈向明.
What I want:
Chang, Hui-Ching,
Chen, Sylvia Xiaohua
Chén Xiàngmíng 陈向明.
Cheng, Simone C. L.
Cheng, Y. H.
Chǔ Jīnqiáo 楚金桥.
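For illustration, the ordering asked for above can be sketched in JavaScript by comparing names on a key with the tone marks stripped. This is only a sketch, not how MLZ or citeproc-js actually sorts: the helper names `toneInsensitiveKey` and `sortIgnoringTones` are made up for this example, and it uses family names only. Note that stripping all combining marks also folds ü into u, which may or may not be wanted.

```javascript
// Hypothetical helper: build a sort key with the tone marks removed.
function toneInsensitiveKey(name) {
  // NFD splits "é" into "e" + U+0301; the regex then drops U+0300..U+036F.
  return name.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// Hypothetical helper: sort a copy of the list by the stripped keys.
function sortIgnoringTones(names) {
  return names.slice().sort((a, b) => {
    const ka = toneInsensitiveKey(a);
    const kb = toneInsensitiveKey(b);
    return ka < kb ? -1 : ka > kb ? 1 : 0;
  });
}

// Family names from the list above, written with escapes to be unambiguous:
// "Ch\u01d4" is Chǔ and "Ch\u00e9n" is Chén.
const familyNames = ["Ch\u01d4", "Chang", "Chen", "Cheng", "Ch\u00e9n"];
const sorted = sortIgnoringTones(familyNames);
console.log(sorted);
```

Names whose keys are identical (Chen and Chén here) keep their input order, because `Array.prototype.sort` is stable in current JavaScript engines.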
I use the following style: https://github.com/j-4/styles/blob/master/vienna-journal-east-asian-studies.csl
Can I specify the sorting method in the style-file?
Thanks for your help!
Isabel
I can't reproduce that sort failure with those names here; they come out as desired.
There are a couple of possible causes for the result you're seeing. One is that your platform may have a broken Unicode locale. For a spot-check, try pointing your browser at this page, and let us know what you see:
http://gsl-nagoya-u.net/http/pub/UNICODE-SORT-TEST.html
If the result is not "1", the problem is definitely in your browser. In either case, check the Firefox version (under Help -> About Firefox). There have been improvements in locale sort handling, so it would be best to use the most recent version (v.29).
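As a rough local spot-check of the same kind, the difference between a raw string comparison and a locale-aware one can be seen in a JavaScript console. This is not the code behind the test page linked above, just a sketch that assumes the runtime provides `Intl.Collator` with locale data for "en".

```javascript
// Plain comparison works on UTF-16 code units: "è" (U+00E8) is above "f" (U+0066).
const codeUnitSaysEGraveBeforeF = "\u00e8" < "f";

// A locale-aware collator treats "è" as a variant of "e", so it sorts before "f".
const collator = new Intl.Collator("en");
const collatorSaysEGraveBeforeF = collator.compare("\u00e8", "f") < 0;

console.log(codeUnitSaysEGraveBeforeF, collatorSaysEGraveBeforeF);
```

If the second value is not `true`, accented letters are likely to end up after all unaccented ones when sorted.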
The other possible problem is that your content may contain non-precomposed characters, which I believe can still cause sorting problems. There is a forum discussion here:
https://forums.zotero.org/discussion/12684/special-character-search/
But we'll cross that bridge only if we have to.
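To show what "non-precomposed" means in practice, here is a small JavaScript sketch. The two strings below look identical on screen but are stored differently; the helper name `codePoints` is made up for this example.

```javascript
// Hypothetical helper: list the code points of a string as "U+XXXX" labels.
function codePoints(text) {
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

const precomposed = "Ch\u00e9n";  // "é" as the single code point U+00E9
const decomposed = "Che\u0301n";  // "e" followed by combining acute U+0301

console.log(precomposed === decomposed);   // the strings are not equal
console.log(codePoints(precomposed).join(" "));
console.log(codePoints(decomposed).join(" "));
```

A sort routine that compares the stored characters one by one sees a plain "e" followed by an extra mark in the second string, which is why such input can land in an unexpected position.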
Addendum:
This link suggests that non-precomposed characters may creep into input more easily on Apple systems:
https://developer.apple.com/library/ios/qa/qa1235/_index.html
So if you are on a Mac, this is a possible problem. As the ticket linked in the forum post above has been open for six years, maybe it's time for a solution.
MLZ:
https://github.com/fbennett/zotero/commit/2feca308c2e0980bd7cb95d7e0ee8b50597ae93e
citeproc-js:
https://bitbucket.org/fbennett/citeproc-js/commits/d9320700817f2482b9ab8b36bb6b6b336bf90980
In any case, it doesn't look like there's much activity for that bug, so maybe it's not worth worrying about having to redo the DB normalization whenever a new version arrives (and it seems that if the new version comes, it will not override the old version, so we'll have time to migrate).
Edit: I looked more into the changes to the normalization algorithm in 4.1 and, as they say, this should not apply to anything found in meaningful text. The changes to the actual decomposition mappings are also quite minimal. So the only differences that would be introduced upon update are compositions for character sets added since 3.2 (I think that's what the current implementation in Firefox is).
Edit 2: After some more reading, it seems that String.prototype.normalize() uses the ICU library, which should support the most up-to-date version of Unicode. So that's great. I think that's what we should plan to use for normalization.
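A minimal sketch of what that could look like, assuming values are normalized to NFC before they are stored or compared. The helper name `normalizeField` and the sample values are made up for illustration and are not taken from the Zotero or citeproc-js code.

```javascript
// Hypothetical helper: bring a field value into composed (NFC) form.
function normalizeField(value) {
  return value.normalize("NFC");
}

const stored = "Xia\u0300ngmi\u0301ng";  // decomposed: base letters + combining marks
const typed = "Xi\u00e0ngm\u00edng";     // precomposed: single code points

const matchesRaw = stored === typed;
const matchesNormalized = normalizeField(stored) === normalizeField(typed);

console.log(matchesRaw, matchesNormalized);
```

The raw comparison fails while the normalized one succeeds, so running input through a step like this once would keep both forms from coexisting in the data.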
Thanks a lot for your fast and great help!
I initially used Iceweasel 24.5.0, and the website you pointed me to showed 0 as the result. I therefore downloaded Firefox 29.0.1, which gives the result 1, and the sorting in ML Zotero now comes out correctly without any further changes.
It would be fantastic if ML Zotero were integrated into regular Zotero! I was using regular Zotero before, without knowing of the multilingual version, and always had trouble creating Chinese entries that require characters as well as transliterations. I only found out about ML Zotero through much googling and specific forum posts, and I think that a lot of people still don't know about the great possibilities that are out there!
All the best,
Isabel
Thanks for your kind words about MLZ. The project has kept a low profile in part to avoid a backlash from disappointment over bugs in the early releases. One of the main aims was to get it working for our staff and students, on the assumption that once it proved itself useful, it would begin to percolate onto people's desktops. It's great to see that starting to happen.
MLZ also has pretty good support for legal referencing, which is another growth area. There have been a lot of code changes against mainstream Zotero to get things working, and migrating them to the main project may have to wait quite a while; but the prospects for eventual merger and decommissioning of the MLZ variant will rise as the tool finds favour with new users. Meanwhile, local support for our students (who hail from a bunch of jurisdictions whose languages I do not understand) gives me an incentive to support the tool and maintain parity with changes in mainstream Zotero.
Onward and upward. :)