Odd behaviour when sorting in Danish
I was doing some testing of "University College Lillebælt - Harvard (Danish)", and noticed that "A Room with a View" (as a movie) came up last in the reference list, though it should come first.
If I change the language to "en-UK" it sorts correctly. So it seems like something is wrong with the special sorting rules for Danish. Can anyone explain what is going wrong?
When the default locale is set to Danish or Norwegian (da-DK, nn-NO or nb-NO), the reference list looks like this:
Ab Delrahman, H., 2012. Det danske alfabet er sært. Kbh: Politiken.
B Gjengen, 2009. Alfabeta. Kbh.
Juul Jensen, U., Hædersdal, C. & Skadborg, M.K., 2014. Etablering af klinisk etiske komitéer i Danmark: en praksis-filosofisk tilgang til klinisk etik fra bibliotek.dk, Bibliotek for læger, årg. 206, nr. 1, s. 44–61.
A er det første bogstav i alfabetet, 2011. DR.
A Hansen, M., 2012. Alfabetiseringsproblemer. København: Gyldendal.
A kasernes fællesvirke, 2015. Problemer med alfabetet. Kbh: Dafolo.
But when the default locale is set to English, French, German or Swedish (en-US, fr-FR, de-DE, sv-SE), it looks like this:
A er det første bogstav i alfabetet, 2011. DR.
A Hansen, M., 2012. Alfabetiseringsproblemer. København: Gyldendal.
A kasernes fællesvirke, 2015. Problemer med alfabetet. Kbh: Dafolo.
Ab Delrahman, H., 2012. Det danske alfabet er sært. Kbh: Politiken.
B Gjengen, 2009. Alfabeta. Kbh.
Juul Jensen, U., Hædersdal, C. & Skadborg, M.K., 2014. Etablering af klinisk etiske komitéer i Danmark: en praksis-filosofisk tilgang til klinisk etik fra bibliotek.dk, Bibliotek for læger, vol. 206, no. 1, pp. 44–61.
I'll take a look as soon as I have time, and post back.
Can you export a small set of entries (as RIS, Zotero RDF, or CSL-JSON), paste the exported data to https://gist.github.com, save it as a "gist", and post the URL back to this thread?
https://gist.github.com/roaldfrosig/28cf8178b99efa7954dba62d2796b7de
Thanks for looking into this.
Two pieces of news.
The first news item is that I am able to reproduce this now, under Linux, in both Zotero for Firefox and in Juris-M for Firefox, using a copy of chicago-fullnote-bibliography.csl. The bad sort manifests both when Danish is selected as the style language in the word processor plugin UI, and when the style is pegged to Danish with default-locale="da-DK".
The second news item is that it does not fail with a copy of the same style set in a processor test fixture, processed with any of the following (slightly dated) JS engines:
Rhino 1.7
MozJS C24.2.0
jsc (recently compiled, version unclear)
It is possible for a calling application to override the native sort method in citeproc-js, and I think Zotero (or maybe just MLZ?) used to do this, but I don't see any evidence of it in the current source code. As far as I can tell, the processor is being allowed to run normally.
I can think of three possible causes, all of which would be connected to changes in the Firefox JavaScript engine (which did change fairly recently):
(1) The behaviour of localeCompare() may have changed for the worse. This seems unlikely, but it is at least a possible cause.
(2) The API of localeCompare() has changed, and the processor code requires adjustment. This also seems unlikely, but it will be something to check.
(3) Sort-relevant pre-processing methods in the processor are failing due to known changes to Firefox JavaScript (a feature of ES6 implementations generally). The culprit that first comes to mind is array comprehensions. This is the first thing I will look for.
That's the news for now. I have a mountain of other work today, but I'll try to look into this sometime during the weekend.
As one data point, here is a sort-key comparison that returns different values in the en-US and da-DK locales:
AbADelrahman0Hussein :: AAHansen0Martin
The inserted characters "A" and "0" are hacked in to produce similar sort results across different JavaScript implementations, which are all over the place in their treatment of spaces. (Probably the "0" inserted between name elements should be replaced with "A" for consistency, but I can't see how this should make a difference across locales.)
When compared with localeCompare(), this comparison returns 1 in the en-US locale and -1 in the da-DK locale. I have no idea why that would be the case. I suppose we could force the use of the "en-US" locale when "da-DK" etc. are requested, but that would be layering hacks over hacks, and would probably end unhappily at some point.
In other words, I'm kind of stuck here.
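For anyone who wants to reproduce the data point above outside Zotero, here is a minimal sketch in plain JavaScript. It assumes an engine that ships Intl locale data (a recent Node.js or browser) and passes the locale explicitly to localeCompare(). The helper name compareInLocale is made up for illustration and is not part of citeproc-js. The result depends on the collation data bundled with the engine, so it may not match the 1 / -1 reported above.

```javascript
// Sort keys quoted in the thread.
const keyA = "AbADelrahman0Hussein";
const keyB = "AAHansen0Martin";

// Hypothetical helper (not a citeproc-js function): compare two keys
// with an explicit locale instead of the engine's default locale.
function compareInLocale(a, b, locale) {
  return a.localeCompare(b, locale);
}

for (const locale of ["en-US", "da-DK"]) {
  // Normalise to -1 / 0 / 1, since engines may return any
  // negative or positive number.
  const result = Math.sign(compareInLocale(keyA, keyB, locale));
  console.log(locale, result);
}
```

If the two locales print different signs, the difference comes from the collation rules themselves rather than from anything in the processor.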
See https://en.wikipedia.org/wiki/Danish_and_Norwegian_alphabet#History
I had no idea that the sort mechanism replaces spaces with characters, otherwise I might have guessed that the special treatment of "aa" in the Danish alphabet would be the reason behind this problem.
Anyway. It seems like you pinpointed what's going wrong. But can it be fixed?
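To make the connection concrete, here is a rough sketch of how a sort key like the ones quoted earlier might be built, and why it runs into the Danish "aa" rule. The function buildSortKey is hypothetical: it only mimics the two substitutions described in this thread (spaces become "A", name parts are joined with "0"), and the real citeproc-js code may well differ. The sample names are taken from the reference list above; the given name for B Gjengen is left empty because the list shows none.

```javascript
// Hypothetical re-creation of the key format quoted in the thread;
// NOT the actual citeproc-js implementation.
function buildSortKey(family, given) {
  const squash = (s) => s.replace(/ /g, "A"); // spaces become "A"
  return given ? squash(family) + "0" + squash(given) : squash(family);
}

const names = [
  ["Ab Delrahman", "Hussein"],
  ["A Hansen", "Martin"],
  ["B Gjengen", ""],
];
const keys = names.map(([family, given]) => buildSortKey(family, given));
// "A Hansen" becomes "AAHansen...": the inserted "A" forms the digraph "AA".
console.log(keys);

// Sort the same keys under two locales. The output depends on the engine's
// collation data; a Danish collator that treats "aa" as "å" would be
// expected to move the "AA..." key to the end.
for (const locale of ["en-US", "da-DK"]) {
  const collator = new Intl.Collator(locale);
  console.log(locale, [...keys].sort(collator.compare));
}
```

The point of the sketch is that the digraph is created by the key-building step, not by anything in the original name, which is why only entries starting with a single-letter "A" word are affected.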
Hacks are necessary, unfortunately, because locales behave differently across JS engines: it's the only way to at least avoid very bad sort results in some environments.
After some frustrating experiments, I've come up with a solution that I think will hold together reasonably well. You can try it out by installing one of the Propachi plugins. If it works, you can uninstall the plugin at the next Zotero upgrade.