Odd behaviour when sorting in Danish

I was doing some testing of "University College Lillebælt - Harvard (Danish)", and noticed that "A Room with a View" (as a movie) came up last in the reference list, though it should come first.

If I change the language to "en-UK" it sorts correctly. So it seems like something is wrong with the special sorting rules for Danish. Can anyone explain what is going wrong?
  • Can't think of anything processor-related that would cause it to do that.
  • I ran some tests, changing the default-locale in the style element (and nothing else).

    When the default locale is set to Danish or Norwegian (da-DK, nn-NO or nb-NO), the reference list looks like this:


    Ab Delrahman, H., 2012. Det danske alfabet er sært. Kbh: Politiken.

    B Gjengen, 2009. Alfabeta. Kbh.

    Juul Jensen, U., Hædersdal, C. & Skadborg, M.K., 2014. Etablering af klinisk etiske komitéer i Danmark: en praksis-filosofisk tilgang til klinisk etik fra bibliotek.dk, Bibliotek for læger, årg. 206, nr. 1, s. 44–61.

    A er det første bogstav i alfabetet, 2011. DR.

    A Hansen, M., 2012. Alfabetiseringsproblemer. København: Gyldendal.

    A kasernes fællesvirke, 2015. Problemer med alfabetet. Kbh: Dafolo.



    But when the default locale is set to English, French, German or Swedish (en-US, fr-FR, de-DE, sv-SE), it looks like this:


    A er det første bogstav i alfabetet, 2011. DR.

    A Hansen, M., 2012. Alfabetiseringsproblemer. København: Gyldendal.

    A kasernes fællesvirke, 2015. Problemer med alfabetet. Kbh: Dafolo.

    Ab Delrahman, H., 2012. Det danske alfabet er sært. Kbh: Politiken.

    B Gjengen, 2009. Alfabeta. Kbh.

    Juul Jensen, U., Hædersdal, C. & Skadborg, M.K., 2014. Etablering af klinisk etiske komitéer i Danmark: en praksis-filosofisk tilgang til klinisk etik fra bibliotek.dk, Bibliotek for læger, vol. 206, no. 1, pp. 44–61.
  • I tried the selection of styles that are automatically installed with Zotero. American Anthropological Association, Cell, Elsevier Harvard, and Modern Humanities Research Association (all of which have the default locale set to en-US or en-GB) all sort correctly. But APA, Chicago Manual of Style and Modern Language Association (none of which has a default locale) all sort incorrectly, with the single 'A' coming last.
  • Hmm. It's hard to imagine that Firefox locale sorting could be this messed up without a flood of complaints, so maybe this is a processor bug after all.

    I'll take a look as soon as I have time, and post back.
  • I tried with some simple entries using a minimal style, and I wasn't able to reproduce the fault (under Linux) in Zotero or in Juris-M, so we'll need to dig a little deeper.

    Can you export a small set of entries (as RIS, Zotero RDF, or CSL-JSON), paste the exported data to https://gist.github.com, save it as a "gist", and post the URL back to this thread?
  • I don't understand why you would need the entries. But here's the six entries from the example above:

    https://gist.github.com/roaldfrosig/28cf8178b99efa7954dba62d2796b7de

    Thanks for looking into this.
  • edited June 16, 2016
    The entries are useful as test data, because I was unable to reproduce the fault. It saves me the trouble of guessing which portions of each cite are in which Zotero field, and how they are composed.

    Two pieces of news.

    The first news item is that I am able to reproduce this now, under Linux, in both Zotero for Firefox and in Juris-M for Firefox, using a copy of chicago-fullnote-bibliography.csl. The bad sort manifests both when Danish is selected as the style language in the word processor plugin UI, and when the style is pegged to Danish with default-locale="da-DK".

    The second news item is that it does not fail with a copy of the same style set in a processor test fixture, processed with any of the following (slightly dated) JS engines:
    Rhino 1.7
    MozJS C24.2.0
    jsc (recently compiled, version unclear)


    It is possible for a calling application to override the native sort method in citeproc-js, and I think Zotero (or maybe just MLZ?) used to do this, but I don't see any evidence of it in the current source code. As far as I can tell, the processor is being allowed to run normally.

    I can think of three possible causes, all of which would be connected to changes in the Firefox JavaScript engine (which did change fairly recently):

    (1) The behaviour of localeCompare() may have changed for the worse. This seems unlikely, but it is at least a possible cause.

    (2) The API of localeCompare() has changed, and the processor code requires adjustment. This also seems unlikely, but it will be something to check.

    (3) Sort-relevant pre-processing methods in the processor are failing due to known changes to Firefox JavaScript (a feature of ES6 implementations generally). The culprit that first comes to mind is array comprehensions. This is the first thing I will look for.

    That's the news for now. I have a mountain of other work today, but I'll try to look into this sometime during the weekend.
  • News. There seems to be no difference in the sort keys generated, so (3) is out. We're looking at a change in locale sort behaviour, which may or may not be tunable via parameters to the localeCompare() method.

    As one data point, here is a sort-key comparison that returns different values in the en-US and da-DK locales:
    AbADelrahman0Hussein :: AAHansen0Martin
    The inserted characters "A" and "0" are a hack to produce similar sort results across different JavaScript implementations, which are all over the place in their treatment of spaces. (The "0" inserted between name elements should probably be replaced with "A" for consistency, but I can't see how that would make a difference across locales.)

    When compared with localeCompare(), this comparison returns 1 in the en-US locale, and -1 in the da-DK locale. I have no idea why that would be the case. I suppose we could force the use of "en-US" locale when "da-DK" etc are requested, but that would be layering hacks over hacks, and would probably end unhappily at some point.

    In other words, I'm kind of stuck here.
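
For anyone who wants to repeat the comparison in a JavaScript console, here is a minimal sketch. The helpers `makeSortKey` and `compareKeys` are hypothetical names made up for illustration, not citeproc-js functions; they only mimic the key shape quoted above (spaces inside a name part replaced with "A", name parts joined with "0"). The Danish result depends on the collation data shipped with the engine, so no output is claimed for it here.

```javascript
// Hypothetical helper, NOT citeproc-js code: builds a key in the shape quoted
// in the thread by replacing spaces inside each name part with "A" and joining
// the family and given names with "0".
function makeSortKey(family, given) {
  const fill = (s) => s.replace(/ /g, "A");
  return fill(family) + "0" + fill(given);
}

// Compare two keys under a given locale. localeCompare() only guarantees the
// sign of its result, so normalise it to -1, 0 or 1.
function compareKeys(a, b, locale) {
  return Math.sign(a.localeCompare(b, locale));
}

const delrahman = makeSortKey("Ab Delrahman", "Hussein"); // "AbADelrahman0Hussein"
const hansen = makeSortKey("A Hansen", "Martin");         // "AAHansen0Martin"

// Print the comparison result for each locale; what da-DK prints depends on
// the engine's collation data.
for (const locale of ["en-US", "da-DK"]) {
  console.log(locale, compareKeys(delrahman, hansen, locale));
}
```

If the two locales print different signs, the difference comes from the locale's collation rules rather than from the keys, since both runs compare the same two strings.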
  • edited June 18, 2016
    My guess is the inserted “A” is causing the trouble: in Danish (and Norwegian) Aa is treated like Å in alphabetical sorting.

    See https://en.wikipedia.org/wiki/Danish_and_Norwegian_alphabet#History
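
One way to check whether a given JavaScript engine applies this rule is to probe its collator directly. The sketch below uses the standard `Intl.Collator` API; the function names `sortsAaLast` and `describeLocale` are made up for illustration. It assumes that a collation treating "aa" like "å" will place "aa" after "z", and the answer for Danish or Norwegian depends on the collation data in the engine being tested.

```javascript
// Probe: does this locale's collation put "aa" after "z"?
// Assumption: a collation that treats "aa" like "å" will answer true.
function sortsAaLast(locale) {
  const collator = new Intl.Collator(locale);
  return collator.compare("aa", "z") > 0;
}

// Report whether the engine supports the locale at all, alongside the probe
// result. An unsupported locale silently falls back to a default collation.
function describeLocale(locale) {
  const supported = Intl.Collator.supportedLocalesOf([locale]).length > 0;
  return { locale, supported, aaSortsLast: sortsAaLast(locale) };
}

for (const locale of ["en-US", "sv-SE", "da-DK", "nb-NO"]) {
  console.log(describeLocale(locale));
}
```

A sort key such as "AAHansen0Martin" begins with the same letter pair, which is why a fill character of "A" can interact with this rule when the key is compared under a Danish or Norwegian locale.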
  • That's true. In Danish dictionaries you will commonly find words beginning with "aa" on the last pages.

    I had no idea that the sort mechanism replaces spaces with characters; otherwise I might have guessed that the special treatment of "aa" in the Danish alphabet was the reason behind this problem.

    Anyway. It seems like you pinpointed what's going wrong. But can it be fixed?
  • Thanks @nickbart for that pointer.

    Hacks are necessary, unfortunately, because locales behave differently across JS engines: it's the only way to at least avoid very bad sort results in some environments.

    After some frustrating experiments, I've come up with a solution that I think will hold together reasonably well. You can try it out by installing one of the Propachi plugins. If it works, you can uninstall the plugin at the next Zotero upgrade.
  • Thank you very much fbennet.