unicode, sorting, names

While looking around for an easy fix for the current bug with sorting extended Unicode characters (nothing definitive, but it seems you need a user-defined collation algorithm; I would have thought there'd be a default one, though), I ran across this thread on names:

<http://groups.google.com/group/comp.lang.python/browse_frm/thread/694d57f997bd4ce5>

They bring up a lot of practical examples that are going to trip Zotero up, even with pretty standard Western names; for example, in some European countries one sorts on "von" and "van", and in others not.
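
To make the sorting problem concrete, here is a minimal Python sketch (an illustration only, not Zotero's code). It compares a plain code-point sort with a crude accent-folding key. The function name `fold_key` and the sample names are made up for the example, and stripping accents is only a stand-in for a real collation algorithm: it would still be wrong for languages such as Swedish, where "Å" sorts after "Z".

```python
import unicodedata


def fold_key(name: str) -> str:
    """Crude sort key: drop combining marks and ignore case.

    This is NOT a real collation algorithm, just a language-neutral fallback.
    """
    decomposed = unicodedata.normalize("NFD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


names = ["Zola", "Émile", "Eco", "Ångström", "Adams"]

# Plain sort compares code points, so accented capitals land after "Z".
print(sorted(names))
# ['Adams', 'Eco', 'Zola', 'Ångström', 'Émile']

# Folding accents first gives the order most English readers expect.
print(sorted(names, key=fold_key))
# ['Adams', 'Ångström', 'Eco', 'Émile', 'Zola']
```
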
  • Library cataloguers use two processes when sorting bibliographic data by name and subject.

    The first is the ‘standardisation’ of the names and subjects through a process of applying a list of somewhat arbitrary rules to modify them for sorting.

    The second is the alphabetical sorting process which is subject to the rules of the language and character set used.

    The list of rules used for the pre-sorting process is complex, and some of them would, I imagine, be difficult to automate reliably. These rules are language specific, and I would guess country or even institution specific. In the examples I quoted in the reference below, some of the rules might even be offensive to some people (e.g. the rule for treating “R. Academia nazionale dei Lincei, Rome” is “Ignore foreign royalty (except British)”).

    So proper treatment of name and subject sorting requires, at the least, national pre-sort processing modules and national-language sorting.

    If the Zotero developers do not want to deal with this complex task, then they need to provide extra fields for sorting-name and sorting-subject so that the user can apply these pre-sorting rules manually when needed, or provide another mechanism for the user to influence sorting order.

    I have put up some examples of the pre-sorting process at http://wiki.services.openoffice.org/wiki/Name_Sorting
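
The two stages described above, plus the suggested manual override, could be sketched roughly as follows. This is a hypothetical illustration in Python, not Zotero's data model: the `Creator` class, the `sort_name` field, the `PARTICLES` list and the `skip_particles` switch are all assumptions made up for the example, and stage 2 uses plain case-folding where a real implementation would need language-aware collation. Dropping "R." from the Lincei entry is one reading of the "ignore foreign royalty" rule.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical, deliberately incomplete list of name particles.
PARTICLES = ("von ", "van ")


@dataclass
class Creator:
    name: str
    sort_name: Optional[str] = None  # hypothetical user-supplied override


def standardise(name: str, skip_particles: bool) -> str:
    """Stage 1: apply a (national) pre-sort rule to the displayed name."""
    if skip_particles:
        for particle in PARTICLES:
            if name[:len(particle)].casefold() == particle:
                return name[len(particle):]
    return name


def sort_key(creator: Creator, skip_particles: bool) -> str:
    """A manual sort_name wins; otherwise fall back to the automatic rule."""
    if creator.sort_name is not None:
        base = creator.sort_name
    else:
        base = standardise(creator.name, skip_particles)
    # Stage 2: placeholder for language-specific alphabetical sorting.
    return base.casefold()


def sort_creators(creators: List[Creator], skip_particles: bool) -> List[str]:
    ordered = sorted(creators, key=lambda c: sort_key(c, skip_particles))
    return [c.name for c in ordered]


creators = [
    Creator("von Neumann, John"),
    Creator("Zuse, Konrad"),
    Creator("Planck, Max"),
    Creator("R. Academia nazionale dei Lincei, Rome",
            sort_name="Academia nazionale dei Lincei, Rome"),
]

print(sort_creators(creators, skip_particles=True))   # "von Neumann" files under N
print(sort_creators(creators, skip_particles=False))  # "von Neumann" files under V
```

The point of the optional `sort_name` field is that the automatic rule never has to be right for every case: where it fails, the user types the form they want sorted on, and the displayed name stays untouched.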