Ignore certain words when sorting

When sorting lists (in English anyway) it is common to ignore definite and indefinite articles - the, an, a.

Such an option for sorting bibliographies would be valuable in Zotero. A style guide I am working with at the moment requires this, as well as requiring that anonymous and certain other works be listed by title rather than author, so the occurrence of 'The' at the beginning of a bibliography entry is not uncommon.

One way to approach this which would

a) not require embedding a lot of l10n information in the processor and
b) give style writers plenty of flexibility

would be to add an attribute, ignore-leading-words, to either the <sort> or <key> element in CSL.
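To make the idea concrete, here is a minimal Python sketch of what a processor might do with such an attribute. The function name `strip_leading_words`, the space-separated word list, and the sample titles are all assumptions for illustration; none of this is part of CSL or of any existing processor.

```python
def strip_leading_words(title, ignore_words):
    """Return a sort key for `title` with one leading ignorable word removed.

    `ignore_words` stands in for the value of the hypothetical
    ignore-leading-words attribute, e.g. "the a an".
    """
    ignore = {w.casefold() for w in ignore_words.split()}
    parts = title.strip().split(None, 1)
    # Only strip when something is left over, so a title that is just "A" survives.
    if len(parts) == 2 and parts[0].casefold() in ignore:
        return parts[1].casefold()
    return title.strip().casefold()


# Illustrative titles, not taken from any real bibliography.
titles = ["The Tin Drum", "A Tale of Two Cities", "An American Tragedy", "Moby-Dick"]
sorted_titles = sorted(titles, key=lambda t: strip_leading_words(t, "the a an"))
```

Because the word list comes from the style rather than from the processor, a style for another language could supply its own list without any change to the code.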

On the downside, of course, this would require a change to the CSL specification.
  • edited October 8, 2012
    (as a note, the Chicago Manual requires this too, as a secondary sort condition, i.e. "Smith, Adam, An Inquiry into the Nature and Causes of the Wealth of Nations" should get sorted after "Smith, Adam, The Glasgow Edition")
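The secondary sort described above can be sketched in Python as a two-part key: author first, then the title with a leading article dropped. The names `ARTICLES` and `title_key` are hypothetical, and the sketch only covers the English articles.

```python
# English articles only; each entry includes the trailing space so that
# "Then" or "Another" is not mistaken for an article.
ARTICLES = ("the ", "an ", "a ")


def title_key(title):
    """Lowercased title with one leading English article removed."""
    lowered = title.casefold()
    for article in ARTICLES:
        if lowered.startswith(article):
            return lowered[len(article):]
    return lowered


entries = [
    ("Smith, Adam", "An Inquiry into the Nature and Causes of the Wealth of Nations"),
    ("Smith, Adam", "The Glasgow Edition"),
]

# Author is the primary key; the article-stripped title breaks ties.
ordered = sorted(entries, key=lambda e: (e[0].casefold(), title_key(e[1])))
```

With this key "The Glasgow Edition" sorts before "An Inquiry...", whereas a plain string sort on the title would put "An Inquiry..." first.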
  • @adamsmith, does CMoS discuss title casing? The CSL spec lists the words that shouldn't be title cased ( http://citationstyles.org/downloads/specification.html#title-case-conversion ), and I'm wondering if a single custom list would suffice to set both title case and sorting exceptions.

    Related, Frank already played around with a "skip-words" attribute to define words that shouldn't be title cased or taken into account when sorting: http://gsl-nagoya-u.net/http/pub/citeproc-js-csl.html#skip-words-attribute-to-style-options
  • here's CMoS on capitalization:
    1. Capitalize the first and last words in titles and subtitles (but see rule 7), and capitalize all other major words (nouns, pronouns, verbs, adjectives, adverbs, and some conjunctions—but see rule 4).

    2. Lowercase the articles the, a, and an.

    3. Lowercase prepositions, regardless of length, except when they are used adverbially or adjectivally (up in Look Up, down in Turn Down, on in The On Button, to in Come To, etc.) or when they compose part of a Latin expression used adjectivally or adverbially (De Facto, In Vitro, etc.).

    4. Lowercase the conjunctions and, but, for, or, and nor.

    5. Lowercase to not only as a preposition (rule 3) but also as part of an infinitive (to Run, to Hide, etc.), and lowercase as in any grammatical function.

    6. Lowercase the part of a proper name that would be lowercased in text, such as de or von.

    7. Lowercase the second part of a species name, such as fulvescens in Acipenser fulvescens, even if it is the last word in a title or subtitle.
    (CMoS 8.157)

    Non-English titles stay in their common language. But that list doesn't serve for sorting - e.g. For Whom the Bell Tolls should be sorted under F, not W.

    Sorting is much simpler, though:
    An initial the, a, or an is ignored in the alphabetizing.
    (CMoS 14.67) That's the total extent of the inclusion list.
    AFAIK CMoS does not talk about sorting non-English titles - so I don't know if Die Blechtrommel should be under D or under B - though foreign articles are mentioned with respect to indexes (CMoS 16.52). There, the answer is "it depends":
    In publications intended for a general audience, especially those that mention only a few such titles, it is acceptable to list the titles in the index exactly as they appear in the text, without inversion and alphabetized according to the article.
    Eine kleine Nachtmusik (Mozart), 23
    La bohème (Puccini), 211

    In a more specialized work, or any work intended for readers who are likely to be well versed in the languages of any foreign titles mentioned in the text, the titles may be inverted as they are in English (see 16.51). (...) [T]he articles are ignored in alphabetizing.
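The "it depends" answer for foreign articles could be sketched in Python as a per-language article table plus a switch for the intended audience. Everything here is an assumption for illustration: the name `index_key`, the `ignore_articles` flag, the language tags attached to each title, and the article lists, which are deliberately incomplete (inflected German forms, for instance, are left out). The sketch only computes a sort key; it does not invert the titles.

```python
# Illustrative, incomplete article lists keyed by language code.
ARTICLES_BY_LANGUAGE = {
    "en": {"the", "a", "an"},
    "de": {"der", "die", "das", "ein", "eine"},
    "it": {"il", "lo", "la", "i", "gli", "le", "un", "uno", "una"},
}


def index_key(title, language, ignore_articles):
    """Sort key for a title; optionally skips a leading article."""
    words = title.split(None, 1)
    articles = ARTICLES_BY_LANGUAGE.get(language, set())
    if ignore_articles and len(words) == 2 and words[0].casefold() in articles:
        return words[1].casefold()
    return title.casefold()


titles = [
    ("Eine kleine Nachtmusik", "de"),
    ("La bohème", "it"),
    ("Die Blechtrommel", "de"),
]

# General audience: alphabetize according to the article.
general = sorted(titles, key=lambda t: index_key(*t, ignore_articles=False))
# Specialized audience: ignore the article when alphabetizing.
specialized = sorted(titles, key=lambda t: index_key(*t, ignore_articles=True))
```

Under the first setting Die Blechtrommel files under D; under the second it files under B, ahead of La bohème.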
  • I see this has been dormant for a couple years. Would it be acceptable to add a special case in the Chicago CSL for "the," "a," and "an," as specified in CMoS 16, 14.67?

    And if so, how would one go about doing that? I've never tinkered with CSL before, but it looks like one could create a macro for "sort-title" that strips off those words, then use it instead of "title" in this block under <bibliography>:

    <sort>
        <key macro="contributors-sort"/>
        <key variable="title"/>
        <key variable="genre"/>
        <key variable="issued"/>
    </sort>
  • You can't do this in CSL; it would have to be done in the processor, which is one of the reasons it's been taking a long time.
  • Thanks for the explanation, and I'm glad to hear it's on the radar. Is there any way I can help? (For instance, by testing code or anything?) I didn't see any relevant issues on Git.
  • citeproc-js, the citation processor, is a project separate from Zotero (run, and pretty much exclusively maintained, by fbennett), so you wouldn't find anything about that on Zotero's GitHub. I don't think there's code in place that needs testing. I believe it would also be pretty straightforward in terms of the actual code, so testing isn't the main issue, but rather the logistical details of implementation (e.g. I'd guess that we need to make this an option in citation styles, since not all styles have the same rules on sorting; those types of things usually take time because everyone has to agree on how that should look exactly).
  • MLA is like CMoS as far as sorting titles goes ("a" and "the" at the beginning of a title should be ignored) and so would also benefit from a change to the citation processor.