Title case regression in 2.1 (or CSL 1.0)

See here: http://forums.zotero.org/discussion/16942/apa-6-capitalization-errors-in-bibliography/#Item_10

In 2.0.9, Zotero's title case would render correctly for MHRA (for example). In 2.1.1, it seems to have regressed: words such as 'in', 'on', 'or', and I'm sure many more, are now capitalized (incorrectly). For me, Zotero still correctly keeps "and" and "the" in lowercase, but the user in the thread above reports seeing all uppercase.
What's going on?
  • It looks like citeproc-js only leaves "a", "an", and "the" in lowercase. We can easily change this. The old list was:

    "but", "or", "yet", "so", "for", "and", "nor", "a", "an", "the", "at", "by", "from", "in", "into", "of", "on", "to", "with", "up", "down", "as"

    Frank, is there a reason for the change? Should we make this a preference?
  • Aha. I was working on a post, but was called away from my desk before hitting the send button. The text is below -- but the short answer is that there was no reason for the change that I can remember, and restoring the status quo ante is probably the way to go.

    Any votes for "onto", "over", "till", or "ago"? If not, I'll just top up to reflect the list from 2.0.9.

    ***

    Confirmed. This function was adapted from Zotero 2.0.9 at a very early stage in the development of the new processor, and I must have forgotten or brushed aside this difference in the implementation.

    The new code uses an explicit list of skip-words, which does work -- the list is just too limited at the moment. Looking forward, using an explicit list has the advantage that, with references tagged for language, we will be able to adapt title-case behavior to the language of individual entries.

    The question would be what words should be lowercased in English. How does this look for a tentative list?
    "a", "the", "an", "and", "but", "for", "or", "nor", "to", "on", "in", "at", "ago", "till", "by", "with", "over", "into", "onto", "from"
    Here are a couple of sources that might be useful for reference:

    http://www.suite101.com/content/how-to-write-in-title-case-a73866

    http://en.wikipedia.org/wiki/List_of_English_prepositions
  • edited March 23, 2011
    Hmm. As your first link points out, the CMoS guidelines are pretty aggressive (all prepositions lowercase regardless of length). I think your list is probably more correct than the original list, but the Wikipedia page calls "ago" a "postposition" and it seems like it should probably still be capitalized (although it usually comes at the end of a title anyway, so it probably doesn't make a big difference). "of" and "via" should probably be on the list as well. It might be worthwhile to discuss this further on xbiblio at some point.
  • As a reminder, here are the CMoS rules:
    1. Capitalize the first and last words in titles and subtitles (but see rule 7), and capitalize all other major words (nouns, pronouns, verbs, adjectives, adverbs, and some conjunctions—but see rule 4).
    2. Lowercase the articles the, a, and an.
    3. Lowercase prepositions, regardless of length, except when they are used adverbially or adjectivally (up in Look Up, down in Turn Down, on in The On Button, to in Come To, etc.) or when they compose part of a Latin expression used adjectivally or adverbially (De Facto, In Vitro, etc.).
    4. Lowercase the conjunctions and, but, for, or, and nor.
    5. Lowercase to not only as a preposition (rule 3) but also as part of an infinitive (to Run, to Hide, etc.), and lowercase as in any grammatical function.
    6. Lowercase the part of a proper name that would be lowercased in text, such as de or von.
    7. Lowercase the second part of a species name, such as fulvescens in Acipenser fulvescens, even if it is the last word in a title or subtitle.


    Obviously we can forget about 7. Rule 6 actually seems doable: with "von", "van", and "de" we should have most cases covered. The question is whether that's worth it.

    Of Frank's examples, "over" might actually be a bad idea: "The War is Over" should be capitalized, for example. The others we can include. Looking through the list of prepositions on WP, nothing else seems urgent.
  • The algorithm capitalizes the last word in a title, so that example should be fine even if we add "over" to the list.
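To illustrate the point about the last word, here is a minimal sketch of skip-word title casing. It is not the citeproc-js source: the names SKIP_WORDS, capitalize and titleCase are hypothetical, the word list is the 2.0.9 list quoted above, and "over" is added only to exercise the "The War is Over" case.

```javascript
// Hypothetical sketch, not the citeproc-js implementation.
// Skip words: the Zotero 2.0.9 list quoted earlier in this thread.
const SKIP_WORDS = new Set([
  "but", "or", "yet", "so", "for", "and", "nor", "a", "an", "the", "at",
  "by", "from", "in", "into", "of", "on", "to", "with", "up", "down", "as",
  "over" // added here only to test the "The War is Over" case
]);

function capitalize(word) {
  return word.charAt(0).toUpperCase() + word.slice(1);
}

function titleCase(title) {
  const words = title.split(" ");
  const last = words.length - 1;
  return words
    .map((word, i) => {
      // The first and last words are always capitalized.
      if (i === 0 || i === last) return capitalize(word);
      // Whole-word comparison, so "institution" never matches "in".
      if (SKIP_WORDS.has(word.toLowerCase())) return word.toLowerCase();
      return capitalize(word);
    })
    .join(" ");
}

console.log(titleCase("the war is over"));
// → The War Is Over
console.log(titleCase("war over the institution in winter"));
// → War over the Institution in Winter
```

Because the position check runs before the skip-list lookup, a skip word at the end of a title is capitalized no matter what the list contains.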
  • @adamsmith: 6 is tricky. At least for Dutch names, the first particle should only be lowercased if preceded by a given name or initial, e.g.:

    "The Life of Abraham van Helsing"
    "The Life of Van Helsing"
    (see http://en.wikipedia.org/wiki/Abraham_Van_Helsing#Name)
  • I've included an extended list in a revised version of the processor, just checked in and pushed.

    @Rintze: If the particle is capitalized in input, the title-case function won't touch it, so it might be safe to add these to the list as well.
  • Very timely, just what I was looking for...

    Unfortunately, there seems to be a problem with the function that applies title case too (in CSL.Output.Formatters.title I guess).

    The problem is that the function is too aggressive, capturing words that begin with any of the words in the skip list. For example, "institution" is lower-cased because of "in". To test this, I added "mu" to the list, and sure enough, "multipolarity" is promptly lower-cased. This is in Zotero 2.1.1.

    Perhaps someone else can confirm this?
  • edited March 29, 2011
    I can't reproduce this error. In the function at CSL.Output.Formatters.title(), the skipword is identified in the string (split on spaces into an Array object) with indexOf(), which performs a strict match. If you're really seeing this behavior, we'll need more detail on your system and on the style and input that produce it.

    (Edit: Hold that thought. Looking at the code again, indexOf() is indeed being applied to a string. I'll have another go at reproducing this bug.)

    (Edit: Nope, further testing and a careful reading of the code don't turn up any issues here. My explanation above is not correct, however. In the function, indexOf() is used to identify the position of the substring inside the "word". The surrounding characters are then checked for punctuation, and the test fails if any roman letters are found in the remaining elements. By this logic, "institution" should not match, and testing both in the processor test framework and in Zotero shows the match failing here, resulting in capitalization of "Institution" when text-case="title" is applied. The conclusion is the same: If you're really seeing this behavior, we'll need more detail on your system and on the style and input that produce it.)
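The check described in that last edit can be sketched roughly as follows. This is an illustration of the described logic only, not the processor's code: isSkipWord is a hypothetical helper, and the test for "roman letters" is assumed to be a simple a-z match.

```javascript
// Hypothetical helper illustrating the described check; not citeproc-js code.
function isSkipWord(word, skipword) {
  const lower = word.toLowerCase();
  const idx = lower.indexOf(skipword);
  if (idx === -1) return false;
  // Everything around the match: text before it plus text after it.
  const rest = lower.slice(0, idx) + lower.slice(idx + skipword.length);
  // Assumption: any remaining a-z letter means the match is only part of
  // a longer word, so the test fails. Punctuation alone is allowed.
  return !/[a-z]/.test(rest);
}

console.log(isSkipWord("in", "in"));          // → true
console.log(isSkipWord("(in)", "in"));        // → true
console.log(isSkipWord("institution", "in")); // → false
```

Under these assumptions, "institution" leaves "stitution" after the match is removed, so it is not treated as a skip word and gets capitalized.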
  • I tested the citeproc.js included in 2.1.1 again, in isolation. It's not actually that it filters whole words based on the skip_words -- it seems to be a case of incorrect syntax for slice()? I just tested your revision 1809 and it works as it should.

    Comparing the two functions, it comes down to this:
    2.1.1 (line 8218):
    tmp = lowerCaseVariant.slice(0, idx, idx + lowerCaseVariant.slice(skipword.length));

    r1809 (8124):
    tmp = lowerCaseVariant.slice(0, idx) + lowerCaseVariant.slice(idx + skipword.length);

    That causes some weird behaviour in 2.1.1 (running in FF4), with the default skip_words (a, the, an): "Butternut Squash Island, an agonizingly attractive theme Park, Is Normally By-passed By Insane Fromage Lovers"

    Unformatted: "Butternut Squash Island, an agonizingly attractive theme park, is normally by-passed by insane fromage lovers"

    So you've already fixed it. Sorry to waste your time with such a trivial problem, but at least you'll know if someone else runs across it.

    ---
    Demonstrations
    Z.2.1.1: http://laurelindon.com/files/misc/citeproc/citeproc.html
    r1809: http://laurelindon.com/files/misc/citeproc/citeproc.1809.html
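For anyone who runs across the same thing, here is a small sketch of why the two lines behave differently. String.prototype.slice() takes at most two arguments, so the third argument in the 2.1.1 line is silently ignored and only the text before the match survives. The wrapper functions restBuggy and restFixed are hypothetical; only the slice expressions come from the two revisions quoted above.

```javascript
// Hypothetical wrappers around the two slice expressions quoted above.
function restBuggy(word, skipword) {
  const lowerCaseVariant = word.toLowerCase();
  const idx = lowerCaseVariant.indexOf(skipword);
  if (idx === -1) return lowerCaseVariant;
  // 2.1.1 form: the third argument is ignored, so this is just slice(0, idx).
  return lowerCaseVariant.slice(0, idx, idx + lowerCaseVariant.slice(skipword.length));
}

function restFixed(word, skipword) {
  const lowerCaseVariant = word.toLowerCase();
  const idx = lowerCaseVariant.indexOf(skipword);
  if (idx === -1) return lowerCaseVariant;
  // r1809 form: text before the match plus text after the match.
  return lowerCaseVariant.slice(0, idx) + lowerCaseVariant.slice(idx + skipword.length);
}

console.log(JSON.stringify(restBuggy("theme", "the"))); // → ""
console.log(JSON.stringify(restFixed("theme", "the"))); // → "me"
console.log(JSON.stringify(restBuggy("park", "a")));    // → "p"
```

With the 2.1.1 form, any word that merely begins with a skip word leaves nothing behind, so it looks like a bare skip word. That is consistent with "agonizingly", "attractive" and "theme" being lowercased above, while "park" (where "a" is not at the start) is left alone.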
  • No worries; I wasn't as careful about reviewing the changes for the post as I ought to have been. Very glad to hear that it's working.
  • A few small problems I've noticed with title case processing (Zotero 2.1.2):

    "UK" becomes "Uk".
    Author initials - "Review of Book by A.N. Author" becomes "Review of Book by A.n. Author"
    If a preposition follows a period in a title, Zotero wrongly leaves it uncapitalized.
  • That's not good. If letters are capitalized in the input, they shouldn't be touched. I'll have a look. Thanks for reporting this.
  • edited April 1, 2011
    Fixed, in a processor version that will make its way into the next Zotero 2.1 release. I haven't done anything special for prepositions following a period; I think that can most safely be left to the (newly added) generic mechanism for leaving words with capital letters untouched. Forcing a word in that position to upper case would lead us into a wilderness of semantic processing from which few projects emerge unscathed.
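As a rough illustration of the "leave words with capital letters untouched" idea, here is a minimal sketch. It is not the processor's code: the function name titleCasePreservingCaps and the short SKIP list are assumptions made for the example.

```javascript
// Hypothetical sketch; the name and the short skip list are illustrative only.
const SKIP = new Set(["a", "an", "the", "of", "by", "in", "on"]);

function titleCasePreservingCaps(title) {
  const words = title.split(" ");
  const last = words.length - 1;
  return words
    .map((word, i) => {
      // Any word that already contains an uppercase letter is left as entered.
      if (word !== word.toLowerCase()) return word;
      // Lowercase skip words stay lowercase, except in first or last position.
      if (i !== 0 && i !== last && SKIP.has(word)) return word;
      return word.charAt(0).toUpperCase() + word.slice(1);
    })
    .join(" ");
}

console.log(titleCasePreservingCaps("review of book by A.N. Author"));
// → Review of Book by A.N. Author
console.log(titleCasePreservingCaps("report on the UK"));
// → Report on the UK
console.log(titleCasePreservingCaps("the life of Van Helsing"));
// → The Life of Van Helsing
```

The same rule covers a preposition after a period: if the user capitalizes it in the input, it stays capitalized, and no sentence-boundary detection is needed.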
  • Sorry for the tangent: could Transform Text functionality be made available not just for titles, but also for other text fields, e.g., abstracts, publication titles, institutions?
  • Has this made it into Zotero yet? If not, how can I know when it does? Thanks!
  • I still need to revisit this. Thanks for the reminder.
  • edited May 30, 2011
    The issues raised here all seem to have been fixed in the current processor, with the small exception of prepositions following a period, which for the time being will not be addressed.

    Edited to correct a recommendation that was in error. As adamsmith has stated elsewhere:
    For that reason we recommend that you store all your titles in sentence case - converting to title case automatically is easy and done for styles that require it.
  • No, that can't be right. Sentence casing isn't reliable at all (proper nouns, especially), all styles we have use title case and not sentence case, and so far the recommendation has always been to enter sentence case.