Title case regression in 2.1 (or CSL 1.0)
See here: http://forums.zotero.org/discussion/16942/apa-6-capitalization-errors-in-bibliography/#Item_10
In 2.0.9, Zotero's title case rendered correctly for MHRA (for example). In 2.1.1 it seems to have regressed: words such as 'in', 'on', 'or', and I'm sure many more, are now capitalized (incorrectly). For me, Zotero still correctly keeps "and" and "the" in lowercase, but the user in the thread above reports seeing all uppercase.
What's going on?
"but", "or", "yet", "so", "for", "and", "nor", "a", "an", "the", "at", "by", "from", "in", "into", "of", "on", "to", "with", "up", "down", "as"
Frank, is there a reason for the change? Should we make this a preference?
Any votes for "onto", "over", "till", or "ago"? If not, I'll just top up to reflect the list from 2.0.9.
***
Confirmed. This function was adapted from Zotero 2.0.9 at a very early stage in the development of the new processor, and I must have forgotten or brushed aside this difference in the implementation.
The new code uses an explicit list of skip-words, which does work -- the list is just too limited at the moment. Looking forward, using an explicit list has the advantage that, with references tagged for language, we will be able to adapt title-case behavior to the language of individual entries.
The question would be what words should be lowercased in English. How does this look for a tentative list? Here are a couple of sources that might be useful for reference:
http://www.suite101.com/content/how-to-write-in-title-case-a73866
http://en.wikipedia.org/wiki/List_of_English_prepositions
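To make the point about language-tagged references concrete, here is a minimal sketch of a skip-word lookup keyed by language. It is not the citeproc-js implementation: the names SKIP_WORDS_BY_LANG, skipWordsFor and isSkipWord are made up for illustration, the "en" entry is just the tentative list from this thread, and returning an empty list for unknown languages is an assumption.

```javascript
// Hypothetical table: only the "en" entry comes from this thread.
const SKIP_WORDS_BY_LANG = new Map([
  ["en", ["but", "or", "yet", "so", "for", "and", "nor", "a", "an", "the",
          "at", "by", "from", "in", "into", "of", "on", "to", "with",
          "up", "down", "as"]],
]);

// Reduce a tag such as "en-GB" to its base language and look it up.
// Assumption: languages without an entry get an empty skip list.
function skipWordsFor(lang) {
  const base = String(lang || "").toLowerCase().split("-")[0];
  return SKIP_WORDS_BY_LANG.get(base) || [];
}

// True when the word is on the skip list for the entry's language.
function isSkipWord(word, lang) {
  return skipWordsFor(lang).includes(word.toLowerCase());
}
```

With a table like this, adding another language would only mean adding another entry; the title-case function itself would not need to change.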
1. Capitalize the first and last words in titles and subtitles (but see rule 7), and capitalize all other major words (nouns, pronouns, verbs, adjectives, adverbs, and some conjunctions—but see rule 4).
2. Lowercase the articles the, a, and an.
3. Lowercase prepositions, regardless of length, except when they are used adverbially or adjectivally (up in Look Up, down in Turn Down, on in The On Button, to in Come To, etc.) or when they compose part of a Latin expression used adjectivally or adverbially (De Facto, In Vitro, etc.).
4. Lowercase the conjunctions and, but, for, or, and nor.
5. Lowercase to not only as a preposition (rule 3) but also as part of an infinitive (to Run, to Hide, etc.), and lowercase as in any grammatical function.
6. Lowercase the part of a proper name that would be lowercased in text, such as de or von.
7. Lowercase the second part of a species name, such as fulvescens in Acipenser fulvescens, even if it is the last word in a title or subtitle.
Obviously we can forget about 7. Rule 6 actually seems doable: with "von", "van", and "de" we should have most cases covered. The question is whether that's worth it.
Of Frank's examples, "over" might actually be a bad idea: in "The War is Over", for example, it should be capitalized. The others we can include. Looking through the list of prepositions on Wikipedia, nothing else seems urgent.
"The Life of Abraham van Helsing"
"The Life of Van Helsing"
(see http://en.wikipedia.org/wiki/Abraham_Van_Helsing#Name)
@Rintze: If the particle is capitalized in input, the title-case function won't touch it, so it might be safe to add these to the list as well.
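A rough sketch of why that would be safe, assuming the function only ever raises the first letter of a word and never lower-cases anything. This is an illustration, not the processor's code: SKIP_WORDS and titleCase are hypothetical names, the list is shortened, and always capitalizing the first word is an assumption taken from rule 1 above.

```javascript
// Hypothetical skip list: a few words from this thread plus the name particles.
const SKIP_WORDS = new Set(["a", "an", "the", "and", "of", "in", "on", "or",
                            "von", "van", "de"]);

function titleCase(text) {
  return text
    .split(" ")
    .map((word, i) => {
      // The first word is always capitalized; a lowercase skip word
      // anywhere else is returned exactly as typed.
      if (i > 0 && SKIP_WORDS.has(word)) {
        return word;
      }
      // Only the first character is raised. "Van" is not in the
      // (lowercase) set, so it lands here and comes back unchanged.
      return word.charAt(0).toUpperCase() + word.slice(1);
    })
    .join(" ");
}
```

Under these assumptions, "the life of Abraham van Helsing" keeps its lowercase "van", while "the life of Van Helsing" keeps the capital the user typed.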
Unfortunately, there seems to be a problem with the function that applies title case too (in CSL.Output.Formatters.title I guess).
The problem is that the function is too aggressive, capturing words that begin with any of the words in the skip list. For example, "institution" is lower-cased because of "in". To test this, I added "mu" to the list, and sure enough, "multipolarity" is promptly lower-cased. This is in Zotero 2.1.1.
Perhaps someone else can confirm this?
(Edit: Hold that thought. Looking at the code again, indexOf() is indeed being applied to a string. I'll have another go at reproducing this bug.)
(Edit: Nope, further testing and a careful reading of the code don't turn up any issues here. My explanation above is not correct, however. In the function, indexOf() is used to find the position of the skip word inside the "word". The surrounding characters are then checked for punctuation, and the test fails if any roman letters are found among them. By that logic, "institution" should not match, and testing both in the processor test framework and in Zotero shows the match failing, so "Institution" is capitalized when text-case="title" is applied. The conclusion is the same: if you're really seeing this behavior, we'll need more detail on your system and on the style and input that produce it.)
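For anyone following along, the check described in that edit can be sketched roughly as follows. This is a reconstruction from the description, not a copy of the processor source: matchesSkipWord is a made-up name, and treating "roman letters" as plain a-z is an assumption.

```javascript
// Sketch only: decides whether "word" is the skip word itself,
// possibly wrapped in punctuation, rather than a longer word
// that merely contains it.
function matchesSkipWord(word, skipword) {
  const lower = word.toLowerCase();
  const idx = lower.indexOf(skipword);
  if (idx === -1) {
    return false; // skip word does not occur at all
  }
  // Cut the skip word out and keep whatever surrounded it.
  const remainder = lower.slice(0, idx) + lower.slice(idx + skipword.length);
  // Any letter left over means this is a different, longer word.
  return !/[a-z]/.test(remainder);
}
```

So "in," and "(in)" match the skip word "in", while "institution" leaves "stitution" behind and is capitalized as usual.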
Comparing the two functions, it comes down to this:
2.1.1 (line 8218):
tmp = lowerCaseVariant.slice(0, idx, idx + lowerCaseVariant.slice(skipword.length));
r1809 (8124):
tmp = lowerCaseVariant.slice(0, idx) + lowerCaseVariant.slice(idx + skipword.length);
That causes some weird behaviour in 2.1.1 (running in FF4), with the default skip_words (a, the, an): "Butternut Squash Island, an agonizingly attractive theme Park, Is Normally By-passed By Insane Fromage Lovers"
Unformatted: "Butternut Squash Island, an agonizingly attractive theme park, is normally by-passed by insane fromage lovers"
So you've already fixed it. Sorry to waste your time with such a trivial problem, but at least you'll know if someone else runs across it.
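In case it helps anyone else, here is a small standalone sketch of the difference between the two lines. The expressions are copied from the snippets above; the wrapper functions remainderBuggy and remainderFixed are my own names, and both assume the skip word was actually found (idx is not -1).

```javascript
// 2.1.1 version: slice() only uses its first two arguments, so the
// third one is silently ignored and only the text BEFORE the match
// is returned.
function remainderBuggy(lowerCaseVariant, skipword) {
  const idx = lowerCaseVariant.indexOf(skipword);
  return lowerCaseVariant.slice(0, idx, idx + lowerCaseVariant.slice(skipword.length));
}

// r1809 version: text before the match plus text after the match.
function remainderFixed(lowerCaseVariant, skipword) {
  const idx = lowerCaseVariant.indexOf(skipword);
  return lowerCaseVariant.slice(0, idx) + lowerCaseVariant.slice(idx + skipword.length);
}
```

For "theme" and the skip word "the", the 2.1.1 version leaves an empty string, so no letters remain and the word is treated as a skip word. The r1809 version leaves "me", so the word is capitalized. That matches the output above, where every word starting with "a", "an" or "the" stayed in lowercase.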
---
Demonstrations
Z.2.1.1: http://laurelindon.com/files/misc/citeproc/citeproc.html
r1809: http://laurelindon.com/files/misc/citeproc/citeproc.1809.html
"UK" becomes "Uk".
Author initials - "Review of Book by A.N. Author" becomes "Review of Book by A.n. Author"
If a preposition follows a period in a title, Zotero wrongly leaves it uncapitalized.
Edited to correct a recommendation that was in error. As adamsmith has stated elsewhere: