Title case for non-English titles: call for info
A recent thread concerning MLA raises an issue affecting every style that uses text-case="title" (currently 38 styles in the repository).
The problem is that some languages have no concept of "title case"; so when the title is written in such a language, the conversion should not take place. Ever. The title should be rendered exactly as it appears in the Zotero database.
For "polyglot" styles that are likely to be used to format references from several languages in a single document, this is a problem.
Before thinking about fixes or enhancements, I have a question:
Does any language other than English have special capitalization rules that apply specifically to titles?
It's a short questionnaire and a big community: I'll look forward to learning the answer. :)
http://www.mail-archive.com/rda-l@listserv.lac-bac.gc.ca/msg01473.html
No for German and Spanish, from personal knowledge. The relevant Wikipedia entries in Spanish and English both describe title case as a specifically English (especially American English) convention.
http://en.wikipedia.org/wiki/Letter_case#Headings_and_publication_titles
http://es.wikipedia.org/wiki/May%C3%BAscula#En_otros_idiomas
In Portuguese, title case exists in theory, but no one (and no relevant style guide) uses it.
http://pt.wikipedia.org/wiki/Caixa_alta#T.C3.ADtulos_em_artigos_e_em_cita.C3.A7.C3.A3o_bibliogr.C3.A1fica
Looks like you're the voice of the community!
There's really nothing to be done for this without access to the language field of the target item. Assuming that it will eventually be passed through, I'll set up the processor to disable title-case conversion for any item that (a) has a valid language tag in the language field, which (b) is not from the "en" domain.
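The rule described above can be expressed as a small predicate. This is a minimal sketch, not citeproc-js code: the function names are hypothetical, and treating a "valid language tag" as a 2-3 letter primary subtag optionally followed by `-` or `_` subtags is an assumption (a stricter implementation might check against the BCP 47 registry).

```python
import re
from typing import Optional

# Hypothetical helper, not taken from citeproc-js. A tag counts as "valid"
# here if it starts with a 2-3 letter primary subtag, optionally followed
# by further subtags separated by "-" or "_" (e.g. "de", "pt-BR", "zh_Hant").
_PRIMARY_SUBTAG = re.compile(r"^([A-Za-z]{2,3})(?:[-_][A-Za-z0-9]+)*$")


def primary_language(tag: Optional[str]) -> Optional[str]:
    """Return the lowercased primary subtag of a language tag, or None if invalid."""
    if not tag:
        return None
    match = _PRIMARY_SUBTAG.match(tag.strip())
    return match.group(1).lower() if match else None


def allow_title_case(language_field: Optional[str]) -> bool:
    """Apply text-case="title" unless the item carries a valid non-English tag."""
    primary = primary_language(language_field)
    if primary is None:
        # Empty or unparseable Language field: keep the existing behaviour.
        return True
    return primary == "en"
```

With this check, free text such as "German" in the Language field is not a valid tag, so title casing still applies; a tag like "de" or "pt-BR" suppresses it, while "en-US" leaves it on.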
For people working with material in mixed languages, it will be possible to pass a language tag through to the CSL processor using the "cheater" syntax: {:language: de} [Edit: in the Extra field]. Let's hope that the mapping can be added in the near future.
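For testing, pulling the cheater tag out of the Extra field amounts to a simple pattern match. The regex and function name below are illustrative assumptions, not the processor's actual parser; as noted in this thread, the syntax exists for testing only and should not appear in production data.

```python
import re
from typing import Optional

# Hypothetical pattern for the testing-only "{:language: de}" syntax.
# Not a supported or stable format; shown only to illustrate the idea.
_CHEATER = re.compile(r"\{:language:\s*([^}\s]+)\s*\}")


def language_from_extra(extra: Optional[str]) -> Optional[str]:
    """Return the language tag embedded in an Extra field, if any."""
    if not extra:
        return None
    match = _CHEATER.search(extra)
    return match.group(1) if match else None
```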
(Edit: Simon has added a mapping for the Language field on the Zotero trunk. When that comes through in a stable release, there will be no need to use the "cheater syntax" described above. That syntax is intended only for testing purposes, could be withdrawn at any time, and should not be used in production data.)
@Gracile - well, because you can't test for the content of a variable, that will actually still not be possible.
Here's what Frank did:
"In the next revision of citeproc-js (1.0.183), if the item variable "language" begins with a two-character language code that is not "en", the text-case="title" transform will be suppressed, to prevent undesired transforms of non-English content. Mapping this field through will address the issue discussed here: ..."
What Frank has implemented in citeproc-js 1.0.183 is no more than a (great) workaround.
The processor will recognize (via an experimental extension that is not valid CSL) entirely separate cs:layout nodes for separate languages. Avram (ajlyon) has suggested that a controlled mechanism within CSL could (well, should) allow term substitutions within cs:layout (which would mean, with the multiple-layouts extension, within any given layout). I was kind of reluctant and non-committal IIRC, but your note drives home that there are real cases that would benefit, and it's not hard to implement (it would only require selective use of portions of code already in there to support multiple cs:layout nodes).
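Choosing among per-language cs:layout nodes can be pictured as a lookup with a fallback. This sketch assumes layouts are keyed by lowercase language tags, with an empty key for the default layout, and that a full tag such as "pt-br" is tried before its primary subtag; these are illustrative assumptions, not the extension's actual matching rules. The layout bodies are stand-in strings, and term substitution within a layout is not modeled.

```python
from typing import Dict, Optional

# Hypothetical model of the multiple-layouts extension: keys are lowercase
# language tags, "" marks the default cs:layout, values stand in for layouts.
Layouts = Dict[str, str]


def select_layout(layouts: Layouts, item_language: Optional[str]) -> str:
    """Pick the layout for an item's language, falling back to the default."""
    if item_language:
        tag = item_language.lower().replace("_", "-")
        # Try the full tag first (e.g. "pt-br"), then its primary subtag ("pt").
        for key in (tag, tag.split("-")[0]):
            if key in layouts:
                return layouts[key]
    return layouts[""]
```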
We should certainly discuss this.
1. define and agree on use cases (in this case, for formatted output)
2. establish requirements (in this case, for the data and for the styling; see e.g. this on requirements analysis)
3. offer one or more possible solutions
Too often we skip step 2 and go straight to one particular solution. And it inevitably takes me a while to back up and reverse-engineer the requirements to figure out whether there are other possible solutions.
I will say that, as a general proposition, I really dislike solutions that require polluting the data fields with additional non-standard data. I think that's what Frank is proposing here, but I'm not sure.
But if we find that Zotero isn't supporting particular use cases, then I think it a good idea to clearly demonstrate the missing requirement(s). In this case, it might be as simple as "text variables need an optional language attribute". And then Zotero should be upgraded to support this requirement properly (so that it can be reliably exported and imported in at least some formats, for example).
Optional language attributes have been implemented already in the multilingual branch, with support for valid export and import in RDF (other formats lose out, of course, but only because they assume a monolingual universe).
Here, we're just exploiting the existing "Language" field in Zotero to set a default language value for the item. The solution doesn't involve any change to other data fields.
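Combining an optional per-variable language (as in the multilingual branch) with the item-level Language field as a default could look like the sketch below. The class and function names are hypothetical, and the precedence rule (variable tag first, then item default) is an assumption about how the two would interact, not a description of Zotero's data model.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TextVariable:
    value: str
    # Hypothetical optional per-variable language tag.
    language: Optional[str] = None


@dataclass
class Item:
    # Zotero "Language" field, used here as the item-wide default.
    language: Optional[str] = None
    variables: Dict[str, TextVariable] = field(default_factory=dict)


def effective_language(item: Item, name: str) -> Optional[str]:
    """Assumed precedence: the variable's own tag wins, else the item default."""
    var = item.variables.get(name)
    if var is not None and var.language:
        return var.language
    return item.language


def title_case_allowed(item: Item, name: str) -> bool:
    """Allow text-case="title" only when the effective language is English or unset."""
    lang = effective_language(item, name)
    if not lang:
        return True
    return lang.lower().replace("_", "-").split("-")[0] == "en"
```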
The "cheater syntax" (i.e. "{:language: de}") is a hack that I've embedded in the processor solely for testing purposes. It has value insofar as it allows us to see the effect of a new field assignment in a running style. I do agree that it's an ugly thing, and mention of it should always be accompanied by a caveat to the effect that it is provided only for testing purposes, is not guaranteed to be always available, and should not be used in production data. I should have provided those warnings above; but Simon has kindly supplied a mapping, so there will be no need for anyone to rely on it for this case.
I wouldn't mind organizing the CSL test suite into "proper" and "experimental" CSL tests, but I need an infrastructure to tag or otherwise organize them.