Title case for non-English titles: call for info

fbennett · June 12, 2011

A recent thread concerning MLA raises an issue affecting every style that uses text-case="title" (currently 38 of them in the repository).

The problem is that some languages have no concept of "title case"; so when the title is written in such a language, the conversion should not take place. Ever. The title should be rendered exactly as it appears in the Zotero database.

For "polyglot" styles that are likely to be used to format references from several languages in a single document, this is a problem.

Before thinking about fixes or enhancements, I have a question:

Does any language other than English have special capitalization rules that apply specifically to titles?

It's a short questionnaire and a big community: I'll look forward to learning the answer. :)

adamsmith · June 12, 2011

No for French and Italian from here
http://www.mail-archive.com/rda-l@listserv.lac-bac.gc.ca/msg01473.html
No for German and Spanish from personal knowledge. The respective Wikipedia entries in Spanish and English both point to this as a specifically English (esp. AE) thing.
http://en.wikipedia.org/wiki/Letter_case#Headings_and_publication_titles
http://es.wikipedia.org/wiki/May%C3%BAscula#En_otros_idiomas

In Portuguese, title case exists in theory, but no one (and no relevant style guide) uses it:
http://pt.wikipedia.org/wiki/Caixa_alta#T.C3.ADtulos_em_artigos_e_em_cita.C3.A7.C3.A3o_bibliogr.C3.A1fica

fbennett · June 18, 2011

@adamsmith,

Looks like you're the voice of the community!

There's really nothing to be done for this without access to the language field of the target item. Assuming that it will eventually be passed through, I'll set up the processor to disable title-case conversion for any item that (a) has a valid language tag in the language field, which (b) is not from the "en" domain.
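As a rough illustration of that rule, here is a minimal sketch in Python (not the processor's own JavaScript). All names are hypothetical, the title-casing routine is a deliberately naive stand-in, and "valid language tag" is assumed to mean two ASCII letters followed by the end of the string or a "-"/"_" subtag separator.

```python
import re

# Assumption: a usable tag starts with two ASCII letters, followed by the
# end of the string or a subtag separator ("-" or "_").
_LANG_PREFIX = re.compile(r"^([A-Za-z]{2})(?:[-_]|$)")

# Illustrative stop-word list only; real title-casing rules are more involved.
_SMALL_WORDS = {"a", "an", "and", "as", "at", "but", "by", "for",
                "in", "nor", "of", "on", "or", "the", "to"}


def suppress_title_case(language):
    """Return True when the title-case transform should be skipped."""
    if not language:
        return False  # no language information: behave as before
    match = _LANG_PREFIX.match(language.strip())
    if match is None:
        return False  # not a recognisable tag: behave as before
    return match.group(1).lower() != "en"


def naive_title_case(title):
    """Simplistic stand-in for a real title-case routine."""
    words = title.split()
    last = len(words) - 1
    result = []
    for index, word in enumerate(words):
        if 0 < index < last and word.lower() in _SMALL_WORDS:
            result.append(word.lower())
        else:
            result.append(word[:1].upper() + word[1:])
    return " ".join(result)


def render_title(item):
    """Render the title, leaving non-English titles exactly as stored."""
    title = item["title"]
    if suppress_title_case(item.get("language")):
        return title
    return naive_title_case(title)
```

With this sketch, an item tagged "de" or "de-DE" keeps its title untouched, while items tagged "en", "en-US", or carrying no usable tag still go through the transform.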

For people working with material in mixed languages, it will be possible to pass a language tag through to the CSL processor using the "cheater" syntax: {:language: de} [Edit: in the Extra field]. Let's hope that the mapping can be added in the near future.

(Edit: Simon has added a mapping for the Language field on the Zotero trunk. When that comes through in a stable release, there will be no need to use the "cheater syntax" described above. That syntax is intended only for testing purposes, could be withdrawn at any time, and should not be used in production data.)

Gracile · June 18, 2011

"There's really nothing to be done for this without access to the language field of the target item."

That would be very very useful: we would be able to use different locale terms ("ed.", "éd.") in one style. Some styles require this.

adamsmith · June 18, 2011

fbennett - wow - just saw your fix - that's very clever.
@Gracile - well, because you can't test for the content of a variable, that will actually still not be possible.
Here's what Frank did:
"In the next revision of citeproc-js (1.0.183), if the item variable "language" begins with a two-character language code that is not "en", the text-case="title" transform will be suppressed, to prevent undesired transforms of non-English content. Mapping this field through will address the issue discussed here: ..."

Gracile · June 19, 2011

@adamsmith: Yes, I know this. I just wanted to point out this need... There have been some posts about that recently on the forum. Maybe this has to be tackled directly in the future.
What Frank has implemented in citeproc-js 1.0.183 is no more than a (great) workaround.

fbennett · June 19, 2011

@Gracile,

The processor will recognize (via an experimental extension that is not valid CSL) entirely separate cs:layout nodes for separate languages. Avram (ajlyon) has suggested that a controlled mechanism within CSL could (well, should) allow term substitutions within cs:layout (which would mean, with the multiple-layouts extension, within any given layout). I was kind of reluctant and non-committal IIRC, but your note drives home that there are real cases that would benefit, and it's not hard to implement (it would only require selective use of portions of code already in there to support multiple cs:layout nodes).
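To make the term-substitution idea concrete, here is a minimal Python sketch of a per-language term lookup with a fallback to the style's default locale. It does not reflect the experimental extension's actual syntax or code; the table, the "ed" key, and the function name are all hypothetical, and only the two abbreviations come from the example above.

```python
import re

# Hypothetical term tables, keyed by primary language subtag.
TERMS = {
    "en": {"ed": "ed."},
    "fr": {"ed": "éd."},
}
DEFAULT_LOCALE = "en"  # assumption: the style's default locale


def term_for(item, name, default_locale=DEFAULT_LOCALE):
    """Pick a term for the item's language, else the default locale's term."""
    language = (item.get("language") or "").strip().lower()
    primary = re.split(r"[-_]", language)[0]  # "fr-FR" -> "fr"
    table = TERMS.get(primary, TERMS[default_locale])
    return table.get(name, TERMS[default_locale][name])
```

Here `term_for({"language": "fr-FR"}, "ed")` picks the French form, while an item with no language, or with a language the table does not cover, falls back to the default locale.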

We should certainly discuss this.

Gracile · June 19, 2011

Indeed, I had Avram's recent post in mind.

bdarcus · June 19, 2011

Can I suggest in these kinds of conversations we be a little more deliberative about moving from use case questions (Frank's original question) to implementation ideas? E.g., as process, the common:

1. define and agree on use cases (in this case for formatted output)
2. establish requirements (in this case, for the data and for the styling; see e.g. this on requirements analysis)
3. offer one or more possible solutions

Too often we skip step 2 and go straight to one particular solution. And it inevitably takes me a while to back up and reverse-engineer the requirements to figure out if there are other possible solutions.

Will say, that as a general proposition, I really dislike solutions that require polluting the data fields with additional non-standard data. I think that's what Frank here is proposing, but am not sure.

But if we find that Zotero isn't supporting particular use cases, then I think it a good idea to clearly demonstrate the missing requirement(s). In this case, it might be so simple as "text variables need an optional language attribute". And then Zotero should be upgraded to support this requirement properly (such that it can be reliably exported and imported with at least some formats, for example).

fbennett · June 19, 2011

That's fair; and the recent back-and-forth over superscripting in locale terms is a good example of the kind of confusion you describe. In this case, though, the solution is fairly obvious, so I went ahead with an opportunistic solution.

Optional language attributes have been implemented already in the multilingual branch, with support for valid export and import in RDF (other formats lose out, of course, but only because they assume a monolingual universe).

Here, we're just exploiting the existing "Language" field in Zotero to set a default language value for the item. The solution doesn't involve any change to other data fields.

The "cheater syntax" (i.e. "{:language: de}") is a hack that I've embedded in the processor solely for testing purposes. It has value insofar as it allows us to see the effect of a new field assignment in a running style. I do agree that it's an ugly thing, and mention of it should always be accompanied by a caveat to the effect that it is provided only for testing purposes, is not guaranteed to be always available, and should not be used in production data. I should have provided those warnings above; but Simon has kindly supplied a mapping, so there will be no need for anyone to rely on it for this case.
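For anyone curious what such a token amounts to, here is a minimal Python sketch of pulling "{:name: value}" pairs out of an Extra-style string. It is not the processor's code: the regular expression and function name are assumptions based only on the "{:language: de}" example, and the same testing-only caveat applies.

```python
import re

# Assumption: tokens have the shape "{:name: value}".
_TOKEN = re.compile(r"\{:([A-Za-z][A-Za-z-]*):\s*([^}]*?)\s*\}")


def extract_extra_fields(extra):
    """Return (fields, remaining_text) for a hypothetical Extra-field parser."""
    fields = {}

    def _collect(match):
        # Record the pair and drop the token from the visible text.
        fields[match.group(1)] = match.group(2)
        return ""

    remaining = _TOKEN.sub(_collect, extra or "")
    return fields, remaining.strip()
```

For example, `extract_extra_fields("{:language: de} Reprint of the 1915 edition")` yields the language value separately and leaves the ordinary note text behind.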

dstillman · June 20, 2011

"Will say, that as a general proposition, I really dislike solutions that require polluting the data fields with additional non-standard data."

Second this. I understand that these are generally implemented as testing measures, but their very presence in production versions of Zotero—which we can't really avoid, unless we comb through citeproc-js code to disable them—is problematic. I'd even suggest that all of these "experimental" features be controlled by a hard-coded constant in citeproc-js that either defaulted to off or that we could easily turn off in production builds of Zotero.
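A minimal sketch of the kind of switch I mean, written in Python for brevity rather than as actual citeproc-js code. The constant and function names are hypothetical, and the pattern assumes the "{:language: de}" form mentioned earlier in the thread.

```python
import re

# Hypothetical switch; not an existing citeproc-js setting.
# Defaults to off so production builds ignore experimental syntax.
ENABLE_EXPERIMENTAL_EXTENSIONS = False

_LANGUAGE_HACK = re.compile(r"\{:language:\s*([^}\s]+)\s*\}")


def effective_language(item, experimental=ENABLE_EXPERIMENTAL_EXTENSIONS):
    """Use the real language field; consult the Extra hack only if enabled."""
    language = item.get("language")
    if language:
        return language
    if experimental:
        match = _LANGUAGE_HACK.search(item.get("extra", ""))
        if match:
            return match.group(1)
    return None
```

With the default in place, the token in Extra is simply ignored; a test build would pass `experimental=True` (or flip the constant) to see the effect.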

ajlyon · June 20, 2011

I think a single constant would be a bad thing-- the author name hack is less problematic. I'm a little concerned by this trick for entering additional fields as well, but I'd hate to lose the system for entering dropping and non-dropping particles in names.

fbennett · June 20, 2011

Turning off the hack is no problem. The same goes for the rest of 'these "experimental" features', but I'd have to be told what they are.

dstillman · June 20, 2011

OK, I was thinking this one was a separate hack, but I see now it's the same one as above.

Rintze · June 20, 2011

"The same goes for the rest of 'these "experimental" features', but I'd have to be told what they are."

I wouldn't mind organizing the CSL test suite into "proper" and "experimental" CSL tests, but I need an infrastructure to tag or otherwise organize them.