Title case for non-English titles: call for info

A recent thread concerning MLA raises an issue affecting every style that uses text-case="title" (currently 38 of them in the repository).

The problem is that some languages have no concept of "title case"; so when the title is written in such a language, the conversion should not take place. Ever. The title should be rendered exactly as it appears in the Zotero database.

For "polyglot" styles that are likely to be used to format references from several languages in a single document, this is a problem.

Before thinking about fixes or enhancements, I have a question:
Does any language other than English have special capitalization rules that apply specifically to titles?
It's a short questionnaire and a big community: I'll look forward to learning the answer. :)
  • No for French and Italian from here
    http://www.mail-archive.com/rda-l@listserv.lac-bac.gc.ca/msg01473.html
    No for German and Spanish from personal knowledge. The respective Wikipedia entries in Spanish and English both describe this as a specifically English (esp. American English) thing.
    http://en.wikipedia.org/wiki/Letter_case#Headings_and_publication_titles
    http://es.wikipedia.org/wiki/May%C3%BAscula#En_otros_idiomas

    In Portuguese, title case exists in theory, but no one (and no relevant style guide) uses it.
    http://pt.wikipedia.org/wiki/Caixa_alta#T.C3.ADtulos_em_artigos_e_em_cita.C3.A7.C3.A3o_bibliogr.C3.A1fica
  • edited June 19, 2011
    @adamsmith,

    Looks like you're the voice of the community!

    There's really nothing to be done for this without access to the language field of the target item. Assuming that it will eventually be passed through, I'll set up the processor to disable title-case conversion for any item that (a) has a valid language tag in the language field, which (b) is not from the "en" domain.
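
    A minimal sketch of the rule in (a) and (b), in Python rather than the processor's own JavaScript. It is not the citeproc-js code: the names suppress_title_case, naive_title_case and render_title are hypothetical, the "valid language tag" test is a simplified pattern assumed for illustration, and the title-casing routine is a crude stand-in.

```python
import re

# Assumption: a "valid" tag is a 2-3 letter primary subtag, optionally followed
# by hyphen-separated subtags. This is a simplification, not a full validator.
_LANG_TAG = re.compile(r"^([A-Za-z]{2,3})(?:-[A-Za-z0-9]{1,8})*$")

# Stand-in list of words left lowercase by the toy title-caser below.
_SMALL_WORDS = {"a", "an", "the", "of", "and", "in", "on", "for", "to"}


def suppress_title_case(language):
    """True when the language field holds a valid tag outside the "en" domain."""
    if not language:
        return False  # empty field: keep the style's default behaviour
    match = _LANG_TAG.match(language.strip())
    if match is None:
        return False  # not a recognizable tag (e.g. "French"): keep default
    return match.group(1).lower() != "en"


def naive_title_case(title):
    """Crude stand-in for a real title-case transform."""
    words = []
    for index, word in enumerate(title.split()):
        if index > 0 and word.lower() in _SMALL_WORDS:
            words.append(word.lower())
        else:
            words.append(word[:1].upper() + word[1:])
    return " ".join(words)


def render_title(title, language=None):
    """Apply title case unless the item's language says otherwise."""
    if suppress_title_case(language):
        return title  # rendered exactly as stored
    return naive_title_case(title)
```

    With this sketch, render_title("die kunst des krieges", "de") leaves the title untouched, while an item tagged "en-US", or with no language at all, is still title-cased.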

    For people working with material in mixed languages, it will be possible to pass a language tag through to the CSL processor using the "cheater" syntax: {:language: de} [Edit: in the Extra field]. Let's hope that the mapping can be added in the near future.

    (Edit: Simon has added a mapping for the Language field on the Zotero trunk. When that comes through in a stable release, there will be no need to use the "cheater syntax" described above. That syntax is intended only for testing purposes, could be withdrawn at any time, and should not be used in production data.)
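
    For anyone testing, here is a rough Python sketch of how a {:name: value} entry could be pulled out of the Extra field. The exact grammar citeproc-js accepts is not spelled out in this thread, so the pattern below is an assumption, and extract_cheater_fields is a hypothetical helper, not a processor function.

```python
import re

# Assumed shape of the testing-only syntax: {:name: value}
_CHEATER = re.compile(r"\{:([A-Za-z-]+):\s*([^}]*?)\s*\}")


def extract_cheater_fields(extra):
    """Split an Extra field into ({name: value}, remaining free text)."""
    fields = {m.group(1): m.group(2) for m in _CHEATER.finditer(extra)}
    remainder = _CHEATER.sub("", extra).strip()
    return fields, remainder


fields, note = extract_cheater_fields("Some note {:language: de}")
# fields == {"language": "de"}, note == "Some note"
```

    The same caveat applies: this is for experimenting with a running style only, not for production data.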
  • "There's really nothing to be done for this without access to the language field of the target item."
    That would be very very useful: we would be able to use different locale terms ("ed.", "éd.") in one style. Some styles require this.
  • fbennett - wow - just saw your fix - that's very clever.
    @Gracile - well, because you can't test for the content of a variable, that will actually still not be possible.
    Here's what Frank did:
    "In the next revision of citeproc-js (1.0.183), if the item variable "language" begins with a two-character language code that is not "en", the text-case="title" transform will be suppressed, to prevent undesired transforms of non-English content. Mapping this field through will address the issue discussed here: ..."
  • @adamsmith: Yes, I know this. I just wanted to point out this need... There have been some posts about that recently on the forum. Maybe this has to be tackled directly in the future.
    What Frank has implemented in citeproc-js 1.0.183 is no more than a (great) workaround.
  • @Gracile,

    The processor will recognize (via an experimental extension that is not valid CSL) entirely separate cs:layout nodes for separate languages. Avram (ajlyon) has suggested that a controlled mechanism within CSL could (well, should) allow term substitutions within cs:layout (which would mean, with the multiple-layouts extension, within any given layout). I was kind of reluctant and non-committal IIRC, but your note drives home that there are real cases that would benefit, and it's not hard to implement (it would only require selective use of portions of code already in there to support multiple cs:layout nodes).

    We should certainly discuss this.
  • Indeed, I had Avram's recent post in mind.
  • edited June 19, 2011
    Can I suggest that in these kinds of conversations we be a little more deliberative about moving from use case questions (Frank's original question) to implementation ideas? E.g., as a process, the common sequence:

    1. define and agree on use cases (in this case for formatted output)
    2. establish requirements (in this case, for the data and for the styling; see e.g. this on requirements analysis)
    3. offer one or more possible solutions

    Too often we skip step 2 and go straight to one particular solution. And it inevitably takes me a while to back up and reverse-engineer the requirements to figure out if there are other possible solutions.

    Will say, that as a general proposition, I really dislike solutions that require polluting the data fields with additional non-standard data. I think that's what Frank here is proposing, but am not sure.

    But if we find that Zotero isn't supporting particular use cases, then I think it's a good idea to clearly demonstrate the missing requirement(s). In this case, it might be as simple as "text variables need an optional language attribute". And then Zotero should be upgraded to support this requirement properly (such that it can be reliably exported and imported with at least some formats, for example).
  • That's fair; and the recent back-and-forth over superscripting in locale terms is a good example of the kind of confusion you describe. In this case, though, the solution is fairly obvious, so I went ahead with an opportunistic solution.

    Optional language attributes have been implemented already in the multilingual branch, with support for valid export and import in RDF (other formats lose out, of course, but only because they assume a monolingual universe).

    Here, we're just exploiting the existing "Language" field in Zotero to set a default language value for the item. The solution doesn't involve any change to other data fields.

    The "cheater syntax" (i.e. "{:language: de}") is a hack that I've embedded in the processor solely for testing purposes. It has value insofar as it allows us to see the effect of a new field assignment in a running style. I do agree that it's an ugly thing, and mention of it should always be accompanied by a caveat to the effect that it is provided only for testing purposes, is not guaranteed to be always available, and should not be used in production data. I should have provided those warnings above; but Simon has kindly supplied a mapping, so there will be no need for anyone to rely on it for this case.
  • "Will say, that as a general proposition, I really dislike solutions that require polluting the data fields with additional non-standard data."
    Second this. I understand that these are generally implemented as testing measures, but their very presence in production versions of Zotero—which we can't really avoid, unless we comb through citeproc-js code to disable them—is problematic. I'd even suggest that all of these "experimental" features be controlled by a hard-coded constant in citeproc-js that either defaulted to off or that we could easily turn off in production builds of Zotero.
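
    A rough Python sketch of what such a switch could look like. The constant EXPERIMENTAL_EXTRA_FIELDS and the function item_language are hypothetical names, not existing citeproc-js identifiers, and the {:language: ...} pattern is assumed from the example given earlier in the thread.

```python
import re

# Hypothetical hard-coded switch; off by default, so production builds ignore
# the testing-only syntax unless someone deliberately flips it.
EXPERIMENTAL_EXTRA_FIELDS = False

# Assumed shape of the testing-only entry in the Extra field.
_CHEATER_LANGUAGE = re.compile(r"\{:language:\s*([^}\s]+)\s*\}")


def item_language(item, experimental=EXPERIMENTAL_EXTRA_FIELDS):
    """Return the item's language, consulting Extra only when the switch is on."""
    language = (item.get("language") or "").strip()
    if language:
        return language  # the properly mapped Language field always wins
    if experimental:
        match = _CHEATER_LANGUAGE.search(item.get("extra") or "")
        if match:
            return match.group(1)
    return None
```

    With the switch left off, an item carrying only {:language: de} in Extra reports no language at all, so nothing in production data can come to depend on the hack.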
  • I think a single constant would be a bad thing; the author name hack is less problematic. I'm a little concerned by this trick for entering additional fields as well, but I'd hate to lose the system for entering dropping and non-dropping particles in names.
  • Turning off the hack is no problem. The same goes for the rest of 'these "experimental" features', but I'd have to be told what they are.
  • OK, I was thinking this one was a separate hack, but I see now it's the same one as above.
  • "The same goes for the rest of 'these "experimental" features', but I'd have to be told what they are."

    I wouldn't mind organizing the CSL test suite into "proper" and "experimental" CSL tests, but I need an infrastructure to tag or otherwise organize them.