codes & syntax in the language field

alexander.bocast · April 15, 2016

(a) Given what seems to be the preferred form of entry in the language field, the syntax ll-CC with two-character codes for language and country, what country code(s) should be used with Latin? Does Zotero anticipate some sort of generic placeholder country code that could be used with Latin, e.g., "la-XX"?

(b) Getting perhaps ahead of parsing the language field, is there a preferred separator between language-country elements? For example, for works containing significant passages in several languages, would "l1-C1 + l2-C2 + l3-C3 + ..." suffice?

(c) Again getting ahead of parsing requirements, please do not forget that the order of language elements may well be significant. For example, the order might indicate the direction of translation in a bilingual dictionary: "fr-FR > nl-NL" would be used to mark the direction of translation as being from French terms to Dutch equivalents, while "nl-NL > fr-FR" would mark Dutch terms translated to French.

adamsmith · April 15, 2016

a) If there is no appropriate country code, just use the language code. Don't use a placeholder. "la" is valid and would be understood by any relevant parser.
b) Without having thought much about this: comma and space is generally the most standard delimiter, and that's what we use elsewhere (e.g. for ISBNs).
c) I understand this is relevant, but honestly I doubt we'll be at a point where we want to have a metadata schema that's able to accommodate that level of precision in Zotero anytime in the next 5-10 years. If someone else handles this already in their metadata schema, we can look at picking that up, but we're certainly not going to come up with anything ourselves.
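
For illustration only, here is a minimal Python sketch of how a field following (a) and (b) might be split into tags. It is not Zotero code: the function name `parse_language_field` and the pattern `TAG_RE` are assumptions made up for this example, and the pattern only checks the shape of each tag, not whether the codes are actually registered.

```python
import re

# Assumed shape: a 2-3 letter language code, optionally followed by
# a hyphen and a 2-letter country/region code (e.g. "la" or "fr-FR").
TAG_RE = re.compile(r"^([A-Za-z]{2,3})(?:-([A-Za-z]{2}))?$")


def parse_language_field(value):
    """Split a comma-delimited language field into (language, region) pairs.

    The region is None when the tag is a bare language code such as "la".
    """
    tags = []
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate trailing commas and empty fields
        match = TAG_RE.match(part)
        if match is None:
            raise ValueError(f"unrecognized language tag: {part!r}")
        language, region = match.groups()
        tags.append((language.lower(), region.upper() if region else None))
    return tags


print(parse_language_field("la, fr-FR, nl-NL"))
```

A bare "la" passes through with no region, so no placeholder country code is needed. The order of the tags is preserved, but nothing here assigns that order a meaning.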

Rintze · April 15, 2016

I always use http://r12a.github.io/apps/subtags/ to figure out the right locale codes, by the way.

alexander.bocast · April 15, 2016

adamsmith -- good answers. Thank you for the guidance.
Rintze -- good resource. Thank you as well.

Wikipedia also provides the articles "List of ISO 639-1 codes" for language codes and "ISO 3166-1 alpha-2" for country/region codes.

fbennett · April 15, 2016

Some of this is supported by Juris-M, which is bundled with the language database behind the service linked by Rintze (the IANA Language Subtag Registry). When rendering citations, language codes are converted to language names where possible, and there is a syntax for recording the original language of a translation (added last year to support theses in our faculty):
en-US<fr-FR
or:
en<fr
(Region extensions are optional, and ignored in the name conversion.)

The CSL-M code for rendering these would be something like:

<choose>
  <if variable="translator language-name language-name-original">
    <group delimiter=" " prefix="[" suffix="]">
      <text variable="language-name" form="short"/>
      <text value="translation by"/>
      <names variable="translator">
        <name/>
      </names>
      <text value="from"/>
      <text variable="language-name-original" form="short"/>
      <text value="original"/>
    </group>
  </if>
</choose>

This would come out as:

[English translation by John Doe from French original]
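
As a rough sketch of the conversion described above (not the actual Juris-M implementation), the `target<original` syntax could be handled along these lines in Python. The `LANGUAGE_NAMES` table is a tiny stand-in for the IANA registry, and all function names are hypothetical.

```python
# Tiny illustrative stand-in for the IANA Language Subtag Registry.
LANGUAGE_NAMES = {"en": "English", "fr": "French", "nl": "Dutch", "la": "Latin"}


def split_translation(value):
    """Split "en-US<fr-FR" into (target, original); original is None if absent."""
    target, sep, original = value.partition("<")
    original = original.strip() if sep else ""
    return target.strip(), (original or None)


def language_name(tag):
    """Map a tag to a language name, ignoring any region extension."""
    primary = tag.split("-")[0].lower()
    # Fall back to the raw tag when the code is not in the table.
    return LANGUAGE_NAMES.get(primary, tag)


def describe_translation(value, translator):
    """Build the bracketed note, or return None when no original is recorded."""
    target, original = split_translation(value)
    if original is None:
        return None
    return (
        f"[{language_name(target)} translation by {translator} "
        f"from {language_name(original)} original]"
    )


print(describe_translation("en-US<fr-FR", "John Doe"))
```

Because the region is dropped before the lookup, "en-US<fr-FR" and "en<fr" produce the same note.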

For works containing texts in multiple languages, the Language field (in our context, at least) indicates the primary language of the target audience (i.e. the language of the preface, etc.). The language of the individual texts would be set on separate Book Section entries.