Invalid language codes in the language field

So, my understanding is that the language field is not validated, and that translators don't have to return valid ISO language codes.

Recently we've encountered translators that return values that LOOK like valid language codes but are not (for example, http://www.pbs.org/newshour/making-sense/care-peoples-kids/ has the language field value en_US).
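The problem with en_US is the underscore: that is the POSIX/gettext locale convention, whereas the hyphenated form (en-US) is what IETF language tags (BCP 47) and the Mozilla-style locales mentioned in the replies use. A minimal sketch of a shape check, assuming the "two-letter language code, optionally a hyphen plus a two-letter country code" format discussed in this thread; `looks_like_ll_cc` is a hypothetical helper, and it only checks the form, not whether the codes are actually assigned in ISO 639-1 / ISO 3166-1 (so "xx-ZZ" would pass).

```python
import re

# Hypothetical shape rule: two lowercase letters, optionally followed by a
# hyphen and two uppercase letters (e.g. "en" or "en-US").
_TAG_RE = re.compile(r"[a-z]{2}(-[A-Z]{2})?")


def looks_like_ll_cc(value: str) -> bool:
    """Return True if value has the ll / ll-CC shape (no registry lookup)."""
    return _TAG_RE.fullmatch(value) is not None


print(looks_like_ll_cc("en-US"))  # True
print(looks_like_ll_cc("en_US"))  # False: underscore instead of hyphen
```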

Should these just be reported/fixed as bugs, or is the general policy that, since the field isn't validated, valid language codes don't have to be used?
  • Correct, there's currently no validation or requirement for the language field. Where possible, we try to supply valid ISO codes, but more often than not the data is simply scraped from the page in whatever format it is presented. In the future, the field will likely be parsed into a correct ISO code, but that will probably not be guaranteed (i.e. API consumers can expect a valid code, but shouldn't break if the value is not a valid code).
  • Okay, thanks! Do you have any idea what standard you might use?
  • Yeah, we're likely going to use the ISO two-letter language code followed by the two-letter country code, the way Mozilla abbreviates locales, e.g. en-US.
    (That's what CSL already uses for locales, and citeproc-js does understand it in the language field.) It's possible Dan will prefer a separation between display and database (the way it's done e.g. with Date Added, which is stored as ISO but displayed as text), but given the complexity of language that seems tricky.
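As a rough illustration of "parse into a correct code, but don't break if it isn't one", here is a minimal sketch under stated assumptions: scraped values with either separator and any casing are normalized to the ll-CC form, and anything that doesn't fit is passed through unchanged. `normalize_language` and `display_language` are hypothetical names, not part of any Zotero API, and the sketch does not check the codes against the ISO 639-1 or ISO 3166-1 lists.

```python
import re
from typing import Optional

# Hypothetical input rule: two letters, optionally followed by "-" or "_"
# and two more letters, in any case (covers "en_US", "EN-us", "de", ...).
_LOCALE_RE = re.compile(r"([A-Za-z]{2})(?:[-_]([A-Za-z]{2}))?")


def normalize_language(raw: str) -> Optional[str]:
    """Return an "ll" or "ll-CC" code, or None if raw can't be parsed."""
    match = _LOCALE_RE.fullmatch(raw.strip())
    if match is None:
        return None
    lang, country = match.groups()
    if country is None:
        return lang.lower()
    return f"{lang.lower()}-{country.upper()}"


def display_language(raw: str) -> str:
    # Tolerant consumer: prefer the normalized code, but fall back to the
    # raw scraped value instead of raising an error.
    normalized = normalize_language(raw)
    return normalized if normalized is not None else raw


print(normalize_language("en_US"))   # en-US
print(display_language("English"))   # English (left as scraped)
```

Falling back to the raw string mirrors the point above that consumers should expect a valid code but not depend on one.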

