Language field standardization option
In the language field, users can rightly enter whatever they like. But some users may wish to follow one standard or another, say the ISO 639-n standard. It would be really nice if a future version of Zotero showed a button to the right of the field when the cursor enters it. Clicking the button would bring up a palette of language codes, ordered by decreasing frequency of use in the user's Zotero library. Clicking an entry would automatically insert the correct code, in the form chosen in the preference settings.
This would be a very useful feature for individuals or cooperative projects where one may easily forget or not know the correct term to use for a given language.
Perhaps this field needs to be treated like the author/creator field, allowing multiple instances and control over their order.
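To make the palette idea concrete, here is a minimal sketch of the ordering step only. It assumes the language values have already been read out of the library into a plain list; the function name `language_palette` is hypothetical and nothing here is part of Zotero.

```python
from collections import Counter

def language_palette(field_values):
    """Return the distinct language values, most frequently used first."""
    # Ignore empty fields and surrounding whitespace.
    counts = Counter(v.strip() for v in field_values if v and v.strip())
    # most_common() sorts by descending count; ties keep first-seen order.
    return [value for value, _ in counts.most_common()]

# Hypothetical language-field values taken from a small library.
library = ["en", "de", "en", "", "gez", "de", "en"]
print(language_palette(library))
```

A real palette would presumably merge variants such as "English" and "en" before counting; this sketch counts the raw strings as entered.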
In Multilingual Zotero (MZ), as I understand it, language tags provide metadata for the metadata; they don't describe what languages are used in the work itself. For that we are supposed to use the language field, correct? Well, MZ's language field is identical to vanilla Z's. In fact, one cannot even apply a language tag to the language field in MZ! My point above still holds.
We could have made multilingual Zotero provide such a tag selection system for the language field, but I argued that we'd lose the flexibility to specify multiple languages, translation source and destination languages, and more. So the field is, for now, a simple text field like the others.
Can I ask what precisely you're trying to express using the field? What languages? Do you have translations? How are you planning to use the data?
The MARC format has some provision for expressing source and target languages for translated works; I'd like to find a way to do this in Zotero too, but it's not straightforward.
I think that the best way forward will involve a Zotero plugin that helps users use controlled vocabularies (authorities, places, languages).
In fact, the field is not quite as dumb as it looks, even at present. When alternative layouts are supplied to the processor, the leading portion of the field content is parsed out into an RFC 5646 language tag (the IETF language-tag format, which builds on the ISO 639 codes), and that tag is used to decide which style is applied to each reference. The effect is illustrated in an MZ screencast.
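For illustration only, pulling a leading language tag off free-form field content might look roughly like the sketch below. This is not the processor's actual code: the pattern covers just the language, optional script, and optional region subtags rather than the full RFC 5646 grammar, and the function name is hypothetical.

```python
import re

# Simplified pattern: language[-Script][-REGION]. Not the full RFC 5646 grammar.
_TAG = re.compile(
    r"^\s*([A-Za-z]{2,3})"          # primary language subtag, e.g. "en"
    r"(?:-([A-Za-z]{4}))?"          # optional script subtag, e.g. "Hant"
    r"(?:-([A-Za-z]{2}|\d{3}))?"    # optional region subtag, e.g. "US" or "419"
    r"(?![A-Za-z0-9])"              # the tag must end here, so "English" is rejected
)

def leading_language_tag(field):
    """Return the tag at the start of the field, or None if there is none."""
    match = _TAG.match(field)
    if not match:
        return None
    lang, script, region = match.groups()
    parts = [lang.lower()]
    if script:
        parts.append(script.title())
    if region:
        parts.append(region.upper())
    return "-".join(parts)

print(leading_language_tag("en-US; translated from German"))
print(leading_language_tag("English"))
```

Anything after the leading tag is ignored, so the rest of the field stays free for notes of the user's choosing.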
As I understand it, the language field, like the fields directly above and below it, indicates what languages are used in the publication itself. My largest Zotero project involves an ancient writer whose literary corpus is scattered across many linguistic traditions—important because the corpus has nearly perished in the original language. It is common for books in my bibliography to have two, three, sometimes four languages at work. Bilingual text-translations are quite common. I would like to use a controlled vocabulary to state what languages are at play in the works cited and with what frequency, so that in post-Zotero processing I can allow users to sort and filter the records according to language. So if someone wants to see what texts use Ethiopic, they find what they need, and there aren't records omitted because someone entered "Ge'ez" or "Ethiopian" instead of "Ethiopic." Or if someone wishes to exclude works where Syriac is only a minor factor (e.g., third or fourth language down), they can do so.
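The kind of post-Zotero filtering described here could be sketched as follows. The assumptions are mine: the language field holds a semicolon-separated list ordered from most to least prominent, exported records are plain dictionaries, and the alias table is a hypothetical controlled vocabulary that maps name variants to ISO 639 codes.

```python
# Hypothetical controlled vocabulary: name variants mapped to one ISO 639 code.
ALIASES = {
    "ethiopic": "gez", "ge'ez": "gez", "ethiopian": "gez", "gez": "gez",
    "syriac": "syr", "syr": "syr",
    "latin": "la", "la": "la",
}

def parse_languages(field):
    """Split a field into an ordered list of codes; unknown names pass through lowercased."""
    names = [part.strip().lower() for part in field.split(";") if part.strip()]
    return [ALIASES.get(name, name) for name in names]

def uses_language(record, code, max_rank=None):
    """True if the record lists `code`, optionally only within the first `max_rank` languages."""
    langs = parse_languages(record.get("language", ""))
    if code not in langs:
        return False
    return max_rank is None or langs.index(code) < max_rank

# Hypothetical exported records.
records = [
    {"title": "A", "language": "Ge'ez; English"},
    {"title": "B", "language": "Latin; English; Syriac"},
    {"title": "C", "language": "Ethiopian"},
]
print([r["title"] for r in records if uses_language(r, "gez")])
print([r["title"] for r in records if uses_language(r, "syr", max_rank=2)])
```

With this convention, the order of entries in the field carries the "with what frequency" information, so no extra field is needed to exclude works where a language is only a minor factor.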
I see the appeal of having super detailed data models, but I think there's a trade-off involved and I think multiple, hierarchical, standardized languages might just not be worth it.
On the integrity and uniformity of data in Zotero (a much wider topic): is it possible to find all works in a given language and replace the text in this field (and in other similar fields)?
For example, after using Zotero with various site translators, I have citations whose language field is English, eng, or en, and I want them all to be a uniform en. How can I do this?
As for your second question: we'd likely try to do some version of this automatically at the point where Zotero actually does anything with the language field beyond turning title case on and off, but if you want to standardize the metadata before that, the best way would be batch editing through the JavaScript API:
https://www.zotero.org/support/dev/client_coding/javascript_api#batch_editing
My recommendation would be to convert to a two-letter ISO language code followed by a two-letter country code (e.g. en-US, de-DE). That's almost certainly what Zotero will eventually move to, at least in the database layer.
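As a rough sketch of the mapping step alone, the normalization might look like this. The lookup table, the function name, and the `default_regions` parameter are all hypothetical, and choosing a country for a bare language name is a guess you would have to make yourself. Writing the values back into the library would still go through the JavaScript API linked above, which is not shown here.

```python
import re

# Hypothetical lookup table; extend it with the variants found in your own library.
CANONICAL = {
    "english": "en", "eng": "en", "en": "en",
    "german": "de", "deu": "de", "ger": "de", "de": "de",
}

def normalize_language(value, default_regions=None):
    """Map variants such as 'English', 'eng', or 'en_us' to 'en' or 'en-US'."""
    parts = re.split(r"[-_]", value.strip())
    lang = CANONICAL.get(parts[0].lower())
    if lang is None:
        return value  # unknown value: leave it untouched for manual review
    if len(parts) > 1 and len(parts[1]) == 2:
        region = parts[1].upper()  # keep a region the user already supplied
    else:
        region = (default_regions or {}).get(lang)  # assumed default, if any
    return f"{lang}-{region}" if region else lang

for raw in ["English", "eng", "en", "en_gb", "Klingon"]:
    print(raw, "->", normalize_language(raw, {"en": "US"}))
```

Leaving unrecognized values unchanged keeps the pass safe to rerun: nothing is overwritten unless the table explicitly covers it.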