Language field standardization option

Arithmeticus · March 25, 2011

In the language field, users rightfully can add whatever they like. But some users may wish to follow one standard or another, say one of the ISO 639 standards. It would be really nice in a future version of Zotero to have a button appear to the right of the field upon entering it. Clicking it would provide a palette of language codes, in order of decreasing frequency of use in a user's Zotero library. Clicking the right entry would automatically insert the correct code, based on preference settings.

This would be a very useful feature for individuals or cooperative projects where one may easily forget or not know the correct term to use for a given language.

Perhaps this field needs to be treated like the author/creator field, allowing multiple entries and control over their order.

adamsmith · March 25, 2011

http://gsl-nagoya-u.net/http/pub/zotero-multilingual-overview.html

Arithmeticus · March 25, 2011

That's not an answer.

In Multilingual Zotero (MZ), as I understand it, language tags provide metadata for the metadata; they don't describe what languages are used in the work itself. For that we are supposed to use the language field, correct? Well, MZ's language field is identical to vanilla Z's. In fact, one cannot even apply a language tag to the language field in MZ! My point above still holds.

ajlyon · March 25, 2011

You're right-- the language field currently does not use any single tagging system. This is really part of the larger question that faces Zotero users who are trying to do collaborative curation of metadata-- how do we enforce or encourage controlled vocabularies, etc.?

We could have made multilingual Zotero provide such a tag selection system for the language field, but I argued that we'd lose the flexibility to specify multiple languages, translation source and destination languages, and more. So the field is, for now, a simple text field like the others.

Can I ask what precisely you're trying to express using the field? What languages? Do you have translations? How are you planning to use the data?

The MARC format has some provision for expressing source and target languages for translated works; I'd like to find a way to do this in Zotero too, but it's not straightforward.

I think that the best way forward will involve a Zotero plugin that helps users use controlled vocabularies (authorities, places, languages).

fbennett · March 25, 2011

As ajlyon says, we've opted to keep the language field as a vanilla string in the current MZ implementation. It's a compromise that still leaves a path open for better UI support along the lines you describe.

In fact, the field is not quite as dumb as it looks, even at present. When alternative layouts are supplied to the processor, the leading portion of the field content is parsed out into an RFC 5646 language tag (which builds on the ISO 639 codes), and used to arbitrate the style applied to individual references. The effect is illustrated in an MZ screencast.
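
To make "the leading portion of the field content is parsed out" a bit more concrete, here is a rough Python sketch. It is not MZ's or the processor's actual code. The function name `leading_language_tag` is made up for illustration, and the pattern only covers a simplified subset of RFC 5646 (language, optional script, optional region), which is an assumption about what typically appears in the field.

```python
import re

# Simplified subset of RFC 5646: language[-Script][-REGION].
# Variants, extensions, and private-use subtags are deliberately ignored.
_LEADING_TAG = re.compile(
    r"^\s*([A-Za-z]{2,3})"            # primary language subtag
    r"(?:-([A-Za-z]{4}))?"            # optional script subtag
    r"(?:-([A-Za-z]{2}|[0-9]{3}))?"   # optional region subtag
    r"(?![A-Za-z0-9])"                # the tag must end here
)


def leading_language_tag(field):
    """Return a normalized tag taken from the start of `field`, or None."""
    match = _LEADING_TAG.match(field)
    if match is None:
        return None
    language, script, region = match.groups()
    parts = [language.lower()]
    if script:
        parts.append(script.title())
    if region:
        parts.append(region.upper())
    return "-".join(parts)


print(leading_language_tag("zh-hant-tw; with Japanese notes"))
print(leading_language_tag("English"))
```

A field that starts with a free-text name such as "English" yields no tag here, so anything relying on the tag would have to fall back to some default.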

Arithmeticus · March 28, 2011

To ajlyon, I agree that the vanilla text field is fine, great even. You don't want to dictate how a field should be used. But if someone wishes to observe a controlled vocabulary in a collection, a tool to do so would be nice.

As I understand it, the language field, like the fields directly above and below it, indicates what languages are used in the publication itself. My largest Zotero project involves an ancient writer whose literary corpus is scattered across many linguistic traditions—important because the corpus has nearly perished in the original language. It is common for books in my bibliography to have two, three, sometimes four languages at work. Bilingual text-translations are quite common. I would like to use a controlled vocabulary to state what languages are at play in the works cited and with what frequency, so that in post-Zotero processing I can allow users to sort and filter the records according to language. So if someone wants to see what texts use Ethiopic, they find what they need, and there aren't records omitted because someone entered "Ge'ez" or "Ethiopian" instead of "Ethiopic." Or if someone wishes to exclude works where Syriac is only a minor factor (e.g., third or fourth language down), they can do so.
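
The post-Zotero filtering described above could look roughly like this in Python. Everything here is an assumption made for illustration: the synonym table, the function names, the sample records, and above all the convention that the language field holds a semicolon-separated list ordered from most to least prominent language. Zotero itself does not impose any such format.

```python
# Hypothetical synonym table: variant spelling (lowercase) -> preferred term.
PREFERRED = {
    "ge'ez": "Ethiopic",
    "ethiopian": "Ethiopic",
    "ethiopic": "Ethiopic",
    "syriac": "Syriac",
}


def parse_languages(field):
    """Split a language field into preferred terms, keeping the entered order."""
    terms = []
    for raw in field.split(";"):
        name = raw.strip()
        if name:
            # Unknown names pass through unchanged rather than being dropped.
            terms.append(PREFERRED.get(name.lower(), name))
    return terms


def uses_language(field, language, max_rank=None):
    """True if `language` is listed, optionally only among the first `max_rank` entries."""
    terms = parse_languages(field)
    if max_rank is not None:
        terms = terms[:max_rank]
    return language in terms


# Made-up records: title -> language field as exported from Zotero.
records = {
    "Edition A": "Greek; Ge'ez",
    "Study B": "Latin; Greek; Ethiopian",
    "Translation C": "English; German; French; Syriac",
}

ethiopic = [title for title, field in records.items() if uses_language(field, "Ethiopic")]
major_syriac = [title for title, field in records.items() if uses_language(field, "Syriac", max_rank=2)]
print(ethiopic)       # ['Edition A', 'Study B']
print(major_syriac)   # []
```

With a shared table like `PREFERRED`, a record entered as "Ge'ez" or "Ethiopian" still turns up in a search for "Ethiopic", and `max_rank` lets a reader leave out works where a language is only third or fourth in the list.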

adamsmith · March 28, 2011

Arithmeticus - at some point, though, I have to wonder about mission creep. I see a very real risk of complicating every field in Zotero to the point where the program becomes less user-friendly and more bug-prone, the data formats become less compatible with other standards, etc.
I see the appeal of having super detailed data models, but I think there's a trade-off involved, and I think multiple, hierarchical, standardized languages might just not be worth it.

ajlyon · March 28, 2011

I'm interested in development in this direction, but I firmly believe it can be done by making a Zotero plugin (as Arithmeticus would acknowledge, I think). A plugin with a sound and extensible architecture for controlled vocabularies and all that jazz would make Zotero a juggernaut for metadata curation.

Arithmeticus · March 29, 2011

Agreed—a plugin would be perfect.

ajlyon · March 29, 2011

And such a plugin would make a great grant-funded project for any digital humanities / library science programs out there... I'm sure they'd get plenty of encouragement and support from the folks at CHNM and the broader Zotero community.

Michau · April 3, 2015

Refreshing this topic: is such a plugin available now?
And about the integrity and uniformity of data in Zotero (a much wider topic): is it possible to find all works in a given language and replace the text in this field (and in other similar fields)?
For example, after using Zotero with various site translators, I have citations whose language field is English, eng, or en, and I want them all to be uniformly en. How can I do this?

adamsmith · April 3, 2015

No plugin, and no work on one that I'm aware of.

As for your second question: we'd likely try to do some version of this automatically at the point where Zotero actually does anything with the language field beyond turning title case on and off, but if you want to standardize the metadata before that, the best way would be
https://www.zotero.org/support/dev/client_coding/javascript_api#batch_editing
My recommendation would be to convert to a two-letter ISO language code followed by a two-letter country code (e.g. en-US, de-DE). That's almost certainly what Zotero will eventually move to, at least in the database layer.
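
The mapping step itself is simple enough to sketch outside Zotero. The Python below is only an illustration of the idea: the table `CANONICAL` and both function names are hypothetical, and the choice of region (for instance en-US rather than en-GB) is something each library owner has to decide. It does not touch a Zotero database; writing the values back would still go through the batch-editing approach linked above.

```python
# Hypothetical mapping from variants (lowercase) to the target code.
# Extend it to cover whatever variants the library actually contains.
CANONICAL = {
    "english": "en-US",
    "eng": "en-US",
    "en": "en-US",
    "german": "de-DE",
    "deu": "de-DE",
    "ger": "de-DE",
    "de": "de-DE",
}


def normalize_language(value):
    """Return the target code for a known variant, else the value unchanged."""
    key = value.strip().lower()
    return CANONICAL.get(key, value)


def planned_changes(values):
    """Map each distinct value that would change to its replacement."""
    changes = {}
    for value in set(values):
        new_value = normalize_language(value)
        if new_value != value:
            changes[value] = new_value
    return changes


# Language field values as they might come out of an export (made up).
exported = ["English", "eng", "en", "en-US", "fr", "German"]
print(sorted(planned_changes(exported).items()))
```

Reviewing the output of `planned_changes` before changing anything gives a chance to catch values the table handles badly, and unknown values such as "fr" are left alone rather than guessed at.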