[mlz] personal communications genre localization

When trying to clean up the code for personal_communication items in twlaw, I found I couldn't localize the genre names "email" and "instant message". The only genre term available for localization is "letter". Since they are all of the same type, I can't assign the right title to such items without localizing the terms.
  • edited November 3, 2013
    Ah, yes. In MLZ, the Instant Message and Email types come through as personal_communication, but with the "genre" field pre-populated. Those labels were left out of the list of terms because they aren't variable names, but they should be in there since they're inserted programmatically. I'll add both to the CSL-m locales shortly.

    (Note: this post is a response to the one below ...)
  • edited November 3, 2013
    D'oh! Hold that thought ... the usual method of localisation won't work, will it, since the "email" and "instant message" labels are in field content. We can't test for field content. As you say, the only way to discriminate them from personal_communication in the current setup is to cast each as a separate CSL-m item type.

    The downside with adding specific item types for these is that it would further aggravate the discrepancy in output when official Zotero repository styles are used with MLZ. Currently, the discrepancies are mostly limited to legal citations, so this could be an unwelcome surprise for some MLZ users.

    There are some alternatives:
    1. We could set the "jurisdiction" segment with the item language code, when calling the abbreviation list on non-legal item types. That would allow you to set "email" and "instant message" with the appropriate Chinese terms just once, via the Abbreviation Filter, to convert the strings supplied by MLZ to a localised form. That shouldn't have any immediate side effects, but it would increase the number of abbreviation entries in the UI, which might not be a good thing.

    2. We could extend the processor in MLZ to substitute only those particular content strings on the "genre" field with a proper localised term (supplied in the locale files, and possibly modified by the style). This means hard-wiring the test for content inside the processor, which feels like a kludge, but it would have the smallest impact: the behaviour would be transparent to the user.
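
    For what it's worth, option 2 could be sketched roughly as below. This is not MLZ's actual code: the function name, the lookup tables and the zh-TW strings are all assumptions made for illustration.

```python
# Hypothetical sketch of option 2; none of these names come from MLZ itself.

# Strings that MLZ injects into "genre" (per this thread), mapped to
# assumed locale term names.
INJECTED_GENRES = {
    "email": "email",
    "instant message": "instant-message",
}

# Assumed locale data. The zh-TW strings are placeholders for whatever
# the locale file (or the style) would actually supply.
LOCALE_TERMS = {
    "en-US": {"email": "email", "instant-message": "instant message"},
    "zh-TW": {"email": "電子郵件", "instant-message": "即時訊息"},
}


def localise_genre(item, lang):
    """Swap the injected English genre string for a localised term."""
    genre = item.get("genre", "")
    if item.get("type") != "personal_communication":
        return genre
    term = INJECTED_GENRES.get(genre)
    if term is None:
        # Anything the user typed in passes through untouched.
        return genre
    terms = LOCALE_TERMS.get(lang, LOCALE_TERMS["en-US"])
    return terms.get(term, genre)
```

    The hard-wired table is exactly the part that feels like a kludge: the processor has to know which field contents are "special".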
  • A third possibility would be to implement a special test attribute aimed at this particular issue, so that you could say things like:
    <choose>
      <if type="personal_communication"
          genre="instant-message" match="all">
        <text term="instant-message"/>
      </if>
      <else-if type="personal_communication"
               genre="email" match="all">
        <text term="email"/>
      </else-if>
      <else-if>
        <conditions match="all">
          <condition type="personal_communication"/>
          <condition variable="genre" match="none"/>
        </conditions>
        <text term="letter"/>
      </else-if>
      <else>
        <text variable="genre"/>
      </else>
    </choose>

    This may be the best option. It would be transparent to the user, it gives the style designer opt-in control over the output, and it would not affect the behaviour of the Zotero repository styles when run in MLZ.
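
    To make the intended semantics concrete, here is a minimal sketch of how such a test might be evaluated. It is an assumption-laden illustration rather than the processor's real logic: genre_test and pick_term are made-up names, and the space-to-hyphen normalisation is my guess at how field content would be matched against the attribute value.

```python
def genre_test(item, item_type, genre):
    """Return True when both the item type and the normalised genre match."""
    # Assumption: the attribute value uses a hyphen where the injected
    # field content has a space ("instant message" -> "instant-message").
    actual = item.get("genre", "").strip().lower().replace(" ", "-")
    return item.get("type") == item_type and actual == genre


def pick_term(item):
    """Follow the branch order of the <choose> block above."""
    if genre_test(item, "personal_communication", "instant-message"):
        return ("term", "instant-message")
    if genre_test(item, "personal_communication", "email"):
        return ("term", "email")
    if item.get("type") == "personal_communication" and not item.get("genre"):
        return ("term", "letter")
    # Fall through: render whatever the genre field contains.
    return ("variable", item.get("genre", ""))
```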
  • The third option sounds good, and we might need it down the road. For now, however, just being able to localize the two terms is good enough for me, since I can use a generic format for all three sub-types of personal_communication, like this:

    <group delimiter="">
    <names variable="author">
    ...
    </names>
    <names variable="recipient" prefix="致">
    ...
    </names>
    <text variable="genre"/>
    </group>
  • The problem is that MLZ supplies the English field-content "email" and "instant message" on the respective types when the field is empty, so you would always get the English string with that code.

    I've implemented the test, updated the locales, checked that it works, and pushed a new version of MLZ. I've updated the validator as well, so you can still validate your CSL-m style online:

    http://fbennett.github.io/csl-validator.js/

    It was a small change in the processor, and it won't have side-effects on official CSL styles, since the new attribute would be caught by validation.
  • That's fast. Thank you.
  • There is a similar issue with the CSL-m broadcast type, which may have "television broadcast", "radio broadcast" or "podcast" in the genre field, if it is not explicitly specified in the record.

    I've added terms for those (podcast, radio-broadcast, television-broadcast), updated the CSL-m schema, updated the online validator, and pushed a fresh version of MLZ with the extended locale set.
  • Would it make sense to implement the very same solution for other variables and in other citeprocs as well?

    I just came across a similar problem when trying to figure out how to localize the content of the status variable.

    Here, too, being able to say,

    <choose>
      <if status="forthcoming">
        <text term="forthcoming"/>
      </if>
      <else-if status="in press">
        <text term="in press"/>
      </else-if>
      <else>
        <text variable="status"/>
      </else>
    </choose>


    would allow proper localization of "status".

    (status is one of the CSL variables not available in Zotero (yet), but can be accessed in pandoc. I am sure the pandoc author would be open to implementing this, in particular if we could agree on some basic set of rules.)

    This could also be useful for localizing the genre terms for different kinds of theses, or reports.
  • There is a similar issue with the CSL-m broadcast type ...
    Thanks. This reminds me to look into that part of code. Never spent much thought on it for I've never seen one cited.
  • Would it make sense to implement the very same solution for other variables and in other citeprocs as well?
    We can/should certainly discuss how to do this, but making specific variables testable for content requires a CSL spec change - that's not something that a citeproc should just implement.
    (The reason fbennett can do that is that he runs his own fork of the CSL spec, CSL-m.)
    Discussion on that should happen on the xbiblio mailing list.
  • I just came across a similar problem when trying to figure out how to localize the content of the status variable.
    Wouldn't this be something you can achieve by entering the localized term into the "status" field? Unless you want to do something different (other than just using a different term) according to an item's status.

    Email and instant message are Zotero item types that are merged into the personal_communication type in CSL. Though I can do something similar by using the "letter" type and entering a localized term for email or instant message into the "type" field (which would be mapped to "genre" in CSL), I can't prevent users from using the "email" or "instant message" type. That's why it needs a solution.

    That said, I'm not against your idea. Frank is the programmer, so he surely can implement it if he sees fit. I'm just saying what you want with the example you provide can be done already.
  • Wouldn’t this be something you can achieve by entering the localized term into the “status” field?
    Sure, but I’d have to modify the status variable in my database whenever I want to use a different output language.

    My point is that manual intervention should not be necessary: When a status variable contains “forthcoming”, the processor should output the appropriate localized string, e.g., “à paraître”.

    BTW, the terms for “forthcoming” and “in press” are in the locale files already, what is needed is just a way of accessing them, based on variable content.
  • edited November 3, 2013
    @nickbart,

    I'm with adamsmith that a test of content should not be introduced into official CSL without a thorough discussion on the xbiblio-devel list. The specific test introduced here is a hack, prompted by factors that don't affect official CSL development. More on those factors below: but first, here are some thoughts on the status field (most of this will already be familiar, but the notes might be useful to others who come across this thread).

    Ideally, there should be no test of field content unless the content itself is strictly controlled by the specification. In official CSL we can test for item type, but the value must be one of the valid CSL types, defined in the specification. Similarly, CSL-m offers tests of the value in the jurisdiction field, but the permissible jurisdiction values are also restricted to a (much larger) controlled list.

    Where the specification limits field values by a list, user applications enforce it in the GUI to ensure consistency. That's critical to the user experience, and every application does it (Zotero, Mendeley, Papers, MLZ, Qiqqa, Paperpile, colwiz, Logos, etc).

    The same applies to the status field. There are many descriptive values that a user might want to enter as "status": In Press, Forthcoming, Preprint, Draft, Under Review, Final, Unpublished, Published, Submitted, Accepted, Anonymized, and the list goes on. With a fuzzy set of possible values, testing of free text content would have unpredictable results. To avoid chronic frustration, the list of permitted values needs to be fixed and documented. At the same time, applications will need to commit to the list, and implement it in their respective user interfaces. Lining all of that up will be cumbersome, but there really isn't any way around it: testing of field content only makes sense when the field of possible values is known, and enforced by applications.
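
    A small sketch shows why free text is a problem. The permitted list here is purely hypothetical (nothing of the kind has been agreed), and status_term is an invented helper:

```python
# Hypothetical controlled list; no such list is fixed in the specification.
PERMITTED_STATUS = {"forthcoming", "in press"}


def status_term(raw):
    """Return the matching term name, or None if the value is not in the list."""
    # Collapse whitespace and ignore case, but do nothing cleverer than that.
    value = " ".join(raw.lower().split())
    return value if value in PERMITTED_STATUS else None
```

    With this, "In Press" resolves to a term, but "in-press" or "Accepted, forthcoming" silently falls through to the untranslated field content. Unless applications constrain what can be entered, the style author cannot predict which branch a given record will take.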

    The CSL-m test of the "genre" value introduced above breaks that rule, but it's the least-bad of several alternatives. Here's where the need comes from.

    MLZ and CSL-m aim to support legal and multilingual referencing. For the US, this means implementing the rules of "The Bluebook: A Uniform System of Citation", maintained by students at four of the nation's top law schools for the past eighty years or so. As the product of incremental contributions by a transient staff of amateurs over a long period of time, the style rules set forth in the Bluebook are rather chaotic in places.

    Implementing the Bluebook rules is a challenge because of their arbitrariness and complexity: but because close adherence to the style is treated as a big deal in law school, an implementation needs to come reasonably close to the mark to be useful.

    The rules for multilingual content in the current edition of the Bluebook (videos, movies, broadcasts, and various types of online content) require distinct treatment of several types that are represented separately in Zotero, but which map to a more limited set of types in CSL (broadcast and personal_communication). I chose not to add a flurry of new item types to the CSL-m schema to cope with these cases, because of the impact it would have on official CSL styles when run in MLZ -- items of these types would test "false" for all types handled by a standard CSL style, and fall back to the style's default format (which would almost always be wrong).

    The workaround that I adopted was to inject a fixed value into the genre field for several types (Email, Instant Message, Radio Broadcast, Television Broadcast, and Podcast). This gave rise to the problem encountered by mlwang: that these strings, invisible in the user interface but with known values, could not be localised. The test introduced here makes that possible, by explicit code that will reveal what is going on to anyone who edits the style.
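
    In outline, the injection amounts to something like the following. This is an illustrative sketch only, not the MLZ source: the table keys are Zotero item type names, the values are the strings mentioned in this thread, and with_injected_genre is a made-up helper.

```python
# Default genre strings per Zotero item type, as described in this thread.
DEFAULT_GENRE = {
    "email": "email",
    "instantMessage": "instant message",
    "radioBroadcast": "radio broadcast",
    "tvBroadcast": "television broadcast",
    "podcast": "podcast",
}


def with_injected_genre(zotero_type, fields):
    """Return a copy of the fields with a default genre filled in if empty."""
    out = dict(fields)
    # Only fill the field when the user has not supplied a value.
    if not out.get("genre") and zotero_type in DEFAULT_GENRE:
        out["genre"] = DEFAULT_GENRE[zotero_type]
    return out
```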

    I will be the first to admit that this needle-threading exercise is not ideal: but momentum is important to the effort, and this controlled indiscretion allows MLZ/CSL-m style authors to go forward with the task of implementing legal and multilingual styles in an orderly way. The number of CSL-m styles is still quite small, as well, which means that if we change our minds about this approach, styles can be adapted without crushing effort. For the present this particular fix works for us, but I certainly wouldn't propose (or support) a similar workaround in official CSL, neither for the genre nor for the status field.

    So to return to the starting point of this long note, and as adamsmith says, the status field issue is a topic for the xbiblio-devel list.
  • @fbennett,

    The various *cast genre tests don't seem to work. I tried both type="song" and type="broadcast" with podcast (because in your book podcast is categorized as "song"). Neither worked. The validator keeps telling me:
    stdin:3038:10: error: attribute ^genre with invalid value "podcast"
    required:
    value ^token "email"
    value ^token "instant-message"
    Same with "radio-broadcast" & "television-broadcast".
  • Ouch, sorry about that. The validator should recognize those now.
  • @fbennett,

    thank you for the detailed explanation, much appreciated. I will bring this up on xbiblio.
  • @fbennett,

    It seems localization of *cast terms doesn't work in bibliography sorting macros. I had a macro (X) that adds the relevant term to the start of a title, with the following code:
    ...
    <else-if match="all" type="broadcast" genre="television-broadcast">
    <text term="television-broadcast"/>
    </else-if>
    ...

    where the term television-broadcast had been defined earlier as "電視節目". Same for other *cast terms. The macro was the only one in twlaw using the <text term="television-broadcast"/> instruction.

    X was called by two different macros, one for citation and bibliography generation (Y) and the other for bib sorting (Z).

    When called from Y, X properly put "電視節目" at the start of a title. When called from Z, however, X would output (behind the scenes, of course) "television broadcast" for the term, disrupting the order of bib items as a result.

    Since I couldn't see the output of bib sort key macros, it took me a while to figure this out. I've since changed the code to use the "text value" syntax instead of the "text term" syntax, so it's ok now. But that the same macro would output different results in different contexts is surprising.

    I left the old code intact (commented out) in the newest version (pull request pending) of twlaw (lines 3078-3086) so you may take a look. Remember, however, that I add a space to the start of Chinese items for sorting purposes, so when the "text term" form is used, the wrongly sorted item would not go with non-Chinese items, but rather with Chinese items that start with a non-Chinese word (translated works, e.g.).
  • Good detective work. I'll take a look.
  • Confirmed, I can reproduce this, and the behind-the-scenes view is exactly as you describe. The rendering locale for the item is not being honoured when generating sort keys. It won't be immediate, but I should be able to fix this soon.
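
    For anyone following along, the mismatch can be reduced to a sketch like this. It is not the processor's code: the term table, the function names and the "language" key are assumptions, and only the two term strings come from the report above.

```python
# Assumed term tables; the zh-TW string is the one defined in twlaw.
TERMS = {
    "en-US": {"television-broadcast": "television broadcast"},
    "zh-TW": {"television-broadcast": "電視節目"},
}


def title_prefix(item, lang):
    """What macro X does: put the localised term in front of the title."""
    return TERMS[lang]["television-broadcast"] + " " + item["title"]


def render(item, default_lang="en-US"):
    # Rendering honours the item's own locale.
    return title_prefix(item, item.get("language", default_lang))


def sort_key_buggy(item, default_lang="en-US"):
    # The bug: the sort key is built with the default locale only.
    return title_prefix(item, default_lang)


def sort_key_fixed(item, default_lang="en-US"):
    # The intended behaviour: same locale lookup as rendering.
    return title_prefix(item, item.get("language", default_lang))
```

    With an item whose language is zh-TW, render() starts with "電視節目" while sort_key_buggy() starts with "television broadcast", which is why the entry lands among items beginning with a non-Chinese word.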
  • Thanks. No rush. Twlaw is working properly as is, and those are rarely cited types anyway.