biblatex import/export: CSL "language" = biblatex "langid"

The CSL "language" variable should be mapped from and to the biblatex "langid" field upon import and export, *not* from/to the biblatex "language" field.

The biblatex "langid" field (which used to be called "hyphenation" until biblatex v2.8, released 2013-10-21) specifies the (main) language of the metadata, and is used to switch hyphenation patterns and capitalization routines. It thus matches the CSL "language" variable perfectly.

In biblatex, the "langid" identifier must be a language name known to the babel/polyglossia packages. A fairly complete list, including mappings between biblatex and CSL, can be found in the pandoc-citeproc sources; see http://hackage.haskell.org/package/pandoc-citeproc-0.1.2.1/docs/src/Text-CSL-Input-Bibtex.html.

The biblatex "language" field, by contrast, describes the language(s) of the content, e.g., "greek and latin and english", but has no counterpart in CSL.

The biblatex import filter should thus map biblatex "langid" (ideally also considering "langidopts" where details such as "variant=british" might be specified) and "hyphenation" (for backwards compatibility) to CSL "language", translating babel language names to CSL language identifiers.

The export filter should map CSL "language" to biblatex "langid". Mapping anything to the "langidopts" field is not essential, AFAICS, since all available languages can be specified without using langidopts (which has been introduced to facilitate the use of the polyglossia package).
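
To make the proposed import behaviour concrete, here is a minimal sketch in Python (not the actual translator code). The table BABEL_TO_CSL and the function csl_language are hypothetical names, the table is only a small illustrative subset, and mapping plain "english" to "en-US" is an assumption.

```python
# Illustrative subset only; table contents and names are assumptions.
BABEL_TO_CSL = {
    "english": "en-US",
    "american": "en-US",
    "british": "en-GB",
    "ngerman": "de-DE",
    "french": "fr-FR",
}


def csl_language(fields):
    """Return a CSL language code for a parsed biblatex entry, or None."""
    # Prefer "langid"; fall back to its pre-2.8 name, "hyphenation".
    name = fields.get("langid") or fields.get("hyphenation")
    if not name:
        # The biblatex "language" field is deliberately ignored.
        return None
    name = name.strip().lower()
    # "langidopts" may refine the language, e.g. "variant=british".
    opts = fields.get("langidopts", "").replace(" ", "")
    if name == "english" and "variant=british" in opts:
        return "en-GB"
    return BABEL_TO_CSL.get(name)
```

An entry that only carries the biblatex "language" field yields None here, which reflects the point that this field has no CSL counterpart.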
  • The export translator for biblatex already does this, code here:
    https://github.com/zotero/translators/blob/master/BibLaTeX.js#L529

    We can look at this on import, we can use langid as a fallback for language, but I don't think we'll prefer it (there is a single bibtex/biblatex import translator, so when in doubt we follow bibtex and not biblatex).
  • I see. Apparently my translator did not auto-update *and* I was bitten by the fact that "lastUpdated": "2013-07-01 14:13" in BibLaTeX.js was not updated between versions.

    What I now get upon export is hyphenation = {en}. biblatex, however, will not understand this; it should be hyphenation = {english}.
  • edited December 19, 2013
    Looks like we didn't update the timestamp when we pushed this, I'll do that asap.

    What I now get upon export is hyphenation = {en}. biblatex, however, will not understand this; it should be hyphenation = {english}.
    Do you actually mean hyphenation or do you mean langid? I don't see how the translator would ever produce the former.
    Also, I think the way this is written should produce "english" and not "en" for langid - what's in the Zotero field?
  • edited December 19, 2013
    timestamp is updated. I get langid={english} for both "english" and "en" in Zotero's language field, so that looks like it's working as intended.
  • Another update, can't really tell why, but now it works as expected. Thank you for your assistance.

    The only thing I'm still not too happy about is the biblatex "language" field.

    Both standard biblatex and biblatex-chicago print the content of this field by default, so the way things are now, you get, e.g.,

    Author, Ann. 2012. Title. en-GB. Place: Publisher.

    Probably not what anyone would ever want.

    I would recommend not to write the biblatex "language" field at all - but if you feel you must, please do not use the content of the Zotero "language" field as is but map it to the appropriate biblatex "language name" (see biblatex manual, "4.9.2.18 Language Names").
  • I'll talk to the main dev of that translator - seems plausible to me to not render language when we have langid.
  • Hi.
    That it writes out both language and langid is completely a mistake. I just forgot to remove the old direct mapping to language when I added the smarter handling of langid. Easily fixed.

    The other thing that it does is writing out language literally to the language field if no matching language is found in the "smart" selection of languages (which writes to langid). Maybe this is wrong if the meaning of the CSL-language field is a language only used for localization and not written out in the bibliography. Biblatex obviously distinguishes between these two uses and provides two different fields, which doesn't seem to be the case for CSL and Zotero.

    Maybe then we should never write to the biblatex language field. Do you think I should do that? Would people expect anything else?

    I based the language mapping on the list of supported languages in biblatex according to the manual, but now I realize that "supported" probably only refers to availability of localization strings and not to hyphenation so this list could be extended to all languages supported by babel or polyglossia and at least give correct hyphenation in the bibliography. I'll look into this.

    I haven't added support for polyglossia yet though, only babel's language variants (see languageMap in the file). But I guess that could be done quite easily. I'll just have to look up which languages and variants are supported in polyglossia.

    Thanks for the input.
  • Maybe then we should never write to the biblatex language field. Do you think I should do that? Would people expect anything else?
    I tend to think that content from the Zotero/CSL "language" field should never be mapped to the biblatex "language" field. In particular, it seems more confusing than helpful if some content is written to biblatex "langid" and some to biblatex "language." And - leaving aside the slightly confusing likeness of field names - I do not think people will actually expect a field that is never printed (CSL "language") to be mapped to a field that by default is (biblatex "language").
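
    A rough sketch of that export rule in Python (not the BibLaTeX.js code): the Zotero/CSL language is mapped to "langid" only, and nothing is ever written to the biblatex "language" field. The names CSL_TO_BABEL and export_language_fields are hypothetical, and the table is an illustrative subset; only "en" and "en-US"/"en:US" are taken from this thread.

```python
# Illustrative subset; keys are lower-cased ISO-style codes.
CSL_TO_BABEL = {
    "en": "english",
    "en-us": "american",
    "en-gb": "british",
    "de": "ngerman",     # assumption: default German to new orthography
    "de-de": "ngerman",
    "fr": "french",
}
BABEL_NAMES = set(CSL_TO_BABEL.values())


def export_language_fields(zotero_language):
    """Return the biblatex fields to write for a Zotero language value."""
    value = zotero_language.strip()
    # Accept "en-US", "en:US" and "en_US" as the same code.
    code = value.lower().replace(":", "-").replace("_", "-")
    if code in CSL_TO_BABEL:
        return {"langid": CSL_TO_BABEL[code]}
    # A value that already is a known babel name is passed through.
    if value in BABEL_NAMES:
        return {"langid": value}
    # No match: write nothing, in particular no "language" field.
    return {}
```

    With this rule an unrecognised value such as "Klingon" produces no language-related field at all, so nothing unexpected is printed in the bibliography.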
  • A pull request for these fixes:
    https://github.com/zotero/translators/pull/664

    I'll add support for more languages and polyglossia later, as fixing this is of course more urgent. The people possibly affected by erroneously printed languages in bibliographies outnumber those depending on an integration of Zotero/CSL language and biblatex langid for more languages than are already supported (probably no one).
  • A translator fix is now up. Your version of Zotero will automatically update within 24hs, or you can update manually using the "Update Now" button in the "General" tab of the Zotero preferences.

    Thanks to anjo7539
  • edited December 24, 2013
    Hi all,

    it would be really nice if the export worked with any string entered into the Zotero language field, but show a warning that there are entries which don't match any known supported BibLaTeX language (possibly with a list of not matching entries) so people can change those, for example by using a text editor's "find and replace" function.

    I got a bibliography with hundreds of entries that I maintain in Zotero. I use language entries like "German" or "English" that make it perfectly clear for anyone searching within my Zotero bibliography which language the entry is in, but of course BibLaTeX doesn't recognize such language values.
    The current BibLaTeX export seems to ignore said entries, so I'd have to change them from "German" to "ngerman", "English" to "USenglish" or "american" and so on to make the BibLaTeX export work.
    First of all, that's an unnecessary differentiation within Zotero bibliographies and could lead to confusion as people might not be familiar with BibLaTeX languages and second I'd have to do said replacements manually as Zotero is lacking an option to find and replace within specific fields. In recent export versions I was able to just find and replace the exported entries with a text editor, but now I can't as there are no "langid" fields being exported at all.

    *edit* Typos and clearing it up.

    *edit2*: Neither "american" nor "USenglish" nor "ngerman" nor "german" get exported right now, although some of them are listed in BibLaTeX.js' "var languageMap". This utterly breaks my thesis' references which make use of language switching. "english" works though.

    *edit3*: Now I get the logic behind the mapping (e.g. "en:US" or "en-US" for "american") and I helped myself using the find-and-replace JavaScript code found here: https://forums.zotero.org/discussion/7707/. It still would be nice if the BibLaTeX export had an option to just export to "langid" whatever the Zotero language field contains.
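
    What I have in mind could look roughly like the following Python sketch. It is only an illustration of the requested behaviour, not existing translator code; KNOWN_BABEL is a small assumed subset of babel names, and langid_with_warnings and its passthrough parameter are made-up names.

```python
# Assumed subset of language names that babel understands.
KNOWN_BABEL = {"english", "american", "USenglish", "british",
               "german", "ngerman", "french"}


def langid_with_warnings(items, passthrough=True):
    """Map (citekey, language) pairs to langid values plus a warning list.

    With passthrough=True every non-empty value is exported as typed;
    values that babel would not recognise are reported either way.
    """
    langids = {}
    warnings = []
    for citekey, language in items:
        value = language.strip()
        if not value:
            continue
        if value not in KNOWN_BABEL:
            warnings.append((citekey, value))
            if not passthrough:
                continue
        langids[citekey] = value
    return langids, warnings
```

    The warning list would let people find entries like "German" and either fix them in Zotero or replace them in the exported file with a text editor.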
  • I experience a problem with biblatex export and my guess is that it is the result of recent changes described above.
    I noticed a change because now the single Zotero field "Issue" is translated into two biblatex fields ("issue" and "number") with the same value. This results in incorrect output like this:

    Brader, Ted A., Joshua A. Tucker, and Dominik Duell. 2013. "Which Parties Can Lead Opinion? Experimental Evidence on Partisan Cue Taking in Multiparty Democracies." Comparative Political Studies 46, no. 11 (11): 1485-1517.

    However there are other changes. Here is the result of standard biblatex export 2 weeks ago:

    @article{brader_which_2013,
    title = {Which Parties Can Lead Opinion? Experimental Evidence on Partisan Cue Taking in Multiparty Democracies},
    volume = {46},
    issn = {0010-4140, 1552-3829},
    url = {http://cps.sagepub.com/content/46/11/1485},
    doi = {10.1177/0010414012453452},
    shorttitle = {Which Parties Can Lead Opinion?},
    language = {en},
    issue = {11},
    pages = {1485-1517},
    journaltitle = {Comparative Political Studies},
    shortjournal = {Comparative Political Studies},
    author = {Brader, Ted A. and Tucker, Joshua A. and Duell, Dominik},
    urldate = {2013-12-01},
    date = {2013-11-01},
    keywords = {Great Britain, Hungary, partisanship, party cues, Poland, political parties, public opinion, survey experiments}
    }

    Here is what I get now:

    @article{brader_which_2013,
    title = {Which Parties Can Lead Opinion? Experimental Evidence on Partisan Cue Taking in Multiparty Democracies},
    volume = {46},
    issn = {0010-4140, 1552-3829},
    url = {http://cps.sagepub.com/content/46/11/1485},
    doi = {10.1177/0010414012453452},
    shorttitle = {Which Parties Can Lead Opinion?},
    abstract = {Political parties not only aggregate the policy HERE GOES COMPLETE ABSTRACT.},
    issue = {11},
    pages = {1485-1517},
    number = {11},
    journaltitle = {Comparative Political Studies},
    shortjournal = {Comparative Political Studies},
    author = {Brader, Ted A. and Tucker, Joshua A. and Duell, Dominik},
    urldate = {2013-12-01},
    date = {2013-11-01},
    langid = {english},
    keywords = {Great Britain, Hungary, partisanship, party cues, Poland, political parties, public opinion, survey experiments}
    }

    Questions:
    1. How to fix the problem with double issue/number?
    2. Is there a place where such changes are documented? It makes using Zotero unreliable when - without a warning - the same operation could not be performed after a short period of time.
  • edited January 2, 2014
    1. that was added in a different commit here:
    https://github.com/zotero/translators/commit/f49a0f18a4edab6df9f18a1c7c60f6be0b40ce7f#diff-25752e7a73f4fcccddce475afb61db43R355

    doesn't look like it's working as intended - if anjo could take a look that'd be great.

    2. Translator changes are not individually documented, but are all available via the history on https://github.com/zotero/translators

    In general we change export translators rarely because of stability concerns, but since we've only had BibLaTeX for two months and are still figuring out how to best implement some details, I'm allowing more changes. E.g. the BibTeX translator hasn't seen such dramatic changes for at least a year.
  • edited January 2, 2014
    1. Thank you for the prompt and helpful response, as usual!
    2. I see the problem with new translators. By the way, as far as I understand, with these changes in language fields you have just fixed a problem that was already fixed on the biblatex end (http://tex.stackexchange.com/questions/147749/excessive-fields-in-biblatex-chicago-author-date-style). I understand that there are many implementations of biblatex, but some sort of coordination with at least the major ones could save resources on both ends. It may be unfeasible though.
  • 2. We just follow the official biblatex manual - and thankfully there's only one - as closely as possible. The language/langid issue is entirely separate from the thread you link to - it helps users get correct output for items in different languages and had been requested a number of times.
    3. I definitely can't do it, but it may also just not be possible given the way the forum software works.
  • 2. I am sorry, I may have confused you. I meant not the "urldate" field issue, but the "language" field issue. It was solved though using the same \AtEveryBibitem{\clearlist\ function.
    And ideally I would prefer biblatex to use only the fields I need rather than limit the information Zotero is storing. However, as far as I can understand, the partial solution here is exactly in the fine-tuning of the export translators.
  • edited January 4, 2014
    @sanovich: I've fixed this now and here is the pull request:
    https://github.com/zotero/translators/pull/665.
    It'll update whenever @adamsmith pulls it.

    Note that the old behaviour (where you got only "issue") was actually wrong, and what you will get now is only "number" (for numeric issue numbers, as in your example).

    Quoting the biblatex (2.8a) manual:

    issue: field (literal)
    The issue of a journal. This field is intended for journals whose individual issues are identified by a designation such as ‘Spring’ or ‘Summer’ rather than the month or a number. Since the placement of issue is similar to month and number, this field may also be useful with double issues and other special cases. See also month, number, and § 2.
    number: field (literal)
    The number of a journal or the volume/number of a book in a series. See also issue as well as §§ 2.3.7 and 2.3.9. With @patent entries, this is the number or record token of a patent or patent request.
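
    In other words, the rule is roughly the following (a simplified Python sketch, not the code from the pull request; issue_fields is a made-up name, and treating only plain digit strings as numeric is an assumption):

```python
def issue_fields(zotero_issue):
    """Choose the biblatex field for a Zotero "Issue" value."""
    value = zotero_issue.strip()
    if not value:
        return {}
    # Plain numbers such as "11" belong in "number".
    if value.isdigit():
        return {"number": value}
    # Designations such as "Spring" belong in "issue".
    return {"issue": value}
```

    So an item with Issue "11" exports number = {11} only, while an item with Issue "Spring" exports issue = {Spring}.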
  • @s0larix: The reason for expecting ISO language codes (e.g. en-US) and mapping these to babel (and soon hopefully polyglossia) languages is that this is what is used in CSL. If there are any CSL applications that make use of it for switching language (Zotero word-processor plugins?), having the correct ISO language code would mean that it works both in that case and with biblatex.

    The language fields in people's real-life Zotero databases don't necessarily look that nice though. I've noticed that you could end up with many different language codes using the web translators.
  • The translator fix is now up. Your version of Zotero will automatically update within 24hs, or you can update manually using the "Update Now" button in the "General" tab of the Zotero preferences.

    Thanks!
  • @anjo7539: That makes sense. Luckily it wasn't too hard to update my database as explained above.
  • edited January 8, 2014
    @anjo7539 @adamsmith thank you very much for fixing this.

    I should note that with this new translator, which follows the biblatex guidelines and exports "number" instead of "issue", I have to add the biblatex-chicago option "numbermonth=false" to the preamble of my LaTeX document to get the right output, e.g. "American Economic Review 98 (3): 808-842".
    Absent this command I was getting "American Economic Review 98, no. 3 (May 1): 808-842".