Accented characters in BibTeX export

When exporting accented characters (e.g. å, ä) to BibTeX, I think the current implementation in BibTeX.js may run into trouble if the character set is not UTF-8, causing some citations in LaTeX to display strangely in certain situations.


At present, BibTeX.js translates Unicode to ASCII with an internal table (var mappingTable on line 141, and the relevant section for accented characters starts on line 770). For example, the following line

"\u00E1":"\\'{a}", // LATIN SMALL LETTER A WITH ACUTE
translates the Unicode character á to the BibTeX command \\'{a} when the character set is not UTF-8 during export.


However, the BibTeX command \\'{a} can cause some citations in LaTeX to display strangely in some rare situations. For example, consider doing the following together:

  1. Cite a single-authored paper, whose only author's last name has one of its first three characters accented (e.g. the last name Gál, or Håstad); and
  2. Use a bibliography style that displays the first three characters of the author's last name when citing a single-authored paper (e.g. amsalpha).
Then the citation could be rendered incorrectly by amsalpha, causing some of the first three characters to be dropped, and perhaps placing the accent on a digit. This might be caused by the way that amsalpha counts characters, which interferes with the BibTeX command \\'{a} for placing accents.
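
To illustrate the suspected cause, here is a rough sketch of how I believe BibTeX's text.prefix$ built-in counts characters when a style takes the first three characters of a name. This is a simplified model written for this post, not the actual BibTeX code; the function name textPrefix and the year "98" are made up for the example. The inputs are JavaScript string literals, so each backslash is doubled.

```javascript
// Simplified model of BibTeX's text.prefix$ (an assumption for illustration,
// not the real implementation). Braces are not counted as characters, and a
// top-level brace group that starts with a backslash counts as ONE character.
function textPrefix(s, n) {
  let i = 0;      // read position in s
  let depth = 0;  // current brace depth
  let count = 0;  // text characters counted so far
  while (i < s.length && count < n) {
    const ch = s[i++];
    if (ch === "{") {
      depth++;
      if (depth === 1 && s[i] === "\\") {
        // "Special character": skip to the matching close brace.
        while (i < s.length && depth > 0) {
          if (s[i] === "}") depth--;
          else if (s[i] === "{") depth++;
          i++;
        }
        count++;
      }
    } else if (ch === "}") {
      if (depth > 0) depth--;
    } else {
      count++; // includes "\" and "'" when they are not inside a special character
    }
  }
  // Close any braces left open so the prefix stays balanced.
  return s.slice(0, i) + "}".repeat(depth);
}

// Unbraced accent: the backslash and the quote use up two of the three slots.
console.log(textPrefix("G\\'{a}l", 3) + "98");
// Braced accent: the whole group counts as a single character.
console.log(textPrefix("G{\\'{a}}l", 3) + "98");
```

Under this model the unbraced form is cut off right after the accent command, so the accent lands on the first digit of the year, while the braced form keeps all three letters.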


The problem can be fixed by enclosing the BibTeX command in an extra pair of braces, i.e. {\\'{a}}. To do this automatically when exporting to BibTeX (using a character set other than UTF-8), the section of mappingTable for accented characters needs to be updated. A patch is linked below.

http://pastebin.com/0fzZS6yy
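
For anyone who cannot open the link, the idea can be sketched roughly as follows. This is not the patch itself: sampleTable is a two-entry stand-in for the accented-character section of mappingTable, and wrapInBraces is a hypothetical helper name. The patch edits the table entries directly rather than transforming them at run time.

```javascript
// Hypothetical two-entry excerpt in the style of mappingTable (not the real table).
const sampleTable = {
  "\u00E1": "\\'{a}",  // LATIN SMALL LETTER A WITH ACUTE
  "\u00E4": "\\\"{a}", // LATIN SMALL LETTER A WITH DIAERESIS
};

// Return a copy of the table with every command enclosed in one extra pair of braces.
function wrapInBraces(table) {
  const wrapped = {};
  for (const [ch, tex] of Object.entries(table)) {
    wrapped[ch] = "{" + tex + "}";
  }
  return wrapped;
}

const fixed = wrapInBraces(sampleTable);
console.log(fixed["\u00E1"]); // the entry for á, now enclosed in braces
```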
  • I don't remember how we came to have \\'{a} in Zotero. The mapping table came from refbase, which has the preferred {\\'a}.

    It seems like changing to the behavior you describe (perhaps without the redundant brackets) would be ideal.
  • noksagt:

    Thanks for the comment. The updated patch below gets rid of the redundant brackets.

    http://pastebin.com/463P3vs3
  • Sorry, this took me a while to get to. The patches aren't up anymore; do you still have them?
    If not, do I understand correctly that this should be done for all accented characters, i.e. from line 781 in the current translator down to the end of the mapping table?
  • adamsmith: The patch is available at http://pastebin.com/ZHEXxUJ0. It does exactly what you suggested in your last comment.

    Also, this patch was created by Siu Man Chan, another Zotero user who unfortunately does not have a forum account.
  • Thanks, the pull request is up. I'd expect this to get merged within the next couple of days:
    https://github.com/zotero/translators/pull/455/files