BibTeX import failing to translate certain accented characters

I have encountered two issues with Zotero 3.0.11 and the most recent translators:

  1. Accent macros named by a letter (\c, \v, \u) are not translated when their argument is in braces, e.g. \c{c}, \v{u}, and \u{g} (though {\c c}, {\v u}, and {\u g} work fine, as do \`{a} and \~{n}).

  2. Accent macros named by a symbol (\`, \~) are not translated when a space separates them from the target letter, e.g. {\` a} and {\~ n} (though {\`a} and {\~n} work fine).

For example, importing

@inproceedings{test,
author = {Lucie Pol{\'a}kov\'{a} and Pavl{\'\i}na J{\'\i}nov{\'a} and Ji\v{r}{\'\i} M{\'\i}rovsk{\'y}},
title = {A\`{a}\v{u}\~{n}\c{c}\u{g} A{\` a}{\v u}{\~ n}{\c c}{\u g}},
editor = {Mehmet U\u{g}ur Do\u{g}an},
}
yields "Aà\vuñ\cc\ug A\` aǔ∼ nçğ" as the title, though it should be "Aàǔñçğ Aàǔñçğ".
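To illustrate what accepting both variants reported above involves, here is a minimal Python sketch. It is not Zotero's translator code or its actual approach; `COMBINING`, `ACCENT`, and `decode_accents` are hypothetical names, and the table covers only the accents in this report. One regular expression accepts a letter or symbol accent macro, optional spaces, a braced or bare target, and an optional enclosing brace group, then composes the result with Unicode NFC normalization.

```python
import re
import unicodedata

# Hypothetical table: LaTeX accent command -> Unicode combining mark.
# Only the accents discussed in this report are covered.
COMBINING = {
    "`": "\u0300",  # grave
    "'": "\u0301",  # acute
    "^": "\u0302",  # circumflex
    "~": "\u0303",  # tilde
    '"': "\u0308",  # diaeresis (umlaut)
    "u": "\u0306",  # breve
    "v": "\u030c",  # caron (hacek)
    "c": "\u0327",  # cedilla
}

ACCENT = re.compile(
    r"""
    (?P<open>\{\s*)?                 # optional enclosing group, as in {\v u}
    \\(?:
        (?P<sym>[`'^~"])\s*          # symbol accents: spaces allowed, as in {\` a}
      | (?P<let>[uvc])(?=[\s{])\s*   # letter accents: need a space or brace next
    )
    (?:\{\s*(?P<b>\\i|[A-Za-z])\s*\} # braced target, e.g. \c{c}
      | (?P<t>\\i|[A-Za-z]))         # bare target, e.g. {\c c}
    (?(open)\s*\})                   # consume a closing brace only if we opened one
    """,
    re.VERBOSE,
)


def _to_unicode(m: re.Match) -> str:
    accent = m.group("sym") or m.group("let")
    target = m.group("b") or m.group("t")
    base = "i" if target == "\\i" else target  # \i (dotless i) becomes a plain i
    # NFC folds base + combining mark into one precomposed code point where one exists.
    return unicodedata.normalize("NFC", base + COMBINING[accent])


def decode_accents(text: str) -> str:
    return ACCENT.sub(_to_unicode, text)


print(decode_accents(r"A\`{a}\v{u}\~{n}\c{c}\u{g} A{\` a}{\v u}{\~ n}{\c c}{\u g}"))
# Aàǔñçğ Aàǔñçğ
```

Letter macros only match when a space or brace follows them, so commands such as `\cite{...}` or `\url{...}` are left untouched; `{\'\i}` from the author field comes out as `í`.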
  • Thanks for reporting. I'll look into this. There are a couple of issues with the BibTeX translator at the moment, and I'm trying to resolve as many as I can in one go.
  • This is still a headache for me. Any progress?
  • While \v{u} is valid (La)TeX, my understanding is that it's not acceptable BibTeX; see e.g.:
    http://tex.stackexchange.com/questions/57743/how-to-write-a-and-other-umlauts-and-accented-letters-in-bibliography
    As far as I can see, supporting this would be a _huge_ headache for us (unless aurimas has a clever idea), so I'm inclined not to fix it and to tell you to fix your BibTeX instead ;-).
  • edited May 5, 2013
    Even if the supported form is the preferred BibTeX practice, that convention is arcane enough that people will keep using the other form as long as BibTeX itself accepts it. Here are some examples in the wild:

    http://ufal.mff.cuni.cz/pcedt2.0/publications/bibtex.html#biblio_MiBeAnotacena2005

    http://www.aclweb.org/anthology-new/W/W12/W12-0112.bib

    My main concern is that not handling these escape forms effectively breaks site translators that rely on BibTeX, in a way that is not at all transparent to the user. Would it be difficult to first normalize them to the conventional form on import?
  • This variation in escape sequences is the main reason I got frustrated with the BibTeX rewrite. I have some ideas on how to handle it, but it's a bit of a PITA. Once the new BibTeX translator is released, it will support all variants of escaping (at least all the ones I'm aware of). I don't have an ETA for it yet. It's the next thing on my list, but it's quite a big undertaking, so it will take a while.
  • A related issue I have encountered is that the BibTeX exporter supports precomposed characters but not combining diacritics. For instance, the names on http://www.ldc.upenn.edu/Catalog/catalogEntry.jsp?catalogId=LDC2012T08 need to be normalized to their precomposed equivalents in order to export correctly. I have been using http://minaret.info/test/normalize.msp for this, but it would be great if either (a) Zotero always applied NFC normalization on text entry/import, or (b) the BibTeX exporter handled both variants.
  • Could you provide steps to reproduce? I don't think I follow.
  • If I create an entry with "Hajič" (c followed by U+030C COMBINING CARON, i.e. the combining hacek), exporting it to BibTeX gives "Hajic?". But with "Hajič" (the precomposed U+010D, c with hacek), it exports properly as "Haji\v{c}".
  • @aurimas, does https://github.com/ZotPlus/zotero-better-bibtex address these issues? How stable is it at this point?
  • Your first issue is fixed in regular Zotero now. The second issue still isn't, though we could probably fix it (why would there be a space between the accent and the letter, though? That just seems wrong).

    Better BibTeX doesn't address the differently composed characters, unless you're using UTF-8, of course, in which case they also work in regular Zotero. Since you seem to work with a lot of different diacritics, UTF-8 would seem like the most robust solution for you anyway.
  • I may have seen such a case in the wild. But the main obstacle does appear to be fixed, which is great. Thanks!
  • Dan, fbennett, and aurimas have talked more about the character composition issue, and I hope we're closing in on some version of a fix, but I'm not sure how long it's going to take. I believe the internal functions are in place; it's mainly a question of where and how to apply them.
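The combining-versus-precomposed export problem reported in this thread, and option (a) of applying NFC normalization first, can be illustrated with a minimal Python sketch. It is not Zotero's exporter: `PRECOMPOSED_TO_BIBTEX`, `escape_char`, and `to_bibtex` are hypothetical, and the table stands in for an exporter that only recognizes precomposed characters and writes `?` for anything else, which reproduces the "Hajic?" output described earlier.

```python
import unicodedata

# Hypothetical lookup table standing in for an exporter that only knows
# precomposed characters. A real table would be much larger.
PRECOMPOSED_TO_BIBTEX = {
    "\u010d": r"\v{c}",  # LATIN SMALL LETTER C WITH CARON
    "\u00e1": r"\'{a}",  # LATIN SMALL LETTER A WITH ACUTE
}


def escape_char(ch: str) -> str:
    if ch.isascii():
        return ch
    # Unknown non-ASCII characters (including bare combining marks) become "?".
    return PRECOMPOSED_TO_BIBTEX.get(ch, "?")


def to_bibtex(text: str, normalize: bool = True) -> str:
    if normalize:
        # NFC turns "c" + U+030C COMBINING CARON into the single code point U+010D.
        text = unicodedata.normalize("NFC", text)
    return "".join(escape_char(ch) for ch in text)


combining = "Hajic\u030c"
precomposed = "Haji\u010d"
print(to_bibtex(combining, normalize=False))  # Hajic?
print(to_bibtex(combining))                   # Haji\v{c}
print(to_bibtex(precomposed))                 # Haji\v{c}
```

Normalizing once at the boundary means the lookup table only needs precomposed entries. NFC cannot help with letter and mark combinations that have no precomposed code point; those would still need a fallback such as building the `\v{...}` macro from the decomposed form.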