BibTeX export/import and Chinese text in UTF-8

Here at the Australian National University, we have a facility under development to allow researchers to poll a number of data sources and generate a unified list of their publications, published data sets and conference papers.

The aim behind this is to increase portability and allow people to take their work with them.

We have chosen BibTeX as an export format as Zotero, Endnote and Mendeley can all accept BibTeX as an import format.

We have, however, come unstuck over what to do about Chinese UTF-8 text, as BibTeX by default does not have a standard method for handling UTF-8.

Pragmatically, as we're using BibTeX as a lowest-common-denominator data migration format, all we need to do is what our migration targets do.

Is there a document describing how Zotero reads utf-8 encodings from BibTeX-formatted files?
  • edited August 13, 2013
    I'm not sure what you mean by "utf-8 encodings". UTF-8 is the encoding, and Zotero supports UTF-8 for BibTeX import/export (and for all other file formats). My understanding is that not all BibTeX tools support Unicode (or at least didn't a few years ago) but as long as you generate a valid UTF-8 file it should work as you would expect.

    What's the problem, exactly?
  • The problem is that BibTeX predates Unicode and does not have a standard way of representing Unicode characters. In fact, it's only really happy with single-byte characters, using the escape conventions for characters with the top bit set.

    For data interchange purposes, this means that a file generated by application A may not load correctly into application B, or may at least give odd results.

    In the absence of a standard method, what we need to do is generate files in a way that is compatible with our candidate target applications, i.e. if I give you Japanese text in a BibTeX document, how do you expect the non-ASCII characters to be represented?
  • edited August 13, 2013
    I'm not sure what answer you're looking for from us. The Zotero BibTeX translator has a mapping table of escape sequences for many extended (but of course not Chinese) characters that's used on import and, if you set the export character set to ASCII or similar, on export. But Zotero happily imports and exports UTF-8 in all supported formats, BibTeX included. And, as I say above, my (uninformed) understanding is that there are in fact some BibTeX tools that can handle UTF-8. If you're using tools that can't, I'm not sure what to tell you. Use more modern tools. This works fine on Zotero's end.
  • edited August 13, 2013
    Zotero will import UTF-8 encoded files. That's the only way to get CJK in.

    While the 'bibtex' program is not fully compatible with UTF-8, there are plenty of other clients that can use a UTF-8 BibTeX file (including biber/biblatex for the toolchain of actually using this in a LaTeX document). I think (but have not checked) that Mendeley is one. I had thought that Endnote couldn't import BibTeX (certainly, in earlier versions, it was necessary to convert the file to a format it could import).
  • That's the bit of the jigsaw we need - that you don't do anything funky with UTF-8 input ...
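
Following the conclusion above (emit plain UTF-8 and let the importing tool read it as-is), an exporter could be sketched roughly as below. This is only an illustration: the helper names `format_bibtex_entry` and `write_bibtex_utf8`, the citation key, and the sample Chinese title are all made up for the example, and leaving out a byte-order mark is a cautious assumption rather than something stated in this thread.

```python
from pathlib import Path
import tempfile


def format_bibtex_entry(key, entry_type, fields):
    """Render one BibTeX entry. Field values are emitted unchanged,
    so CJK text stays as literal Unicode (no escape sequences)."""
    lines = [f"@{entry_type}{{{key},"]
    for name, value in fields.items():
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines) + "\n"


def write_bibtex_utf8(path, entries):
    """Write the entries as UTF-8. Using "utf-8" rather than "utf-8-sig"
    means no byte-order mark is written (an assumption, see above)."""
    Path(path).write_text("".join(entries), encoding="utf-8")


# Hypothetical sample record, invented for this sketch.
entry = format_bibtex_entry(
    "wang2013",
    "article",
    {"title": "中文标题示例", "author": "王, 小明", "year": "2013"},
)

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "publications.bib"
    write_bibtex_utf8(target, [entry])
    raw = target.read_bytes()  # bytes as an importing tool would see them
```

Decoding `raw` as UTF-8 gives back the original title unchanged, which is the behaviour the replies above describe for Zotero's import. Targets that cannot read UTF-8 would need a separate escaping step, which this sketch does not attempt.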
