Default encoding in BibTeX export
I am running Zotero 3.0.8 on Firefox 3.0.1 on Ubuntu.
I noticed that for quick copy (Ctrl+Alt+C), the BibTeX export format does not do LaTeX substitution for non-ASCII Unicode characters (for example, u with an umlaut is not rendered as \"{u}). With some help from a friend, I found that the BibTeX.js translator does have the capability to translate to LaTeX code like the above, but for some reason does not use it during quick copy.
I think this happens because of the following line (line 10 in BibTeX.js):
"exportCharset": "UTF-8",
I could solve my problem by changing this line to
"exportCharset": "ISO-8859-1",
This ensures that the Unicode-to-LaTeX translation kicks in during quick copy.
The most popular LaTeX implementation today is pdflatex, which has little support for Unicode, so (I believe) most users would prefer their BibTeX output to use LaTeX code rather than Unicode characters that pdflatex cannot handle. I therefore think the default version of the translator should have
"exportCharset": "ISO-8859-1",
rather than UTF-8. Is there a reason that the translator uses UTF-8 rather than ISO-8859-1?
Of course, another way to solve this problem would be to give an option for specifying a default character encoding for quick copy, but I don't know how to do that.
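The behaviour described in this post can be sketched roughly as follows. This is only an illustration, assuming (as the post infers) that the translator skips the escaping step when the export charset is UTF-8. The names `LATEX_MAP` and `encodeField` are hypothetical and are not taken from BibTeX.js, and the mapping table is reduced to three characters.

```javascript
// Hypothetical, heavily reduced mapping table (not the translator's real table).
const LATEX_MAP = new Map([
  ["\u00FC", '\\"{u}'], // u with umlaut
  ["\u00E9", "\\'{e}"], // e with acute accent
  ["\u00F1", "\\~{n}"], // n with tilde
]);

// Hypothetical helper: escape non-ASCII characters only when the
// target charset is something other than UTF-8.
function encodeField(value, exportCharset) {
  if (exportCharset === "UTF-8") {
    return value; // characters pass through unchanged
  }
  let out = "";
  for (const ch of value) {
    if (ch.codePointAt(0) < 128) {
      out += ch; // plain ASCII needs no substitution
    } else {
      // Fall back to the raw character when no mapping is known.
      out += LATEX_MAP.has(ch) ? LATEX_MAP.get(ch) : ch;
    }
  }
  return out;
}

console.log(encodeField("M\u00FCller", "UTF-8"));      // umlaut kept as-is
console.log(encodeField("M\u00FCller", "ISO-8859-1")); // umlaut written as \"{u}
```

Under that assumption, changing "exportCharset" away from UTF-8 is what switches the second branch on for quick copy.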
There are two reasons we use UTF-8 by default instead of ISO-8859-1:
You can set the export charset when you right-click and export to BibTeX (when the option is checked under Export in the preferences).
I believe Zotero remembers the last charset option for quick copy.
Thanks for the response. I understand the decision to use UTF-8 in the interests of general portability.
adamsmith:
I am not sure if BibTeX does, but pdflatex certainly has a problem. Fortunately, there seems to be an easy way to fix it, at least for some of the common characters (accented Roman characters, copyright symbols, etc.), even while using pdflatex, by adding the line
\usepackage[utf8]{inputenc}
to the LaTeX source code. However, this may not work properly with some BibTeX styles (typically those which try to include the UTF-8 character in the citation key).
On the other hand, yes, it is certainly possible to set the export charset when you do an export to file. During quick copy, however, this is not possible, as Simon pointed out above.
I'm building a .bib file as I go using drag and drop, which is wonderful, but non-ASCII characters get mangled as soon as I run BibTeX :-(
Thanks
I've modified the encoding for quick copy in BibTeX.js for now, but I have a feeling that I'll have to do that again every time there is a Zotero update... ;-)
You can have a separate BibTeX translator with a different ID, but that way you wouldn't get the automatic updates, which often contain small fixes and improvements.