German Umlauts in BibTeX export

monoceros84 · August 4, 2009

Hi

If I export my collection as BibTeX with ISO-8859-1 or -15 encoding, the German umlauts ä, ö and ü are converted to, e.g., \"{u}. This is wrong; it has to be {\"u}. I have no idea about the ß, but it's probably similar behaviour.
Are the special characters of other languages coded the same way?

This bug is important to fix, since BibTeX does not recognize the letters. I get "nice" bibliographic labels such as M\"90 (yes, a 9 with two dots above), which should be M{\"u}90 instead.

Cheers,
Mathias

noksagt · August 4, 2009

Both of these are valid ways to write "ü" in LaTeX:

\"u
\"{u}

But yes, BibTeX (according to Oren Patashnik's manual), wants all accented characters in a single set of braces. This should be relatively easy to do, but we will need to double-check against the brace hack to preserve capitalization when non-initial uppercase letters appear in a field.

The output should be:

author = {M{\"{u}}ller, Erwin},
title = {The {IP}{\"{O}} is the Institute for Polar Ecology}

and should NOT be:

title = {The {IP{\"{O}}} is the Institute for Polar Ecology}
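
A minimal sketch of the bracing step in Python. The function name `brace_umlauts` and the regular expression are illustrative assumptions, not Zotero's translator code. It handles only the umlaut accent on a bare field value (without the field's own delimiters) and ignores the interaction with capitalization braces:

```python
import re

# Matches \"u or \"{u} unless the command is already directly preceded by "{".
# Assumption: only the umlaut accent and single ASCII letters are handled.
_UMLAUT = re.compile(r'(?<!\{)\\"(?:\{([A-Za-z])\}|([A-Za-z]))')


def brace_umlauts(value: str) -> str:
    """Wrap each unbraced umlaut command in one set of braces: {\\"{u}}."""
    def wrap(match: re.Match) -> str:
        letter = match.group(1) or match.group(2)
        return '{\\"{' + letter + '}}'
    return _UMLAUT.sub(wrap, value)


print(brace_umlauts('M\\"{u}ller, Erwin'))
```

Values that are already braced, such as M{\"u}ller, pass through unchanged.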

monoceros84 · August 5, 2009

Mmhh, right. I didn't know about this brace hack :)
But as you've already said: BibTeX wants those braces around accented characters. I don't see a way around it. At the moment the BibTeX export is not very useful - I'll always have to edit the bib file manually.

Is it possible to get a bugfix with the next update?

monoceros84 · August 5, 2009

Another question that arose in the discussion on de.comp.text.tex:
Why is Zotero masking the Umlauts in ISO-8859-15? They could be printed as ä, ö and ü instead. Masking them is only necessary in full ASCII mode...

noksagt · August 5, 2009

Zotero uses the LaTeX entities in all character sets that are not 'UTF-8'.

The transliteration tables doubtless include some characters that are absent from other character sets & transliteration is "all" or "nothing."
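
The "all or nothing" rule can be sketched roughly as follows in Python. The table `TEX_ENTITIES` and the function `export_value` are made-up names with a tiny sample mapping, not the real transliteration tables, and the sketch emits the braced form requested earlier in this thread:

```python
# Hypothetical, deliberately tiny sample of a transliteration table.
TEX_ENTITIES = {
    'ä': '{\\"a}',
    'ö': '{\\"o}',
    'ü': '{\\"u}',
    'ß': '{\\ss}',
}


def export_value(value: str, charset: str) -> str:
    """Keep text as-is for UTF-8; otherwise escape every mapped character."""
    if charset.upper() == 'UTF-8':
        return value
    # Any non-UTF-8 charset gets the full TeX transliteration,
    # even if the charset could represent the character itself.
    return ''.join(TEX_ENTITIES.get(ch, ch) for ch in value)


print(export_value('Müller', 'ISO-8859-15'))
```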

I don't know, off-hand, what characters are missing from which sets.

But why use ISO-8859-15 anyway? Are there any common tools that don't work with the full set of UTF-8, but wouldn't need the TeX-encoded entities?

monoceros84 · August 6, 2009

Yes, e.g. BibTeX and BibLaTeX can handle those symbols in ISO-8859-15 encoded files. But those tools can't handle UTF-8.
You might have a look into the BibLaTeX documentation, section 2.4.3. It explains all the details.

I don't know if you understand German?! Actually, there is a discussion in the German newsgroup. Have a look here:
http://groups.google.de/group/de.comp.text.tex/browse_thread/thread/13307dde8a8de7a3?pli=1

noksagt · August 6, 2009

My German is poor, but I believe that you are mistaken. bibtex is 7-bit only & must use TeX-encoded entities to work. The newer 'bibtex8' and 'biber' programs can handle 8-bit character sets. bibtex8 can't do multibyte, but biber should be able to.
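
To see whether a bib file is safe for 7-bit bibtex, one could list the characters outside ASCII. A small Python sketch; the helper name `find_non_ascii` is hypothetical:

```python
def find_non_ascii(text: str) -> list[tuple[int, str]]:
    """Return (line number, character) for every character outside 7-bit ASCII."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for ch in line:
            if ord(ch) > 127:
                hits.append((line_no, ch))
    return hits


sample = 'author = {Müller, Erwin},\ntitle = {Polar Ecology}'
print(find_non_ascii(sample))
```

An empty result means the text contains only 7-bit characters.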

monoceros84 · August 6, 2009

Right, I think we're talking about the same thing - I just wasn't that exact. :)
If I run a bib file with 8-bit characters such as ä, ö and ü through bibtex, I get readable results. But this case is tricky; sometimes some letters are left out, etc. Definitely not the way to go.
But bibtex8 can do it. So what I meant up there was BibLaTeX in combination with BibTeX8. In this case I could use ISO-8859-15 encoded files with 8-bit characters, couldn't I?
So maybe the standard encoding for the bibtex export filter in Zotero should be ASCII with the appropriate symbols {\"u} (not \"{u}) for maximum compatibility. But if one chooses ISO-8859-15 one also expects ISO-8859-15 with 8-bit characters. ;)

I hope this is possible.
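
A rough Python sketch of that idea: escape only the characters the chosen encoding cannot represent. The names `TEX_ENTITIES` and `export_for_charset` and the three-entry table are assumptions for illustration; a real exporter would need a complete table.

```python
# Hypothetical sample table; a real one would cover many more characters.
TEX_ENTITIES = {
    'ä': '{\\"a}',
    'ü': '{\\"u}',
    'ő': '{\\H{o}}',
}


def export_for_charset(value: str, charset: str) -> str:
    """Keep characters the charset can encode; TeX-escape the rest."""
    out = []
    for ch in value:
        try:
            ch.encode(charset)  # representable in the target encoding?
            out.append(ch)
        except UnicodeEncodeError:
            if ch not in TEX_ENTITIES:
                raise ValueError(f'no TeX entity known for {ch!r}')
            out.append(TEX_ENTITIES[ch])
    return ''.join(out)


print(export_for_charset('Müller', 'ascii'))
print(export_for_charset('Müller', 'iso8859-15'))
```

With 'ascii' every umlaut is escaped, while with 'iso8859-15' the ü stays an 8-bit character and only characters missing from that set, such as ő, are escaped.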

noksagt · August 6, 2009

Presently, there is not a separate ability to export using ASCII encoding.

And, as above, the ability to transliterate characters to/from TeX encoding has complex tables. It doesn't seem obvious to me that the added complexity of having per-character set transliteration tables is worth it. I'm aware of no other program that really does this (including BibTeX-specific applications, such as JabRef).

UTF-8 or the ASCII-subset of ISO-8859-1 with TeX-encoded entities seem sufficient for the vast majority of use cases. If TeX encoded characters were properly braced, I don't see that you've raised any advantage to also having single-byte 8-bit output.

monoceros84 · August 6, 2009

Right. As long as UTF-8 and ASCII are working well, there is no need. But currently at least the ASCII notation is not correct and BibTeX complains about it...

dstillman · November 14, 2011

Just for posterity:

> Presently, there is not a separate ability to export using ASCII encoding.

"English (US-ASCII)" is an option in the character encoding list in the US version of Firefox (though seemingly not in some other versions), and it causes all extended characters to be replaced with TeX entities.

erikku75 · October 11, 2018

hello--writing in October 2018, I find the comment by the OP to still be a problem, e.g. exporting names w/ umlauts from Zotero to Bibtex, {\"u} got turned into \{{\textbackslash}"u\} (in fact, even worse than OP!). This is still incorrect, I need the name to be exported with exactly this {\"u} , or I have to go through and manually correct. Any chance this will be corrected? Thank you.

noksagt · October 11, 2018

I probably would not have resurrected this thread, as your gripe is only related tangentially.

The behavior you see is intended.

Zotero is a general-purpose reference manager and the stock version does not intend you to use LaTeX markup in any fields. When exporting to bibtex, it will use markup to best preserve what you have entered literally.
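
A sketch of what such literal preservation could look like, in Python. The mapping below is an assumption chosen to reproduce the output reported above, not Zotero's actual code, and `escape_literal` is a made-up name:

```python
# Assumed escapes for characters that are special in (La)TeX.
_LITERAL = {
    '\\': '{\\textbackslash}',
    '{': '\\{',
    '}': '\\}',
}


def escape_literal(value: str) -> str:
    """Escape each special character once so the text prints as typed."""
    # Single pass over the input, so inserted escapes are never re-escaped.
    return ''.join(_LITERAL.get(ch, ch) for ch in value)


print(escape_literal('{\\"u}'))  # prints \{{\textbackslash}"u\}
```

Typed-in LaTeX markup such as {\"u} is therefore treated as plain text rather than as an accent command.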

If you edit your title to use "ü" instead, it will export to BibTeX in the way you expect.

Alternatively, the Better Bib(La)TeX extension will export literal LaTeX that you enclose in pre tags. But this is overkill if you're just trying to enter "ü" and will make your database less useful for any workflows that do not use Better BibTeX.

erikku75 · October 12, 2018

Thank you noksagt. Guess I thought my gripe was dead on! well, maybe I miss the point.

Anyway, though, Zotero doesn't do what you say: if I enter "ü", the letter is simply dropped from the exported author name (resulting in Bchner rather than Büchner); this observation is what set me off on this search. On the off-chance you meant me to include the quotes, well I tried that and amusingly get a square-root symbol! I wonder, though, is Zotero *supposed* to just carry the ü through, as you say? Perhaps that would help other bibliographies, but for Bibtex I'd still need the proper markup, {\"u}. And, maybe my version is out of whack? (5.055.1, only slightly behind, can't imagine that's the problem...)

I have not tried the Better Bib(La)TeX extension, but will give it a shot, thanks for pointing it out. I don't have any workflows outside of Bibtex, so no worries there.

zuphilip · October 12, 2018

> if I enter "ü", the letter is simply dropped from the exported author name (resulting in Bchner rather than Büchner)

Check that you choose Western as the character encoding in the BibTeX export instead of UTF-8.

adamsmith · October 12, 2018

Zotero should never drop the ü, but the treatment of Umlauts in bibtex depends on the character encoding. In UTF-8 they just stay, in all other encodings they are LaTeX escaped.

erikku75 · October 12, 2018

gotcha! now that was useful, thank you, worked like a charm.

emilianoeheyns · October 12, 2018

In BBT it's not dependent on the charset, it's a setting in the preferences.