OpenOffice: Unicode character not rendered as such in bibliography

One problem gone, the next crops up. There's an entry in my bibliography looking like this:

"Albrecht, Horst. 1993. „Oh Gott Herr Pfarrer\uc0\u8220{}. Religion als Fernsehunterhaltung. In: Die Religion der Massenmedien, 76-84. Stuttgart: W. Kohlhammer."

A Quick Copy from within Zotero, pasted into Writer, works flawlessly on the other hand and gives the expected result:

"Albrecht, Horst. 1993. „Oh Gott Herr Pfarrer“. Religion als Fernsehunterhaltung. In Die Religion der Massenmedien, 76-84. Stuttgart: W. Kohlhammer."

As before: Windows 7 32-bit, OpenOffice.org 3.3.0, Firefox 3.6.15, Zotero 2.1.1, Zotero OpenOffice.org extension 3.5a1, Sun/Oracle Java 1.6.0_24-b07.
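
For reference, the stray \uc0\u8220{} in the first entry is an RTF Unicode escape: \u8220 is the decimal code point of U+201C (“), \uc0 says that no fallback bytes follow, and {} is an empty group acting as a delimiter. A minimal Python sketch of decoding such escapes follows. The function name decode_rtf_unicode is made up for illustration, the sketch assumes the \uc0 form seen above (it does not skip fallback characters), and it is not how Zotero or the OpenOffice.org extension actually process RTF.

```python
import re

# Optional \ucN control word, then \uN with a signed decimal value,
# then an optional delimiter (an empty group "{}" or a single space).
_RTF_UNICODE = re.compile(r"(?:\\uc\d+ ?)?\\u(-?\d+)(?:\{\}| )?")


def decode_rtf_unicode(text: str) -> str:
    """Replace RTF \\uN escapes with the characters they stand for.

    Illustrative only: assumes no fallback characters follow the escape
    (the \\uc0 case), which is what the bibliography entry above shows.
    """
    def _replace(match: re.Match) -> str:
        code = int(match.group(1))
        if code < 0:
            # RTF stores the value as a signed 16-bit number.
            code += 65536
        return chr(code)

    return _RTF_UNICODE.sub(_replace, text)


entry = "\u201eOh Gott Herr Pfarrer\\uc0\\u8220{}. Religion als Fernsehunterhaltung."
print(decode_rtf_unicode(entry))
```

Decoded this way, the entry reads the same as the Quick Copy result, which suggests the escape itself is correct and is simply not being interpreted on insertion.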
  • Aha. The user in the other thread didn't come back to insist that RTF markup was showing through in the word processor, but I now see what is happening. The convention for quotation marks in Germany and Austria is U+201E („) on the left and U+201C (“) on the right; the latter is the same character that English uses as its opening double quote. The right single quote is similarly reversed vis-a-vis English. The German locale currently sets these characters incorrectly, following the English conventions. So when a user enters German quotes in the title field, the flipflop parser sees the closing quote as an orphaned open-quote and escapes it.

    The immediate response will be to fix the locale, which will solve the problem for German sources cited in a German style. However, we will still have the same interesting side effect when these German sources are cited in an English style.

    It may be good policy (after fixing these locales) to encourage users in all domains that use these characters (hello America, Great Britain, Australia, Austria, Germany ...) to use ordinary Courier-style double quotes in in-field markup. These are unambiguous, and will render in output according to the locale of the style.

    Thoughts?

    I will in any case fix up affected locales, and adjust the processor to pass these characters through literally instead of attempting to escape them.
  • "It may be good policy (after fixing these locales) to encourage users in all domains that use these characters (hello America, Great Britain, Australia, Austria, Germany ...) to use ordinary Courier-style double quotes in in-field markup."
    Agreed, but what do translators do? Do we have non-straight quotes as translator output? Can and should they be changed?
  • So everyone should use straight quotes (") and trust that the locales will make sense of them?
  • That's a suggestion, I'm not sure it's the right one. It would turn on whether the typography within a title should be preserved in foreign-language citation styles. Most immediately, I'll fix the processor so that this corruption doesn't show through as weird output. The conventions can be thrashed out over time as CSL spreads across more language domains. Using vanilla quotes is one possible convention, which would cause markup to track the conventions of the style.
  • Translators save quotes unchanged from how they stand in the input -- be that a website or some import format.
  • Alright, I think we're clear of this one. I've fixed the corrupted output problem in processor release version 1.0.131, just pushed. Looking at the behavior now, it should be fine even with "mismatched" locales; if the output looks funny, the user can adjust the entries. Should be no problem (and my apologies for the brouhaha over refactoring quote marks).
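
To make the two ideas in this thread concrete, here is a rough Python sketch. It is not the citeproc-js flipflop parser: the QUOTES table and both function names are hypothetical, and only the outer double quotes are handled. The first function shows why a German closing quote (U+201C) looks like an unclosed opening quote when read by English conventions; the second shows the proposed convention of typing straight quotes and letting the style's locale pick the glyphs.

```python
# Hypothetical locale table: (opening, closing) outer double quotes.
QUOTES = {
    "en-US": ("\u201c", "\u201d"),  # “ ”
    "de-DE": ("\u201e", "\u201c"),  # „ “
}


def find_orphan_open_quotes(title: str, locale: str) -> list[int]:
    """Return positions of opening quotes that are never closed."""
    open_q, close_q = QUOTES[locale]
    unclosed = []
    for index, char in enumerate(title):
        if char == open_q:
            unclosed.append(index)
        elif char == close_q and unclosed:
            unclosed.pop()
    return unclosed


def render_straight_quotes(title: str, locale: str) -> str:
    """Replace straight double quotes with the locale's open/close pair."""
    open_q, close_q = QUOTES[locale]
    result = []
    inside = False
    for char in title:
        if char == '"':
            result.append(close_q if inside else open_q)
            inside = not inside
        else:
            result.append(char)
    return "".join(result)


german_title = "\u201eOh Gott Herr Pfarrer\u201c. Religion als Fernsehunterhaltung"
print(find_orphan_open_quotes(german_title, "en-US"))  # one orphan reported
print(find_orphan_open_quotes(german_title, "de-DE"))  # none reported

plain_title = '"Oh Gott Herr Pfarrer". Religion als Fernsehunterhaltung'
print(render_straight_quotes(plain_title, "de-DE"))
print(render_straight_quotes(plain_title, "en-US"))
```

With straight quotes in the field, the same entry comes out as „…“ under a German locale and “…” under an English one, which is the trade-off discussed above: the markup tracks the style rather than preserving the title's original typography.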