Bibtex export for dotted capital I

Mikael Öhman · September 9, 2014

BibTeX export seems to correctly rewrite most accented characters, but it leaves the dotted capital I (İ) as is, when it should become {\.I}.

noksagt · September 9, 2014

Zotero should already convert the character 'İ' (U+0130) as you describe. Copy/paste this in and test it.

Zotero has no good way of handling combining diacritics, and this is likely the issue you have run into. I'd imagine your character is composed of LATIN CAPITAL LETTER I (U+0049) followed by COMBINING DOT ABOVE (U+0307), rather than being the precomposed character that works.
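To illustrate the difference, here is a minimal JavaScript sketch. It assumes a substitution table keyed by precomposed characters, in the style of the translator line quoted in the next post; `latexMap` and `escapeChars` are hypothetical names, not Zotero's code. The decomposed pair misses the table, while normalizing to NFC (which folds U+0049 + U+0307 into U+0130) makes the lookup succeed.

```javascript
// Hypothetical escape table keyed by precomposed characters only.
const latexMap = {
  "\u0130": "{\\.I}",  // LATIN CAPITAL LETTER I WITH DOT ABOVE
  "\u00FC": "{\\\"u}", // LATIN SMALL LETTER U WITH DIAERESIS
};

// Replace every code point that has an entry in latexMap; leave the rest alone.
function escapeChars(text) {
  return Array.from(text, (ch) => latexMap[ch] ?? ch).join("");
}

const precomposed = "\u0130"; // İ as a single code point
const decomposed = "I\u0307"; // I + COMBINING DOT ABOVE, renders the same

console.log(escapeChars(precomposed)); // {\.I}
console.log(escapeChars(decomposed));  // unchanged: "I" plus the combining mark

// NFC normalization composes the pair into U+0130, so the lookup now matches.
console.log(escapeChars(decomposed.normalize("NFC"))); // {\.I}
```

Whether the real translator normalizes its input is not something this thread confirms; the sketch only shows why a precomposed-only table misses decomposed text.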

zuphilip · September 9, 2014

Well, it works here; I just tried it out. Moreover, the code contains the following line:


"\u0130":"{\\.I}", // LATIN CAPITAL LETTER I WITH DOT ABOVE

This maps the Unicode character U+0130 to its LaTeX code for every export encoding except UTF-8.
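A rough sketch of that behaviour, assuming a simple lookup table and a charset string supplied by the caller. `exportField` and its arguments are hypothetical; the actual translator code is structured differently.

```javascript
// Hypothetical table in the style of the translator entry quoted above.
const latexMap = {
  "\u0130": "{\\.I}", // LATIN CAPITAL LETTER I WITH DOT ABOVE
};

function exportField(value, charset) {
  // UTF-8 can represent every character, so the text is left untouched.
  if (charset.toLowerCase().replace("-", "") === "utf8") {
    return value;
  }
  // Any other charset: substitute a LaTeX escape wherever the table has one.
  return Array.from(value, (ch) => latexMap[ch] ?? ch).join("");
}

console.log(exportField("\u0130stanbul", "UTF-8"));        // İstanbul
console.log(exportField("\u0130stanbul", "Windows-1252")); // {\.I}stanbul
```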

drevicko · May 2, 2015

Is that supposed to happen with an umlauted u (ü), too?

I'm finding that it exports without change (and is subsequently ignored by LaTeX).

aurimas · May 2, 2015

What version of Zotero do you have? Are you only seeing that with ü? (What if you try another character, like the İ above?) Could you copy-paste the value that contains the ü from Zotero here?

adamsmith · May 2, 2015

(Note that BibLaTeX export, as opposed to BibTeX export, will output all UTF-8 characters as they are, since biber can handle UTF-8.)

drevicko · May 2, 2015

Zotero 4.0.26.4

No, that character also appears unchanged in the exported BibTeX:

author = {İLü, Linyuan and Zhou, Tao},

I turned on debug during an export of that item if it's any use: Debug ID is D8556549

I'm on Mac OS X 10.8.5 (i.e., not Yosemite) with Firefox 37.0.2.

I'll have a look at BibLaTeX.
Thanks for your quick responses (:

dstillman · May 2, 2015

Note the "for every encoding except UTF8" above. By default, Zotero will export all Unicode characters as is in BibTeX, because Zotero defaults to UTF-8 for everything.

You have to enable the export character set option in the preferences and select something other than UTF-8 (e.g., "Western") to get the substitutions.

drevicko · May 3, 2015

Thanks, Dan; that's what's going on. Unfortunately, I'm accessing Zotero via LaTeXing (a Sublime Text plugin). It seems its use of the Zotero API follows the same pattern (?). In any case, I don't have control over the encoding LaTeXing requests from the API, and I'm not in a position right now to migrate to BibLaTeX. Also, LaTeXing is not yet using Zotero's API version 3, though it should be quite soon.

I've opened an issue with latexing about this for those interested: https://github.com/LaTeXing/LaTeXing/issues/144

adamsmith · May 3, 2015

Note that many BibTeX implementations can handle UTF-8, too, which is why we're defaulting to it. You just have to add
\usepackage[utf8]{inputenc}
to the document preamble.

dstillman · May 3, 2015

Yes, the API returns BibTeX only as UTF-8, and I doubt we'll offer an option to change that.

drevicko · May 3, 2015

Cool! Thanks Adam - that fixed things for me. It's been remarkably hard to find that solution, probably because I've been looking in the wrong places! I'm tempted to put it up on tex.stackexchange...

adamsmith · May 3, 2015

covered here: http://tex.stackexchange.com/questions/118313/a-follow-up-question-on-managing-bibliography-workflow/118393

adamsmith · October 11, 2016

@Dan Stillman: this just came up again in the context of Overleaf/Zotero integration (which really is nice):
https://twitter.com/chrshmmmr/status/785835858586894336
http://support.overleaf.com/forums/137318-feedback/suggestions/7321988-bibtex-zotero-utf8-convert-bib-files-to-late

I wonder if you'd
a) consider revisiting (or potentially even defaulting to) exporting escaped (i.e., Western-encoded) BibTeX, since, as I understand it, that's still the official standard for BibTeX, even if some engines can handle UTF-8
b) consider enabling biblatex export as an API export option, to work better with biber

I'd like to see both, but either one individually I think would make sense and be helpful.

dstillman · October 12, 2016

I don't see us defaulting to escaped characters. There's no excuse for an engine's not supporting UTF-8 in 2016.

Any reason enabling biblatex export via the API would be problematic? I'm happy to add it to the list of allowed types, if that's all that's required.

adamsmith · October 12, 2016

I'm not really going to argue with "no excuse," but there's equally no excuse for a flat tagged format for bibliographic exchange in 2016 and yet here we are using RIS as a major exchange format...
It is my understanding that BibTeX simply does not support unicode, see http://wiki.lyx.org/BibTeX/Tips#encoding There are ways to make it work (e.g. via JabRef, BibDesk, etc.), but it doesn't work natively and so a plain implementation of LaTeX with BibTeX will fail using Zotero's default BibTeX. I think that's arguable where users have the choice between encodings (i.e. Zotero locally), but I don't think this is a good idea where they don't (i.e. with the API).
I'd be curious what @noksagt, who understands BibTeX much better than I do, thinks.

Either way, I think enabling biblatex via API would be great. I can't see any issues, but obviously may be missing something.

and edit: woah Twitter integration.

bwiernik · October 12, 2016

(I just want to chime in equally with the woah Twitter integration--I got super excited when I saw that.)

dstillman · October 12, 2016

I guess one option would be to allow "Accept-Charset: iso-8859-1" to trigger the same escaping via the API...
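One way such an option could work on the server side is sketched below. This is purely hypothetical: the parsing is a simplified reading of the Accept-Charset header (comma-separated charset names with optional `q` weights), and the policy of escaping unless UTF-8 (or `*`) is the preferred charset is an assumption, not something the API implements.

```javascript
// Parse "utf-8, iso-8859-1;q=0.5" into [{charset, q}], highest q first.
function parseAcceptCharset(header) {
  return header
    .split(",")
    .map((part) => {
      const [name, ...params] = part.trim().split(";");
      let q = 1; // default weight when no q parameter is given
      for (const p of params) {
        const [key, val] = p.trim().split("=");
        if (key === "q") q = Number(val);
      }
      return { charset: name.trim().toLowerCase(), q };
    })
    .filter((entry) => entry.charset && entry.q > 0) // drop empty or refused entries
    .sort((a, b) => b.q - a.q);
}

// Hypothetical policy: escape only when the client prefers a non-UTF-8 charset.
function shouldEscape(header) {
  if (!header) return false; // no header: keep the UTF-8 default
  const [best] = parseAcceptCharset(header);
  if (!best) return false;
  return best.charset !== "utf-8" && best.charset !== "*";
}

console.log(shouldEscape(undefined));                 // false
console.log(shouldEscape("iso-8859-1"));              // true
console.log(shouldEscape("utf-8, iso-8859-1;q=0.5")); // false
```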

For now, I've enabled 'biblatex' export.
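With biblatex allowed as an export type, a client would request it through the Web API's `format` query parameter. The sketch below only builds the request URL; the user ID is a placeholder, and API version, authentication, and actually sending the request are left out.

```javascript
// Build a Zotero Web API items URL for a given export format.
// userId is a placeholder; real requests also need version/key handling.
function exportUrl(userId, format) {
  const url = new URL(`https://api.zotero.org/users/${userId}/items`);
  url.searchParams.set("format", format);
  return url.toString();
}

console.log(exportUrl("123456", "biblatex"));
// https://api.zotero.org/users/123456/items?format=biblatex
```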

adamsmith · October 12, 2016

I'd be equally happy with the charset option, and thanks for adding biblatex!

noksagt · October 12, 2016

"There's no excuse for an engine's not supporting UTF-8 in 2016."

And the printed word is dead, right?

Perhaps not quite. But publishers do move slowly. The last time BibTeX was updated (2010), over 50% of websites used a character set other than UTF-8, and that says nothing of the peculiar siloed tools of the less-connected "dinosaur" publishers.

CTAN encourages the use of biber+BibLaTeX. So, yes: exporting this would be an improvement over what we have now.

But many publishing platforms still don't support that toolchain. I don't think Zotero has the market share to take a principled stand that would effect change here. And (sadly) there is still a need for ASCII-only BibTeX. Since citation managers should supply users with data in the format they need, supporting it does more good than harm, and I guess I don't buy the ideological resistance to it. I think adamsmith is right.

dstillman · October 12, 2016

Well, it's only principled to the extent that we're defaulting to exporting a cleaner, easier-to-read format that works fine (as I understand it) with modern toolchains. No one is suggesting that we get rid of support for escaping (and I proposed an option to allow that from the API). But you're arguing that we should generate ASCII files by default?

noksagt · October 12, 2016

I had started my reply before your post from 42 minutes ago. I think that having the option is fine & I have no preference on what the default should be. UTF-8 by default and allowing iso-8859-1 would, at least, mirror the desktop clients.

zuphilip · October 12, 2016

Can we make the options in the export dialog maybe easier? AFAIS we just need the options a) use UTF-8, or, b) use ASCII and escape all special characters. Is there any reason in 2016 to use another option/character encoding? Moreover, we should remember that the character encoding options are not shown to users by default; one has to activate them in the Zotero preferences under Export.

adamsmith · October 12, 2016

"For now, I've enabled 'biblatex' export."

Tested and added to the API documentation. Thanks again. Let me know when the charset option becomes available.

Rintze · October 13, 2016

Speaking of UTF-8, is it correct and desirable that CSL JSON is exported as "us-ascii" instead of "utf-8"?

When I export an item to CSL JSON in Zotero, and check on it on macOS, I get:

$ file --mime ExportedZ.json
ExportedZ.json: text/plain; charset=us-ascii

adamsmith · October 13, 2016

(As long as you only use ASCII characters, file has no way to distinguish the encoding. If you export an item with, e.g., umlauts, it shows charset=utf-8.)
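A simplified sketch of that point (not how `file` is actually implemented): ASCII text encodes to exactly the same bytes in UTF-8, so a file with no byte at or above 0x80 carries nothing that marks it as UTF-8. The helper names are hypothetical.

```javascript
// TextEncoder always produces UTF-8 bytes.
function utf8Bytes(text) {
  return new TextEncoder().encode(text);
}

// True when every byte is in the 7-bit ASCII range.
function isPureAscii(bytes) {
  return bytes.every((b) => b < 0x80);
}

console.log(isPureAscii(utf8Bytes("Zhou, Tao")));   // true: indistinguishable from us-ascii
console.log(isPureAscii(utf8Bytes("Lü, Linyuan"))); // false: ü needs two bytes
console.log(Array.from(utf8Bytes("ü")).join(" "));  // 195 188 (0xC3 0xBC)
```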

dstillman · October 16, 2016

"Can we make the options in the export dialog maybe easier? AFAIS we just need the options a) use UTF-8, or, b) use ASCII and escape all special characters. Is there any reason in 2016 to use another option/character encoding?"

That's a good question. Probably not.

I can't remember why we didn't have the export charset show by default — I think at the time it was in a separate window that we didn't want to show everyone? In any case, I don't see a problem always showing that menu for the translators that support it, and only showing "Unicode (UTF-8)", maybe "Unicode (UTF-8 without BOM)", and "Western (Windows-1252)" (which is what "Western" is now — "ASCII" and "ISO-8859-1" are both aliases for that in the Encoding Standard) in the menu. In the case of BibTeX it would actually mean ASCII, but probably clear enough if we always show the menu and only have those options.
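A hypothetical sketch of what the three proposed menu entries would mean at the byte level. It assumes, as the labels suggest, that plain "Unicode (UTF-8)" writes a byte order mark (EF BB BF) while the other UTF-8 entry does not. For BibTeX, the "Western" output should already be ASCII after escaping, and ASCII bytes are identical in Windows-1252 and UTF-8. The option strings and the function name are illustrative only.

```javascript
const UTF8_BOM = [0xef, 0xbb, 0xbf];

function encodeExport(text, option) {
  const utf8 = Array.from(new TextEncoder().encode(text));
  switch (option) {
    case "utf-8":
      return [...UTF8_BOM, ...utf8]; // Unicode (UTF-8), assumed to include a BOM
    case "utf-8-no-bom":
      return utf8;                   // Unicode (UTF-8 without BOM)
    case "western":
      // BibTeX escaping should already have removed non-ASCII characters;
      // for ASCII, Windows-1252 and UTF-8 bytes are the same.
      if (!utf8.every((b) => b < 0x80)) {
        throw new Error("expected escaped, ASCII-only BibTeX");
      }
      return utf8;
    default:
      throw new Error(`unknown option: ${option}`);
  }
}

console.log(encodeExport("{\\.I}", "utf-8").length);        // 8
console.log(encodeExport("{\\.I}", "utf-8-no-bom").length); // 5
```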