Zotero should already convert the character 'İ' (U+0130) as you describe. Copy/paste this in and test it.
Zotero has no good way of handling combining diacritics, and this is likely the issue you have run into. I'd imagine your character is composed of LATIN CAPITAL LETTER I (U+0049) followed by COMBINING DOT ABOVE (U+0307), rather than being the precomposed character that works.
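To make the distinction concrete, here is a minimal Python sketch using the standard `unicodedata` module. It is not Zotero code, and the helper name `to_precomposed` is made up for illustration; it only shows that the two forms are different code point sequences and that NFC normalization folds the combining sequence into the precomposed U+0130.

```python
import unicodedata

# Two ways to write what looks like the same letter.
precomposed = "\u0130"   # LATIN CAPITAL LETTER I WITH DOT ABOVE (one code point)
combining = "I\u0307"    # LATIN CAPITAL LETTER I + COMBINING DOT ABOVE (two code points)


def to_precomposed(text: str) -> str:
    """Hypothetical helper: apply NFC so combining sequences become precomposed."""
    return unicodedata.normalize("NFC", text)


if __name__ == "__main__":
    # The strings differ until they are normalized.
    print(precomposed == combining)
    print(to_precomposed(combining) == precomposed)
```

A lookup table keyed on U+0130 would only match the second form after a normalization step like this.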
Well, it works here, I just tried it out. Moreover, there is the following line in the code: "\u0130":"{\\.I}", // LATIN CAPITAL LETTER I WITH DOT ABOVE
This indicates that the Unicode character U+0130 is replaced with the LaTeX code for every encoding except UTF-8.
What version of Zotero do you have? Are you only seeing that with ü? (What if you try another character, like the İ above?) Could you copy-paste the value that contains the ü from Zotero here?
Note the "for every encoding except UTF8" above. By default, Zotero will export all Unicode characters as is in BibTeX, because Zotero defaults to UTF-8 for everything.
You have to enable the export character set option in the preferences and select something other than UTF-8 (e.g., "Western") to get the substitutions.
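As a rough sketch of the behavior described here (not Zotero's actual implementation), the export can be thought of as a table lookup that is skipped entirely when the character set is UTF-8. The names `LATEX_ESCAPES` and `export_field` are hypothetical, and only the U+0130 entry is quoted from this thread; the ü entry is an assumed example using the usual LaTeX umlaut escape.

```python
# Hypothetical mapping table; only the U+0130 entry is quoted from the thread.
LATEX_ESCAPES = {
    "\u0130": "{\\.I}",   # LATIN CAPITAL LETTER I WITH DOT ABOVE
    "\u00fc": '{\\"u}',   # LATIN SMALL LETTER U WITH DIAERESIS (assumed entry)
}


def export_field(value: str, charset: str) -> str:
    """Return the field unchanged for UTF-8, otherwise substitute LaTeX escapes."""
    if charset.lower().replace("-", "") == "utf8":
        return value
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in value)


if __name__ == "__main__":
    author = "\u0130L\u00fc, Linyuan and Zhou, Tao"
    print(export_field(author, "UTF-8"))     # left as is
    print(export_field(author, "Western"))   # escaped
```

With UTF-8 selected nothing is substituted, which matches the unescaped `author` line reported in this thread.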
Thanks Dan - that's what's going on. Unfortunately, I'm accessing Zotero via LaTeXing (a Sublime Text plugin). It seems its use of the Zotero API follows the same pattern (?). In any case, I don't have control over the encoding LaTeXing requests from the API, and I'm not in a position right now to migrate to biblatex. Also, LaTeXing is not yet using Zotero's API version 3, though it should be quite soon.
I've opened an issue with latexing about this for those interested: https://github.com/LaTeXing/LaTeXing/issues/144
Note that many BibTeX implementations can handle UTF-8, too, which is why we're defaulting to it. You just have to add
\usepackage[utf8]{inputenc}
into the document head.
Cool! Thanks Adam - that fixed things for me. It's been remarkably hard to find that solution, probably because I've been looking in the wrong places! I'm tempted to put it up on tex.stackexchange...
I wonder if you'd
a) consider revisiting (or potentially even defaulting to) exporting escaped (i.e. Western encoded) BibTeX, since that's still, as I understand, the official standard for bibtex, even if some engines can handle utf-8
b) enabling biblatex export as an API export option to work better with biber
I'd like to see both, but either one individually I think would make sense and be helpful.
I'm not really going to argue with "no excuse," but there's equally no excuse for a flat tagged format for bibliographic exchange in 2016 and yet here we are using RIS as a major exchange format... It is my understanding that BibTeX simply does not support unicode, see http://wiki.lyx.org/BibTeX/Tips#encoding There are ways to make it work (e.g. via JabRef, BibDesk, etc.), but it doesn't work natively and so a plain implementation of LaTeX with BibTeX will fail using Zotero's default BibTeX. I think that's arguable where users have the choice between encodings (i.e. Zotero locally), but I don't think this is a good idea where they don't (i.e. with the API). I'd be curious what @noksagt , who understands bibtex much better than me, thinks.
Either way, I think enabling biblatex via API would be great. I can't see any issues, but obviously may be missing something.
There's no excuse for an engine's not supporting UTF-8 in 2016.
And the printed word is dead, right?
Perhaps not quite. But publishers do move slowly. The last time BibTeX was updated (2010), over 50% of websites used a character set other than UTF-8, which says nothing of the peculiar siloed tools of the less-connected "dinosaur" publishers.
CTAN encourages the use of biber+BibLaTeX. So, yes: exporting this would be an improvement over what we have now.
But many publishing platforms still don't support that toolchain. I don't think Zotero has the marketshare to make a principled stand to effect change here. And (sadly) there is still a need for ASCII-only BibTeX. Since citation managers should supply users with the data in the format that their users need, supporting it does more good than harm & I guess I don't buy the ideological resistance to it. I think adamsmith is right.
Well, it's only principled to the extent that we're defaulting to exporting a cleaner, easier-to-read format that works fine (as I understand it) with modern toolchains. No one is suggesting that we get rid of support for escaping (and I proposed an option to allow that from the API). But you're arguing that we should generate ASCII files by default?
I had started my reply before your post from 42 minutes ago. I think that having the option is fine & I have no preference on what the default should be. UTF-8 by default and allowing iso-8859-1 would, at least, mirror the desktop clients.
Can we make the options in the export dialog maybe easier? AFAIS we just need the options a) use UTF-8, or, b) use ASCII and escape all special characters. Is there any reason in 2016 to use another option/character encoding? Moreover, we should remember that the character encoding options are not shown to users by default; one has to activate them in the Zotero preferences under Export.
(As long as you only use ascii characters, file has no way to distinguish the encoding. If you export an item e.g. with umlauts, it shows charset=utf-8)
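The reason is that ASCII text is byte-for-byte identical in ASCII and UTF-8, so a detector can only tell them apart once a non-ASCII character appears. The following Python sketch illustrates that idea; it is a simplified stand-in, not how the `file` utility works internally, and the function name `looks_like` is made up for this example.

```python
def looks_like(data: bytes) -> str:
    """Guess a charset label from raw bytes (simplified illustration only)."""
    try:
        data.decode("ascii")
        return "us-ascii"      # pure ASCII is also valid UTF-8
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown"


if __name__ == "__main__":
    # Identical bytes either way, so the encoding cannot be distinguished.
    print("Zhou, Tao".encode("utf-8") == "Zhou, Tao".encode("ascii"))
    print(looks_like("Zhou, Tao".encode("utf-8")))
    # An umlaut produces multi-byte UTF-8, which is detectable.
    print(looks_like("L\u00fc, Linyuan".encode("utf-8")))
```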
Can we make the options in the export dialog maybe easier? AFAIS we just need the options a) use UTF-8, or, b) use ASCII and escape all special characters. Is there any reason in 2016 to use another option/character encoding?
That's a good question. Probably not.
I can't remember why we didn't have the export charset show by default — I think at the time it was in a separate window that we didn't want to show everyone? In any case, I don't see a problem always showing that menu for the translators that support it, and only showing "Unicode (UTF-8)", maybe "Unicode (UTF-8 without BOM)", and "Western (Windows-1252)" (which is what "Western" is now — "ASCII" and "ISO-8859-1" are both aliases for that in the Encoding Standard) in the menu. In the case of BibTeX it would actually mean ASCII, but probably clear enough if we always show the menu and only have those options.
I'm finding that it exports without change (and is subsequently ignored by Latex).
no, that character also appears in exported bibtex:
author = {İLü, Linyuan and Zhou, Tao},
I turned on debug during an export of that item if it's any use: Debug ID is D8556549
I'm on Mac OS X 10.8.5 (i.e., not Yosemite) with Firefox 37.0.2
I'll have a look at BibLaTex..
Thanks for your quick responses (:
https://twitter.com/chrshmmmr/status/785835858586894336
http://support.overleaf.com/forums/137318-feedback/suggestions/7321988-bibtex-zotero-utf8-convert-bib-files-to-late
Any reason enabling biblatex export via the API would be problematic? I'm happy to add it to the list of allowed types, if that's all that's required.
and edit: woah Twitter integration.
For now, I've enabled 'biblatex' export.
When I export an item to CSL JSON in Zotero, and check on it on macOS, I get:
$ file --mime ExportedZ.json
ExportedZ.json: text/plain; charset=us-ascii