BibTeX and Greek characters

dietlein · January 12, 2009

Starting a new thread based on Dan's suggestion:

From the example posted in the other thread, the exported BibTeX for the title field is now (with the updated translator):

title = {The electrodynamics of substances with simultaneously negative values of e and µ},

Rather than the '?' that appeared prior to the latest translator update. However, there are two problems:
1) Epsilon should be ε, not e, but more importantly...
2) As far as I know, LaTeX/BibTeX chokes on those characters; they should actually be $\epsilon$ and $\mu$, in order to be properly rendered in a bibliography. This is what I would have done if creating the BibTeX entry manually. As it is now, a PDF compiled referencing that BibTeX entry displays:

V. G. Veselago, "The electrodynamics of substances with simultaneously
negative values of and ," Soviet Physics Uspekhi, vol. 10, no. 4, pp. 509-
514, 1968.

What is the supported/recommended method for inclusion of Greek characters in Zotero fields?

Thanks in advance.

dstillman · January 12, 2009

What is the supported/recommended method for inclusion of Greek characters in Zotero fields?

Just to be clear, this has nothing to do with Zotero itself, which supports Unicode natively. It's only an issue with the BibTeX translator.

With an epsilon entered in the field, I get an epsilon (and not an 'e') when I export using the updated BibTeX translator.

Other folks can comment with more authority here, but some newer BibTeX processors support UTF-8 natively, which is why the BibTeX translator now supports UTF-8 output. You could override this by choosing ISO-8859-1 on export, but neither ε nor µ is in the (very large but inevitably incomplete) mapping table. We can add those, but you'd probably be better off switching to an all-Unicode BibTeX workflow.

For what it's worth, "textmu" does map correctly, since it's technically a different Unicode character.
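To illustrate what adding those entries to a mapping table might involve, here is a minimal sketch. The names `greekMap` and `mapGreek` are hypothetical and are not taken from the actual translator, and the `{\textmu}` output form is an assumption. The point is only that the Greek small letter mu (U+03BC) and the micro sign (U+00B5) are distinct code points, so each needs its own entry.

```javascript
// Hypothetical mapping table; the names are illustrative, not the translator's own.
var greekMap = {
    "\u03B5": "$\\epsilon$",  // GREEK SMALL LETTER EPSILON
    "\u03BC": "$\\mu$",       // GREEK SMALL LETTER MU
    "\u00B5": "{\\textmu}"    // MICRO SIGN, a different code point from Greek mu
};

// Replace every mapped character with its LaTeX equivalent.
// A replacer function is used so that "$" in the output is taken literally.
function mapGreek(value) {
    return value.replace(/[\u03B5\u03BC\u00B5]/g, function (ch) {
        return greekMap[ch];
    });
}

console.log(mapGreek("negative values of \u03B5 and \u03BC"));
```

Characters that are not in the table pass through unchanged, which is fine for a UTF-8 export but not for ISO-8859-1.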

dietlein · January 12, 2009

Dan,

Thanks for the response. I think you've cleared up, for me, the current status of things related to this topic. In that case, the rest of this comment is probably going to end up becoming a feature request. The problem, for me and many other users, is more broad than a need for full Unicode support.

In the physical sciences, there are literally thousands of paper titles that contain either math mode (square root, etc.), Greek, or other special LaTeX characters (super- and sub-script for chemical formulae, etc.). It seems variations of this dilemma exist in other fields as well, given all the threads I've seen about bold, italics, small caps, etc., being desired.

I believe the feature that would solve all the problems -- for LaTeX users in the physical sciences -- looks kind of like this - but this is just a rough draft of the idea: A visible-only-when-desired "Title" field, perhaps called "LaTeX Title," that is completely un-escaped, so any users of LaTeX/BibTeX could copy the existing Title field, and modify it appropriately with the desired LaTeX control strings. The field would only override the default Title field when it is populated, and then, only for BibTeX export. This way, fancy "things" (leaving that term open in the broadest sense) could be easily input by those who know how and want them, and the feature would not obtrude in the workflows of anyone else.

The advantage of this for Zotero and BibTeX translator developers is that it does not require implementation of any translation features or mapping tables - the "LaTeX Title" field contents would be used as entered by the Zotero user (verbatim), and any problems in the field would be their own responsibility. It also allows much more fine-grained control than a static or sometimes-upgraded mapping table.

I can provide dozens of examples in my own small library of a few hundred references where this would be useful, but I believe the point emphasized here is valid without examples. Please reply with any comments. I would not at all mind hearing why this is a horrible idea, and why something else would work better. Comments from anyone who has successfully circumvented this issue are extremely welcome.

bdarcus · January 12, 2009

I believe the feature that would solve all the problems -- for LaTeX users in the physical sciences -- looks kind of like this - but this is just a rough draft of the idea: A visible-only-when-desired "Title" field, perhaps called "LaTeX Title," that is completely un-escaped, so any users of LaTeX/BibTeX could copy the existing Title field, and modify it appropriately with the desired LaTeX control strings.

I don't think that's a good solution, since it then by definition only works for LaTeX users.

The proper solution, in my view, is to use unicode wherever possible, and maybe to add MathML support for those cases where it's not*.

*I once had a discussion about this with LaTeX and MathML expert David Carlisle, and his position was that unicode was enough to cover title needs, and that MathML was only really needed for things like abstracts.

dietlein · January 12, 2009

Couldn't that stretch MathML's purpose a bit? I'm referring to completely non-math uses now, to play devil's advocate. Say, I want to set a single letter or word in italic or bold. I admit MathML would work, but it's not quite the right tool for the job.

Also, perhaps Unicode support is not "enough to cover title needs"... as an example, I would like to see this title rendered properly in the Zotero title field:

http://ieeexplore.ieee.org/search/wrapper.jsp?arnumber=786664

Unicode, as far as I know, does not have a subscript 'period' (see here). Maybe a Unicode expert can correct me on this. MathML could accommodate this, of course, but it's not math (right tool for the job, etc.).

MathML for math purposes (i.e., a superscripted radical, etc.) would be great, and I agree, all-inclusive. But is it the right tool for the job in non-math cases?

bdarcus · January 12, 2009

Couldn't that stretch MathML's purpose a bit? I'm referring to completely non-math uses now, to play devil's advocate. Say, I want to set a single letter or word in italic or bold. I admit MathML would work, but it's not quite the right tool for the job.

Well, first step is to add a subset of the new rich note UI to certain fields. I was presuming that what you need might be extended from that.

dietlein · January 12, 2009

Back to the original issue - aren't these "remedies" piling up to create a lot of work for the author of the BibTeX translator? The translator would have to correctly translate, then, Unicode, MathML (whenever that is implemented), and whatever format the rich note field is in (whenever that is implemented). As it is, simple Unicode isn't even being properly translated to the math-mode equivalents that work in every LaTeX installation.

The point was to make life easier for everyone, instead of adding another layer of complexity. All that is being asked is a way to input, directly or indirectly, something as simple as $_{0.5}$ or {\it W}, so that editing a .bib file manually is not necessary after every export.

The specific implementation does not really matter, honestly. Anything that would allow editing the Zotero db exactly one time, and have a working export after that, would be perfect. If the issues mentioned (Greek, math, simple subscripts) are currently being addressed by any one solution, or multiple solutions, that's great. Thousands of scientists are being alienated by this simple missing functionality, however, and something has to be done about it. A discussion of solutions would be helpful, and then perhaps the developers can make a decision about which path to take.

bdarcus · January 16, 2009

Re: math and such in TinyMCE (which is what Zotero is using for notes in 1.5), I came across this example.

lgstarn · May 4, 2009

I would like to add that this is also a problem for me. I would like to reference the influential paper by Hopf entitled

The partial differential equation $u_t + uu_x + \mu u_{xx}$

This should have a subscript t under the first u, a subscript x under the third u, a Greek "mu" before the third term, and a subscript xx under the last u.

This comes out in BibTeX as

@article{hopf_partial_1950,
title = {The partial differential equation \$u\_t + uu\_x + {\textbackslash}mu u\_{xx}\$},
volume = {3},
journal = {Communications on Pure and Applied Mathematics},
author = {E. Hopf},
year = {1950},
pages = {201--230}
},

Which is most certainly NOT what I want. If those $ and \ characters were not escaped, this problem would be fixed.

Why can't Zotero just allow me to say I don't want to escape those characters? As it is, every time I want to regenerate my references from Zotero I have to manually change the file. That really sucks.

fbennett · May 4, 2009

I have two questions for LaTeX users following this thread.

First, are all of the mangling issues you are encountering confined to spans enclosed in $...$ or {...}?

Second, are there commonly available pretty-printing utilities for BibTeX files that can reformat output that is syntactically correct, or close to it (back in the day, I remember there was something called "bibclean" by Nelson Beebe)?

Frank Bennett

lgstarn · May 4, 2009

Hi Frank,

Thanks for your quick response!

I'll address your first question since I'm not aware of anything for the second (I can do some research on the second issue if no one else comes up with anything). Personally, I think almost all of my issues with mangling could easily be solved if there was a way for Zotero to avoid escaping fields. Barring this, if it could at least avoid escaping anything between $...$, that would give me access to all LaTeX mathmode commands such as Greek letters, underscores, etc. I could of course see a relatively rare use case where some user had a title such as "On the $11.25 Trillion USD American Debt And What the $36,755.27 USD Per Capita Means." But it seems that escaping these dollar signs manually for BibTeX might be a more reasonable approach than always escaping everything, i.e. it would be easy to write this title as "On the \$11.25 Trillion USD American Debt And What the \$36,755.27 USD Per Capita Means." The relative inconvenience for solving this one rare use case is definitely worth it IMO because LaTeX users will very, very often want to do things like $e^{\mu xt}$ (superscript Greek mu xt over the letter e) or $u_{xx}$ (subscript xx under u).
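A minimal sketch of the "don't escape anything between $...$" idea, for illustration only. The helpers `escapeText` and `escapeOutsideMath` are hypothetical names, not functions from the real translator, and the sketch assumes dollar signs in the field are either paired math delimiters or a single stray currency sign.

```javascript
// Escape the LaTeX special characters that the exporter currently escapes.
function escapeText(text) {
    return text.replace(/([#$%&_])/g, "\\$1");
}

// Hypothetical helper: escape everything except $...$ spans.
function escapeOutsideMath(value) {
    // Splitting on a capturing group keeps the matched math spans in the
    // result, at the odd indexes; the even indexes hold the plain text.
    var parts = value.split(/(\$[^$]*\$)/);
    for (var i = 0; i < parts.length; i++) {
        if (i % 2 === 0) {
            parts[i] = escapeText(parts[i]);
        }
    }
    return parts.join("");
}

console.log(escapeOutsideMath("50% of a_b and $x_1$"));
```

A title containing two currency amounts would still be misread as one math span, which is exactly the rare case described above.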

So I see a couple of potential solutions to the issue:

1) Remove escaping from the BibTeX translator entirely. Whatever data goes into a field is what the .bib file ends up seeing.

Pros:

- this solves the escaping problem cleanly
- it is good design to allow users to pass legitimate data into a program rather than being punished for the program trying to clean up bad data. The "garbage in, garbage out" principle can be applied.
- LaTeX users are very used to manually escaping special characters and not having something escape those characters for them. Thus, this solution is functionality of "least surprise" (another good design principle)

Cons:

- Would backwards compatibility be harmed? Perhaps, but I don't think it would be too difficult to ask BibTeX users to escape their own bibliographies, especially since so many are no doubt unhappy with the current solution
- Is this solution LaTeX specific? Without knowing more of the internals of Zotero it is difficult for me to say

2) Add a button to disable escaping

Pros:
- This provides all the benefits described above
- Since the button could be disabled by default, this maintains backwards compatibility

Cons:
- Is this solution still too LaTeX specific?
- Is there real estate for such a button?
- Does such a button fit well within the overall Zotero GUI framework?

3) Come up with a Zotero language-neutral escaping format.

Pros:

- Maintain compatibility between different formats

Cons:

- Require manual escaping of data
- Requires definition of escaping language mechanisms
- Requires updating all translator parsers
- Adds burden on the user to learn a new mini language

I like solutions one or two.

Best wishes,

Lgstarn

fbennett · May 4, 2009

Lgstarn,

Thanks for this info. It's been a long time since I used LaTeX (although once upon a time I knew TeX reasonably well), and I'm not in the sciences, so I was never that familiar with math mode.

What I'm thinking is that the CSL processor I'm working on (which I'm hopeful will find its way into Zotero eventually) might have enough functionality to render a bibliography in BibTeX format. You could then just pick up the output and use it with a LaTeX document. The keys would be an issue, but (possibly -- the engine is still in development, its functionality is in flux, and it is not yet in any sense part of Zotero) there might be a means of accomplishing targeted escaping of these spans. (The normal use of dollar signs would not require escaping; the "closing" sign is preceded by a space, that would be enough to avoid it.)
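The space heuristic mentioned in the parenthesis could be sketched roughly as follows. This is not code from the CSL processor; `MATH_SPAN` and `findMathSpans` are hypothetical names, and the rule (a `$...$` pair counts as math only if the closing sign is not preceded by whitespace) is just one reading of the idea.

```javascript
// Hypothetical heuristic: a $...$ pair is treated as math only when the
// character right before the closing dollar sign is not whitespace.
var MATH_SPAN = /\$[^$]*[^\s$]\$/g;

// Return every span of the value that the heuristic classifies as math.
function findMathSpans(value) {
    return value.match(MATH_SPAN) || [];
}

var debt = "On the $11.25 Trillion USD American Debt And What the $36,755.27 USD Per Capita Means.";
var hopf = "The partial differential equation $u_t + uu_x + \\mu u_{xx}$";

console.log(findMathSpans(debt)); // no math spans: the second "$" follows a space
console.log(findMathSpans(hopf));
```

The currency title yields no math spans because the only candidate closing sign is preceded by a space, so its dollar signs would still be escaped as usual.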

Let's play this by ear for a while. The solution I have in mind is pretty much a daydream at this point, so I won't go into details; and there may be a simple and straightforward way of just fixing the BibTeX exporter so that it works better for your use cases. But I'll keep this item in mind as I chip away at the new processor.

noksagt · May 4, 2009

Why can't Zotero just allow me to say I don't want to escape those characters?

Zotero is not a LaTeX-specific reference manager & the same citation should work on BibTeX-export & through use by the word processor plugins.

The goal you want (rich-text titles) is also not specific to TeX. So why require a TeX-only solution? Instead, when rich-text is supported in titles, BibTeX-export should just keep pace.

Personally, I think almost all of my issues with mangling could easily be solved if there was a way for Zotero to avoid escaping fields.

I wouldn't be too opposed to a workaround of wrapping sections in <math>, or similar. I don't think that Zotero should allow TeX escape characters to be used as-is: they're far too common in titles.

such as Greek letters

What is wrong with UTF-8 text entry for greek letters?

1) Remove escaping from the BibTeX translator entirely.
- Is this solution LaTeX specific?

Yes, of course it is LaTeX specific: titles would be rendered differently on BibTeX export vs all other forms of export/citation.

2) Add a button to disable escaping

This wouldn't break other export, but it would encourage people to use a temporary solution (LaTeX escaping) in preference to a permanent one (rich text+UTF8 that worked everywhere). The option could be hidden fairly well (just as character encoding is hidden now).

3) Come up with a Zotero language-neutral escaping format.
- Require manual escaping of data
- Requires definition of escaping language mechanisms
- Requires updating all translator parsers
- Adds burden on the user to learn a new mini language

Manual escaping of data isn't a con that is unique to this, though. The 2nd and 3rd cons aren't huge cons, as rich text titles are already a desired feature & this would need to be done anyway. The 4th con is also false; a WYSIWYG GUI could be used (as is used in notes).

lgstarn · May 4, 2009

Hi noksagt,

I think there is a basic misunderstanding of non-LaTeX users about the nature of LaTeX. Not trying to be a LaTeX snob here or anything but please look at the following PDF:

http://www.maths.manchester.ac.uk/~kd/latextut/slidex.pdf

Take a look at the text in the center of the page. This can be created in LaTeX with the following text:

$ {\cal L}_{_X} \nabla=ev\vert_{_{t=0}}\circ\, \displaystyle{\frac{\partial}{\partial t}}\circ\, \nabla^{^{\varphi_{{}_t}}}, $

Now, I would venture to wager that this example can NOT be done using a combination of rich text and/or Unicode. If I am wrong, please prove it to me.

You see, LaTeX is a formatting language for math and science that was written in the 1980s. Since then, pretty much all mathematicians, scientists and engineers have been depending on it to typeset such text. It has thus grown to be a monster that is capable of doing what no other typesetting language can. For example, see the comprehensive LaTeX symbol guide:

www.ctan.org/tex-archive/info/symbols/comprehensive/symbols-a4.pdf

Zotero cannot and should not try to mimic everything that LaTeX can do for math typesetting. It would break the bank in complexity money, so to speak. LaTeX is NOT a word processor. It is a typesetting language. The typesetting commands simply need to be passed into LaTeX - that's all there is to it.

Let me reiterate. The solution I want is simply the ability to get the text $u_t + uu_x + \mu u_{xx}$ into BibTeX some how, some way without having it get "helpfully" destroyed by Zotero as \$u\_t + uu\_x + {\textbackslash}mu u\_{xx}\$. In other words, right now Zotero's BibTeX formatter mangles ALL LaTeX characters. The solution I've proposed is just a way to get around LaTeX escaping, not add it in. Right now users of LaTeX are being punished by stupid escaping that only baby users who don't understand LaTeX would want. I would be willing to wager that 99% of users of BibTeX will prefer the non-mangled version if they understand the distinction, which is this:

1) You can have LaTeX formatted titles to put in Greek letters, subscripts, superscripts, math calligraphy style, etc. in your titles. If you have special characters like $, %, \, _ in your title, you'll have to escape them.
2) You can never have any LaTeX mathmode formatted titles from Zotero unless you want to manually edit your file each time you regenerate your BibTeX file. To compensate for this huge inconvenience, however, Zotero will automatically escape all of your special characters like $, %, \, _ for you. (Gee, thanks Zotero!!!)

To be blunt, this "feature" is really almost a deal breaker for me for Zotero at the moment. I love everything else about Zotero except this one huge annoyance. It isn't just that Zotero is not a LaTeX-specific reference manager as you say, it is that at the moment Zotero is not a LaTeX reference manager, period.

noksagt · May 4, 2009

I think there is a basic misunderstanding of non-LaTeX users about the nature of LaTeX

I'm a reasonably advanced LaTeX user (most of my papers have been written in LaTeX & many are on arXiv, my thesis was 100% LaTeX, I've submitted patches to various popular TeX projects on TUG and sourceforge, etc.).

Take a look at the text in the center of the page.

Take a look at the title, which is the only element that would appear in the bibliography. It can already be represented in Zotero & works everywhere.

Zotero cannot and should not try to mimic everything that LaTeX can do for math typesetting.

I somewhat agree, though I don't know how much I agree. There are equation-editing plugins for TinyMCE (the rich text note editor that Zotero uses) & I wouldn't see a problem with people making extensions to zotero that made use of MathML or an online TeX->png processor.

The typesetting commands simply need to be passed into LaTeX - that's all there is to it.

No. Not for most citations. Extremely few titles cannot be represented by XHTML.

The solution I want is simply the ability to get the text $u_t + uu_x + \mu u_{xx}$ into BibTeX some how

And the long-range goal should be for you to be able to input "u_t + uu_x + μu_xx" into the title (just as you can already input it into notes). This way, the same, single title would work in Word & OO.o & BibTeX.

Right now users of LaTeX are being punished by stupid escaping that only baby users who don't understand LaTeX would want.

Again: I am a LaTeX user. If I wanted a TeX-exclusive reference manager, I'd be using JabRef. I don't. I want something that can plug into OO.o for when I collaborate with people who don't use LaTeX & I want to be able to trade citations with EndNote users.

I would be willing to wager that 99% of users of BibTeX will prefer the non-mangled version

I could say that I'd think 99% of people would agree with me too. But let's not make up statistics.

To be blunt, this "feature" is really almost a deal breaker for me for Zotero at the moment.

Removing the ability for Zotero to translate between TeX entities & their UTF representations would be a fairly trivial change; it wouldn't be a "smart" change, though.

It isn't just that Zotero is not a LaTeX-specific reference manager as you say, it is that at the moment Zotero is not a LaTeX reference manager, period.

That's news to a lot of people, I think.

lgstarn · May 4, 2009

Hi noksagt,

Thanks for your calm response to my somewhat hot-headed one. I apologize for what I said above.

Well, it seems that for the most part we are on the same page. Let's go back to my original use case and brainstorm a clever way to get the following title:

The partial differential equation u_t + uu_x + μu_xx

Into BibTeX as

@article{hopf_partial_1950,
title = {The partial differential equation $u_t + uu_x + \mu u_{xx}$},
volume = {3},
journal = {Communications on Pure and Applied Mathematics},
author = {E. Hopf},
year = {1950},
pages = {201--230}
},

Ideas?

PS Of course we can agree that the above HTML kludge:

ut + uux + μuxx"

is not the solution. :-)

lgstarn · May 4, 2009

I would also like to reiterate that this actually and truly is a Zotero annoyance for me at the moment. I'm not trying to be difficult but I have to manually run a RegEx to replace the escape characters each time I regenerate my BibTeX. If anyone has a short-term solution (short of rewriting the way Zotero handles text) for this situation that would be wonderful.
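For anyone in the same position, a clean-up pass of that kind might look like the sketch below. It is not the actual RegEx used here; `unescapeLatex` is a hypothetical name, and the sketch only reverses the three escapes seen in the exported Hopf entry (`\$`, `\_` and `{\textbackslash}`).

```javascript
// Hypothetical post-export clean-up: undo the translator's escaping of
// dollar signs, underscores and backslashes in a field value.
function unescapeLatex(value) {
    return value
        .replace(/\{\\textbackslash\}/g, "\\")  // {\textbackslash} becomes \
        .replace(/\\([$_])/g, "$1");            // \$ becomes $, \_ becomes _
}

var exported = "The partial differential equation \\$u\\_t + uu\\_x + {\\textbackslash}mu u\\_{xx}\\$";
console.log(unescapeLatex(exported));
```

Note that this also unescapes dollar signs and underscores that really were meant literally, so it is only safe on fields known to contain LaTeX markup.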

noksagt · May 4, 2009

For the micro-term (e.g. if I was you & wanted to use Zotero in this way), a manual modification to the BibTeX translator to remove transliteration of escape characters isn't a HORRIBLE idea.

A better short-term solution would acknowledge where we are headed. One such semi-palatable solution would be to surround the parts of a title you don't want escaped with <bibtex> tags, or something similar that would never have another meaning in a title (so, at least, the set of hacked entries would be kept separate & could be fixed in the future).
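A rough sketch of how such a tag-based workaround could behave on export. Nothing like this exists in the translator; `escapeText` and `exportWithBibtexTags` are hypothetical names, and the sketch assumes the tags are never nested.

```javascript
// Escape the LaTeX special characters in ordinary title text.
function escapeText(text) {
    return text.replace(/([#$%&_])/g, "\\$1");
}

// Hypothetical: text inside <bibtex>...</bibtex> is passed through verbatim
// and the tags themselves are dropped; everything else is escaped.
function exportWithBibtexTags(value) {
    // The capturing group puts the tag contents (without the tags)
    // at the odd indexes of the split result.
    var parts = value.split(/<bibtex>([\s\S]*?)<\/bibtex>/);
    for (var i = 0; i < parts.length; i++) {
        if (i % 2 === 0) {
            parts[i] = escapeText(parts[i]);
        }
    }
    return parts.join("");
}

console.log(exportWithBibtexTags("R&D on <bibtex>$x_1$</bibtex> at 5%"));
```

Because the tags mark exactly which entries were hacked, a later migration to rich-text titles could find them with a simple search.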

The long-term solution is that the title editor will support rich text, just as the notes editor does. There is a ticket for this:
https://www.zotero.org/trac/ticket/439

They'd be stored as (x)html & the BibTeX exporter would have to be clever enough to replace <sub> with _{ and </sub> with }. This is trivial & other reference managers do it.
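A minimal sketch of that kind of conversion, assuming titles were stored with plain `<sub>` and `<sup>` tags (no attributes, no nesting). The function name `htmlScriptsToLatex` is hypothetical, and wrapping each script in math-mode dollar signs is an added assumption so that the result compiles outside math mode.

```javascript
// Hypothetical converter from <sub>/<sup> markup to LaTeX scripts.
// In a replacement string "$$" yields a literal "$" and "$1" is the tag body.
function htmlScriptsToLatex(value) {
    return value
        .replace(/<sub>(.*?)<\/sub>/g, "$$_{$1}$$")
        .replace(/<sup>(.*?)<\/sup>/g, "$$^{$1}$$");
}

console.log(htmlScriptsToLatex("H<sub>2</sub>O and x<sup>2</sup>"));
```

A real exporter would also need to run the usual escaping on the text outside the tags, which this sketch leaves out.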

PS Of course we can agree that the above HTML kludge is not the solution. :-)

Yes, the style tags should not be needed for Zotero; I think someone has already reported that sub/sup is broken in the CSS used in these forums. However, using HTML (accessible with a GUI palette) & converting to LaTeX seems like it isn't a bad idea for the limited amount of rich text we probably need to support.

lgstarn · May 4, 2009

Actually, after using my tiny primate brain for a little while, I realized that my short-term solution will be JabRef from above. Thanks noksagt!

I'll continue to post on this thread in the hopes of improving Zotero, but for the moment Zotero has lost me due to this issue. No big loss for Zotero I understand. :-)

lgstarn · May 4, 2009

Oops, I posted at almost the *exact* same time you did noksagt.

"a manual modification to the BibTeX translator to remove transliteration of escape characters isn't a HORRIBLE idea."

I could probably download the source and give this a swing. It would probably take just about as long as switching over to JabRef. Would anyone else be interested in this code, i.e. after I have done my contribution where should it go?

noksagt · May 4, 2009

If all you wish to do is retain escape characters, it shouldn't take long. Refer to:
https://www.zotero.org/trac/browser/extension/trunk/translators/BibTeX.js

e.g.:
Remove the \$ and \_ and \\ from line 1806
Remove textbackslash line in the alwaysmap array

Again: this is a dirty hack to give you what you want now, but it isn't a good solution going forward. I'd tag records that you force TeX underscores on, so you could fix them later. For entries that have greek characters & only sub/super-ed arabic numerals, you should use the UTF-8 entities & they should work both in and out of BibTeX.

lgstarn · May 4, 2009

Yes! That worked. Thanks again noksagt!

For posterity, I'll leave explicit instructions if anyone else wishes to implement this dirty hack in his or her own local copy without even messing with SVN.

1) Find your profile. Here's how if you need it: http://support.mozilla.com/en-us/kb/Profiles
2) Go under the zotero\translators and edit the BibTeX.js file
3) Change the "alwaysMap" array (line 1528) from the following:

var alwaysMap = {
    "|":"{\\textbar}",
    "<":"{\\textless}",
    ">":"{\\textgreater}",
    "~":"{\\textasciitilde}",
    "^":"{\\textasciicircum}",
    "\\":"{\\textbackslash}"
};

to the following

var alwaysMap = {
    "|":"{\\textbar}",
    "<":"{\\textless}",
    ">":"{\\textgreater}"
    // The entries below are commented out so that ~, ^ and \ are no longer escaped:
    // "~":"{\\textasciitilde}",
    // "^":"{\\textasciicircum}",
    // "\\":"{\\textbackslash}"
};

This will remove escaping ~, ^ and \.

Now change the following on line 1806:

value = value.replace(/[|\<\>\~\^\\]/g, mapEscape).replace(/([\#\$\%\&\_])/g, "\\$1");

to the following:

value = value.replace(/[|\<\>]/g, mapEscape).replace(/([\#\%\&])/g, "\\$1");

4) Save the file. You won't even need to restart Firefox!
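To check the effect of the two changes outside Zotero, the modified pieces can be pulled into a stand-alone sketch like the one below. The body of `mapEscape` is an assumption (a plain lookup in `alwaysMap`), and `escapeValue` is a hypothetical wrapper around the replaced line, not a function in BibTeX.js.

```javascript
// Stand-alone reproduction of the modified escaping from the steps above.
var alwaysMap = {
    "|": "{\\textbar}",
    "<": "{\\textless}",
    ">": "{\\textgreater}"
};

// Assumed shape of the translator's helper: look the character up in alwaysMap.
function mapEscape(character) {
    return alwaysMap[character];
}

// Hypothetical wrapper around the modified line 1806.
function escapeValue(value) {
    return value.replace(/[|\<\>]/g, mapEscape).replace(/([\#\%\&])/g, "\\$1");
}

console.log(escapeValue("The partial differential equation $u_t + uu_x + \\mu u_{xx}$"));
console.log(escapeValue("A < B & 50%"));
```

With the hack in place, `$`, `_`, `\`, `~` and `^` pass through untouched, while `#`, `%`, `&`, `|`, `<` and `>` are still escaped.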

Awesome. I'm a happy tomato. Sorry for the trouble and thanks again for the help!

yzarc · October 18, 2009

Hi, I'm running into the same problem as lgstarn. My opinion is that on BibTeX export, Zotero should simply detect pairs of $ and not escape their contents. I would be very surprised if anyone found a title in which a pair of $'s could be falsely mistaken for LaTeX code. This would save people like me and lgstarn a lot of trouble! Thank you.

teatime · November 7, 2009

I agree with yzarc: the chance that someone has two dollar signs (other than wanting to use latex math) in a title is minimal, and behavior like this would make all the mathematicians much happier and more likely to use zotero.

noksagt · November 10, 2009

Again: Zotero isn't a BibTeX manager. What would it be expected to do if it sees that syntax & is not using BibTeX? Better to design the rich text title support very well, so that everyone is happy. Anything else would just be a hack.

teatime · November 10, 2009

Well actually there are some more not-so-pleasant features of zotero&bibtex combo:
- in math papers, sometimes you need to have things which look like regular text to be typeset in math mode, e.g. "$K$-theory", not "K-theory" (so pure rich text support is definitely not sufficient - an extra pair of tags saying <math> would be sufficient though)
- there's another TeX quirk (which I agree might not be that critical to have as soon as possible, so this is just for the record): sometimes it uses macros (=functions) to capitalize or de-capitalize article titles, and it is important that some letters need to be kept capital regardless of the context (for instance in "book name, Vol I.", 'V' and 'I' need to be capital). The usual way to force bibtex to keep these capital is to enclose them with {}, e.g. "book name, {V}ol {I}."
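As an illustration of the brace trick in the second point, here is a minimal sketch. `protectCapitals` is a hypothetical helper, and bracing every run of capitals after the first character is a blunt rule that a real exporter might not want to apply blindly.

```javascript
// Hypothetical helper: wrap every run of capital letters after the first
// character in braces so that BibTeX styles which change title case leave
// them alone. "$&" in the replacement stands for the matched text.
function protectCapitals(title) {
    return title.charAt(0) + title.slice(1).replace(/[A-Z]+/g, "{$&}");
}

console.log(protectCapitals("book name, Vol I."));
```

The first character is left alone because BibTeX styles keep the initial capital of a title anyway.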

Anyways, the mantra "zotero isn't a bibtex manager", while true, does not justify what you're implying, namely that people trying to use zotero with bibtex should shut up and wait for an extra feature which is actually much more complicated than what they need. The least someone as knowledgeable as you could do is to tell us how to implement the hack, so that we can actually *use zotero to do what it is supposed to do* (namely "..help you ... cite your research sources" [the main zotero website]).

noksagt · November 10, 2009

The least someone as knowledgeable as you could do is to tell us how to implement the hack

I've done exactly this in this thread & others have successfully implemented it. Feel free to post what difficulties you are having in doing the same.

But I (a BibTeX user) maintain that it doesn't really make sense to rely on this hack in the core Zotero.

teatime · November 11, 2009

Yes, you're right, the hack above actually works great. Thanks!

The only thing that gets messed up right now is the {} for capitalization (import .bib -> export .bib does not preserve that, but I guess I can't have everything :)

stefanmeir · September 23, 2011

Nearly two years have passed since this thread ...
I'm one of those scientists who have to use all those LaTeX features (sub- and superscripts, math, ...) in Zotero's title fields and export them to BibTeX. Like others, I am extremely unhappy with Zotero's escaping policy.
Has a solution emerged in the meantime (I couldn't find one so far) or will I have to implement the above-mentioned hack?

adamsmith · September 23, 2011

I believe that's still pretty much the status quo. It's a thorny issue.

stefanmeir · September 23, 2011

Thanks a lot Adam.
Has anybody ever thought about regularly "forking" BibTeX.js, removing the escaping functionality and redistributing it as BibTeX_noEscaping.js?
Would that work? How frequently would one have to build a new file?