BibTeX and Greek characters
Starting a new thread based on Dan's suggestion:
From the example posted in the other thread, the exported BibTeX for the title field is now (with the updated translator):
title = {The electrodynamics of substances with simultaneously negative values of e and µ},
Rather than the '?' that appeared prior to the latest translator update. However, there are two problems:
1) Epsilon should be ε, not e, but more importantly...
2) As far as I know, LaTeX/BibTeX chokes on those characters; they should actually be $\epsilon$ and $\mu$, in order to be properly rendered in a bibliography. This is what I would have done if creating the BibTeX entry manually. As it is now, a PDF compiled referencing that BibTeX entry displays:
V. G. Veselago, "The electrodynamics of substances with simultaneously
negative values of and ," Soviet Physics Uspekhi, vol. 10, no. 4, pp. 509-
514, 1968.
What is the supported/recommended method for inclusion of Greek characters in Zotero fields?
Thanks in advance.
With an epsilon entered in the field, I get an epsilon (and not an 'e') when I export using the updated BibTeX translator.
Other folks can comment with more authority here, but some newer BibTeX processors support UTF-8 natively, which is why the BibTeX translator now supports UTF-8 output. You could override this by choosing ISO-8859-1 on export, but neither ε nor µ is in the (very large but inevitably incomplete) mapping table. We can add those, but you'd probably be better off switching to an all-Unicode BibTeX workflow.
For what it's worth, "textmu" does map correctly, since it's technically a different Unicode character.
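To make the "different Unicode character" point concrete, here is a minimal sketch in JavaScript (the translator's language). The mapping table below is a hypothetical illustration, not the translator's actual table; only the code points themselves are facts.

```javascript
// Two visually similar characters with different code points.
const MICRO_SIGN = "\u00B5";    // µ, MICRO SIGN (also part of ISO-8859-1)
const GREEK_MU = "\u03BC";      // μ, GREEK SMALL LETTER MU
const GREEK_EPSILON = "\u03B5"; // ε, GREEK SMALL LETTER EPSILON

// Hypothetical mapping table, for illustration only.
const exampleMap = new Map([
  [MICRO_SIGN, "{\\textmu}"],
  [GREEK_MU, "$\\mu$"],
  [GREEK_EPSILON, "$\\epsilon$"],
]);

// Format a character's code point as U+XXXX.
function codePoint(ch) {
  return "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
}

// Replace every mapped character; leave everything else untouched.
function mapGreek(text) {
  return Array.from(text, (ch) => (exampleMap.has(ch) ? exampleMap.get(ch) : ch)).join("");
}

console.log(codePoint(MICRO_SIGN), codePoint(GREEK_MU));
console.log(mapGreek("negative values of \u03B5 and \u03BC"));
```

A table keyed on one of the two "mu" characters will silently miss the other, which is one way a title can lose its symbol on export.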
Thanks for the response. I think you've cleared up, for me, the current status of things related to this topic. In that case, the rest of this comment is probably going to end up becoming a feature request. The problem, for me and many other users, is more broad than a need for full Unicode support.
In the physical sciences, there are literally thousands of paper titles that contain either math mode (square root, etc.), Greek, or other special LaTeX characters (super- and sub-script for chemical formulae, etc.). It seems variations of this dilemma exist in other fields as well, given all the threads I've seen about bold, italics, small caps, etc., being desired.
I believe the feature that would solve all the problems -- for LaTeX users in the physical sciences -- looks kind of like this - but this is just a rough draft of the idea: A visible-only-when-desired "Title" field, perhaps called "LaTeX Title," that is completely un-escaped, so any users of LaTeX/BibTeX could copy the existing Title field, and modify it appropriately with the desired LaTeX control strings. The field would only override the default Title field when it is populated, and then, only for BibTeX export. This way, fancy "things" (leaving that term open in the broadest sense) could be easily input by those who know how and want them, and the feature would not obtrude in the workflows of anyone else.
The advantage of this for Zotero and BibTeX translator developers is that it does not require implementation of any translation features or mapping tables - the "LaTeX Title" field contents would be used as entered by the Zotero user (verbatim), and any problems in the field would be their own responsibility. It also allows much more fine-grained control than a static or sometimes-upgraded mapping table.
I can provide dozens of examples in my own small library of a few hundred references where this would be useful, but I believe the point emphasized here is valid without examples. Please reply with any comments. I would not at all mind hearing why this is a horrible idea, and why something else would work better. Comments from anyone who has successfully circumvented this issue are extremely welcome.
The proper solution, in my view, is to use unicode wherever possible, and maybe to add MathML support for those cases where it's not*.
*I once had a discussion about this with LaTeX and MathML expert David Carlisle, and his position was that unicode was enough to cover title needs, and that MathML was only really needed for things like abstracts.
Also, perhaps Unicode support is not "enough to cover title needs"... as an example, I would like to see this title rendered properly in the Zotero title field:
http://ieeexplore.ieee.org/search/wrapper.jsp?arnumber=786664
Unicode, as far as I know, does not have a subscript 'period' (see here). Maybe a Unicode expert can correct me on this. MathML could accommodate this, of course, but it's not math (right tool for the job, etc.).
MathML for math purposes (i.e., a superscripted radical, etc.) would be great, and I agree, all-inclusive. But is it the right tool for the job in non-math cases?
The point was to make life easier for everyone, instead of adding another layer of complexity. All that is being asked is a way to input, directly or indirectly, something as simple as $_{0.5}$ or {\it W}, so that editing a .bib file manually is not necessary after every export.
The specific implementation does not really matter, honestly. Anything that would allow editing the Zotero db exactly one time, and have a working export after that, would be perfect. If the issues mentioned (Greek, math, simple subscripts) are currently being addressed by any one solution, or multiple solutions, that's great. Thousands of scientists are being alienated by this simple missing functionality, however, and something has to be done about it. A discussion of solutions would be helpful, and then perhaps the developers can make a decision about which path to take.
The partial differential equation $u_t + uu_x + \mu u_{xx}$
This should have a subscript t on the first u, a subscript x on the third u, a Greek "mu" before the last u, and a subscript xx on that last u.
This comes out in BibTeX as
@article{hopf_partial_1950,
title = {The partial differential equation \$u\_t + uu\_x + {\textbackslash}mu u\_{xx}\$},
volume = {3},
journal = {Communications on Pure and Applied Mathematics},
author = {E. Hopf},
year = {1950},
pages = {201--230}
},
Which is most certainly NOT what I want. If those $ and \ characters were not escaped, this problem would be fixed.
Why can't Zotero just allow me to say I don't want to escape those characters? As it is, every time I want to regenerate my references from Zotero I have to manually change the file. That really sucks.
First, are all of the mangling issues you are encountering confined to spans enclosed in $...$ or {...}?
Second, are there commonly available pretty-printing utilities for BibTeX files that can reformat output that is syntactically correct, or close to it (back in the day, I remember there was something called "bibclean" by Nelson Beebe)?
Frank Bennett
Thanks for your quick response!
I'll address your first question since I'm not aware of anything for the second (I can do some research on the second issue if no one else comes up with anything). Personally, I think almost all of my issues with mangling could easily be solved if there was a way for Zotero to avoid escaping fields. Barring this, if it could at least avoid escaping anything between $...$, that would give me access to all LaTeX mathmode commands such as Greek letters, underscores, etc. I could of course see a relatively rare use case where some user had a title such as "On the $11.25 Trillion USD American Debt And What the $36,755.27 USD Per Capita Means." But it seems that escaping these dollar signs manually for BibTeX might be a more reasonable approach than always escaping everything, i.e. it would be easy to write this title as "On the \$11.25 Trillion USD American Debt And What the \$36,755.27 USD Per Capita Means." The relative inconvenience for solving this one rare use case is definitely worth it IMO because LaTeX users will very, very often want to do things like $e^{\mu xt}$ (superscript Greek mu xt over the letter e) or $u_{xx}$ (subscript xx under u).
So I see a couple of potential solutions to the issue:
1) Remove escaping from the BibTeX translator entirely. Whatever data goes into a field is what the .bib file ends up seeing.
Pros:
- this solves the escaping problem cleanly
- it is good design to allow users to pass legitimate data into a program rather than being punished for the program trying to clean up bad data. The "garbage in, garbage out" principle can be applied.
- LaTeX users are very used to manually escaping special characters and not having something escape those characters for them. Thus, this solution is functionality of "least surprise" (another good design principle)
Cons:
- Would backwards compatibility be harmed? Perhaps, but I don't think it would be too difficult to ask BibTeX users to escape their own bibliographies, especially since so many are no doubt unhappy with the current solution
- Is this solution LaTeX specific? Without knowing more of the internals of Zotero it is difficult for me to say
2) Add a button to disable escaping
Pros:
- This provides all the benefits described above
- Since the button could be disabled by default, this maintains backwards compatibility
Cons:
- Is this solution still too LaTeX specific?
- Is there real estate for such a button?
- Does such a button fit well within the overall Zotero GUI framework?
3) Come up with a Zotero language-neutral escaping format.
Pros:
- Maintain compatibility between different formats
Cons:
- Require manual escaping of data
- Requires definition of escaping language mechanisms
- Requires updating all translator parsers
- Adds burden on the user to learn a new mini language
I like solutions one or two.
Best wishes,
Lgstarn
Thanks for this info. It's been a long time since I used LaTeX (although once upon a time I knew TeX reasonably well), and I'm not in the sciences, so I was never that familiar with math mode.
What I'm thinking is that the CSL processor I'm working on (which I'm hopeful will find its way into Zotero eventually) might have enough functionality to render a bibliography in BibTeX format. You could then just pick up the output and use it with a LaTeX document. The keys would be an issue, but (possibly -- the engine is still in development, its functionality is in flux, and it is not yet in any sense part of Zotero) there might be a means of accomplishing targeted escaping of these spans. (The normal use of dollar signs would not require escaping; the "closing" sign is preceded by a space, that would be enough to avoid it.)
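The "targeted escaping" idea can be sketched roughly as follows. This is not the translator's code: the function names are made up for illustration, and the math-span rule (an opening "$" followed by a non-space, a closing "$" preceded by a non-space) is an assumption based on the heuristic mentioned above. For brevity, backslashes outside math spans are not handled.

```javascript
// A math span: "$", then a non-space, then anything without "$",
// ending in a non-space immediately followed by the closing "$".
const MATH_SPAN = /\$(?=[^\s$])[^$]*[^\s$]\$/g;

// Backslash-escape the characters LaTeX treats specially in ordinary text.
function escapeText(text) {
  return text.replace(/[#$%&_]/g, "\\$&");
}

// Escape everything except the math spans, which are copied verbatim.
function escapeOutsideMath(value) {
  let result = "";
  let last = 0;
  for (const match of value.matchAll(MATH_SPAN)) {
    result += escapeText(value.slice(last, match.index)); // ordinary text
    result += match[0];                                   // math span, untouched
    last = match.index + match[0].length;
  }
  return result + escapeText(value.slice(last));
}

console.log(escapeOutsideMath("The partial differential equation $u_t + uu_x + \\mu u_{xx}$"));
console.log(escapeOutsideMath("On the $11.25 Trillion USD American Debt And What the $36,755.27 USD Per Capita Means."));
```

With this rule the Hopf title passes through unchanged, while the two currency dollar signs are escaped because the second one is preceded by a space. A title that happens to contain two dollar amounts with no space before the second "$" would still be misread as math, so this remains a heuristic.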
Let's play this by ear for a while. The solution I have in mind is pretty much a daydream at this point, so I won't go into details; and there may be a simple and straightforward way of just fixing the BibTeX exporter so that it works better for your use cases. But I'll keep this item in mind as I chip away at the new processor.
The goal you want (rich-text titles) is not specific to TeX. So why require a TeX-only solution? Instead, when rich text is supported in titles, BibTeX export should just keep pace. I wouldn't be too opposed to a workaround of wrapping sections in <math>, or similar. I don't think that Zotero should allow TeX escape characters to be used as-is: they're far too common in titles. What is wrong with UTF-8 text entry for Greek letters? Yes, of course it is LaTeX specific: titles would be rendered differently on BibTeX export vs. all other forms of export/citation. This wouldn't break other export, but it would encourage people to use a temporary solution (LaTeX escaping) in preference to a permanent one (rich text + UTF-8 that works everywhere). The option could be hidden fairly well (just as character encoding is hidden now). Manual escaping of data isn't a con that is unique to this, though. The 2nd and 3rd cons aren't huge cons, as rich-text titles are already a desired feature and this would need to be done anyway. The 4th con is also false; a WYSIWYG GUI could be used (as is used in notes).
I think there is a basic misunderstanding of non-LaTeX users about the nature of LaTeX. Not trying to be a LaTeX snob here or anything but please look at the following PDF:
http://www.maths.manchester.ac.uk/~kd/latextut/slidex.pdf
Take a look at the text in the center of the page. This can be created in LaTeX with the following text:
$ {\cal L}_{_X} \nabla=ev\vert_{_{t=0}}\circ\, \displaystyle{\frac{\partial}{\partial t}}\circ\, \nabla^{^{\varphi_{{}_t}}}, $
Now, I would venture to wager that this example can NOT be done using a combination of rich text and/or Unicode. If I am wrong, please prove it to me.
You see, LaTeX is a formatting language for math and science that was written in the 1980s. Since then, pretty much all mathematicians, scientists and engineers have been depending on it to typeset such text. It has thus grown to be a monster that is capable of doing what no other typesetting language can. For example, see the comprehensive LaTeX symbol guide:
www.ctan.org/tex-archive/info/symbols/comprehensive/symbols-a4.pdf
Zotero cannot and should not try to mimic everything that LaTeX can do for math typesetting. It would break the bank in complexity money, so to speak. LaTeX is NOT a word processor. It is a typesetting language. The typesetting commands simply need to be passed into LaTeX - that's all there is to it.
Let me reiterate. The solution I want is simply the ability to get the text $u_t + uu_x + \mu u_{xx}$ into BibTeX somehow, some way, without having it get "helpfully" destroyed by Zotero as \$u\_t + uu\_x + {\textbackslash}mu u\_{xx}\$. In other words, right now Zotero's BibTeX formatter mangles ALL LaTeX characters. The solution I've proposed is just a way to get around LaTeX escaping, not add it in. Right now users of LaTeX are being punished by stupid escaping that only baby users who don't understand LaTeX would want. I would be willing to wager that 99% of users of BibTeX will prefer the non-mangled version if they understand the distinction, which is this:
1) You can have LaTeX-formatted titles to put in Greek letters, subscripts, superscripts, math calligraphy style, etc. in your titles. If you have special characters like $, %, \, _ in your title, you'll have to escape them.
2) You can never have any LaTeX mathmode formatted titles from Zotero unless you want to manually edit your file each time you regenerate your BibTeX file. To compensate for this huge inconvenience, however, Zotero will automatically escape all of your special characters like $, %, \, _ for you. (Gee, thanks Zotero!!!)
To be blunt, this "feature" is really almost a deal breaker for me for Zotero at the moment. I love everything else about Zotero except this one huge annoyance. It isn't just that Zotero is not a LaTeX-specific reference manager as you say, it is that at the moment Zotero is not a LaTeX reference manager, period.
Thanks for your calm response to my somewhat hot-headed one. I apologize for what I said above.
Well, it seems that for the most part we are on the same page. Let's go back to my original use case and brainstorm a clever way to get the following title:
The partial differential equation ut + uux + μuxx
Into BibTeX as
@article{hopf_partial_1950,
title = {The partial differential equation $u_t + uu_x + \mu u_{xx}$},
volume = {3},
journal = {Communications on Pure and Applied Mathematics},
author = {E. Hopf},
year = {1950},
pages = {201--230}
},
Ideas?
PS Of course we can agree that the above HTML kludge:
u<sub style="font-size:xx-small; vertical-align:bottom;" >t</sub> + uu<sub style="font-size:xx-small; vertical-align:bottom;" >x</sub> + μu<sub style="font-size:xx-small; vertical-align:bottom;" >xx</sub>
is not the solution. :-)
A better short-term solution would be built that acknowledged where we were headed. One such semi-palatable solution would be to surround parts of a title you didn't want escaped with <bibtex> tags, or similar which would never have another meaning in a title (so, at least, the set of hacked entries would be kept separate & could be fixed in the future).
The long-term solution is that the title editor will support rich text, just as the notes editor does. There is a ticket for this:
https://www.zotero.org/trac/ticket/439
They'd be stored as (X)HTML, and the BibTeX exporter would have to be clever enough to replace <sub> with _{ and </sub> with }. This is trivial, and other reference managers do it. Yes, the style tags should not be needed for Zotero; I think someone has already reported that sub/sup is broken in the CSS used in these forums. However, using HTML (accessible with a GUI palette) and converting to LaTeX doesn't seem like a bad idea for the limited amount of rich text we probably need to support.
I'll continue to post on this thread in the hopes of improving Zotero, but for the moment Zotero has lost me due to this issue. No big loss for Zotero, I understand. :-)
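The <sub>/</sub> replacement suggested earlier in this thread could be sketched along these lines. The function name is hypothetical, and wrapping each converted span in $...$ is an added assumption (a bare _{...} is only valid inside math mode); a real exporter would presumably also merge adjacent spans and handle nesting.

```javascript
// Convert simple <sub>/<sup> markup (with or without attributes) to LaTeX.
// Replacer functions are used so that "$" in the output is taken literally.
function htmlSubSupToLatex(html) {
  return html
    .replace(/<sub\b[^>]*>(.*?)<\/sub>/g, (_, body) => "$_{" + body + "}$")
    .replace(/<sup\b[^>]*>(.*?)<\/sup>/g, (_, body) => "$^{" + body + "}$");
}

console.log(htmlSubSupToLatex("u<sub>t</sub> + uu<sub>x</sub> + \u03BCu<sub>xx</sub>"));
console.log(htmlSubSupToLatex("e<sup>x</sup>"));
```

The output is not as tidy as a hand-written $u_t + uu_x + \mu u_{xx}$, but it shows that the tag-to-brace conversion itself is mechanical.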
"a manual modification to the BibTeX translator to remove transliteration of escape characters isn't a HORRIBLE idea."
I could probably download the source and give this a swing. It would probably take just about as long as switching over to JabRef. Would anyone else be interested in this code? That is, once I have made my contribution, where should it go?
https://www.zotero.org/trac/browser/extension/trunk/translators/BibTeX.js
e.g.:
Remove the \$ and \_ and \\ from line 1806
Remove textbackslash line in the alwaysmap array
Again: this is a dirty hack to give you what you want now, but it isn't a good solution going forward. I'd tag records that you force TeX underscores on, so you could fix them later. For entries that have Greek characters and only sub/super-ed Arabic numerals, you should use the UTF-8 entities, and they should work both in and out of BibTeX.
For posterity, I'll leave explicit instructions if anyone else wishes to implement this dirty hack in his or her own local copy without even messing with SVN.
1) Find your profile. Here's how if you need it: http://support.mozilla.com/en-us/kb/Profiles
2) Go to the zotero\translators folder and edit the BibTeX.js file
3) Change the "alwaysMap" array (line 1528) from the following:
var alwaysMap = {
"|":"{\\textbar}",
"<":"{\\textless}",
">":"{\\textgreater}",
"~":"{\\textasciitilde}",
"^":"{\\textasciicircum}",
"\\":"{\\textbackslash}"
};
to the following
var alwaysMap = {
"|":"{\\textbar}",
"<":"{\\textless}",
">":"{\\textgreater}" //,
// "~":"{\\textasciitilde}",
// "^":"{\\textasciicircum}"
// "\\":"{\\textbackslash}"
};
This removes the escaping of ~, ^ and \.
Now change the following on line 1806:
value = value.replace(/[|\<\>\~\^\\]/g, mapEscape).replace(/([\#\$\%\&\_])/g, "\\$1");
to the following:
value = value.replace(/[|\<\>]/g, mapEscape).replace(/([\#\%\&])/g, "\\$1");
4) Save the file. You won't even need to restart Firefox!
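To check what the edit changes, here is a small standalone sketch that runs the two replace chains side by side on the Hopf title. The maps and regular expressions are taken from the steps above (written without the redundant backslashes); makeEscaper is a hypothetical wrapper standing in for the translator's own mapEscape helper, which is not shown in this thread.

```javascript
// Build an escaping function from a character map and two regular expressions.
function makeEscaper(map, mappedChars, backslashedChars) {
  return (value) =>
    value
      .replace(mappedChars, (ch) => map[ch]) // characters replaced via the map
      .replace(backslashedChars, "\\$1");    // characters prefixed with a backslash
}

const originalMap = {
  "|": "{\\textbar}", "<": "{\\textless}", ">": "{\\textgreater}",
  "~": "{\\textasciitilde}", "^": "{\\textasciicircum}", "\\": "{\\textbackslash}",
};
const patchedMap = { "|": "{\\textbar}", "<": "{\\textless}", ">": "{\\textgreater}" };

const escapeOriginal = makeEscaper(originalMap, /[|<>~^\\]/g, /([#$%&_])/g);
const escapePatched = makeEscaper(patchedMap, /[|<>]/g, /([#%&])/g);

const title = "The partial differential equation $u_t + uu_x + \\mu u_{xx}$";
console.log(escapeOriginal(title)); // mangled form, as in the export quoted earlier
console.log(escapePatched(title));  // title passes through unchanged
```

Note that after the patch a literal $, _, ~, ^ or \ in any field must be escaped by hand, since those characters are now passed to LaTeX as-is.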
Awesome. I'm a happy tomato. Sorry for the trouble and thanks again for the help!
- in math papers, sometimes you need to have things which look like regular text to be typeset in math mode, e.g. "$K$-theory", not "K-theory" (so pure rich text support is definitely not sufficient - an extra pair of tags saying <math> would be sufficient though)
- there's another TeX quirk (which I agree might not be that critical to have as soon as possible, so this is just for the record): sometimes it uses macros (=functions) to capitalize or de-capitalize article titles, and it is important that some letters need to be kept capital regardless of the context (for instance in "book name, Vol I.", 'V' and 'I' need to be capital). The usual way to force bibtex to keep these capital is to enclose them with {}, e.g. "book name, {V}ol {I}."
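As a rough illustration of the brace convention in the second point, the sketch below wraps every capital letter except the first character of the title in {}. The function name is made up, and protecting all capitals is an assumption chosen for simplicity; in practice one would protect only the letters that must stay capital.

```javascript
// Wrap each uppercase letter (except at position 0) in braces so that
// BibTeX styles that re-case titles leave it alone.
function protectCapitals(title) {
  return title.replace(/\p{Lu}/gu, (ch, offset) => (offset === 0 ? ch : "{" + ch + "}"));
}

console.log(protectCapitals("book name, Vol I."));
```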
Anyways, the mantra "zotero isn't a bibtex manager", while true, does not justify what you're implying, namely that people trying to use zotero with bibtex should shut up and wait for an extra feature which is actually much more complicated than what they need. The least someone as knowledgeable as you could do is to tell us how to implement the hack, so that we can actually *use zotero to do what it is supposed to do* (namely "..help you ... cite your research sources" [the main zotero website]).
But I (a BibTeX user) maintain that it doesn't really make sense to rely on this hack in the core Zotero.
The only thing that gets messed up right now is the {} for capitalization (import .bib -> export .bib does not preserve that, but I guess I can't have everything :)
I'm one of those scientists who has to use all those LaTeX features (sub- and superscripts, math, ...) in Zotero's title fields and export them to BibTeX. Like others, I am extremely unhappy with Zotero's escaping policy.
Has a solution emerged in the meantime (I couldn't find one so far), or will I have to implement the above-mentioned hack?
Has anybody ever thought about regularly "forking" BibTeX.js, removing the escaping functionality and redistributing it as BibTeX_noEscaping.js?
Would that work? How frequently would one have to build a new file?