BibTeX, percent symbols in URLs, and curly braces
I'm running into trouble using bibtex on my exported bibliographic database. When running bibtex (from MiKTeX and within WinEdt 6.0), compilation halts with the error message:
("C:\Documents and Settings\les\My Documents\Writing\Thesis\thesis.bbl"
("C:\Documents and Settings\les\My Documents\Writing\Thesis\thesis.brf")
! Argument of \BR@@bibitem has an extra }.
<inserted text>
\par
l.7 Trust online}}
.
?
Looking at the related entry in the .bib file, I can see that WinEdt - and, by extension, bibtex - is unable to match up the braces which surround the url field, because of several percent symbols in the URL (e.g. title=Communications%20of%20the%20ACM). Everything between the first percent symbol and the end of the line is treated as a comment - and of course, that includes the closing brace for the url entry.
I can hand-edit and work around in two ways:
a) Escape the percent signs with a backslash, or
b) Remove the curly braces around the url value
Both are tedious, as there are lots of entries with percent symbols - most with multiple. And having to do this each time I export the library is going to be *really* painful, as I'm gradually converting to Zotero from my old home-grown database, hence frequent updates as I convert the keys in my thesis source.
Any suggestions?
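Workaround (a) could in principle be scripted so it does not have to be redone by hand after every export. The following is only a minimal sketch in JavaScript, assuming each field of the exported .bib file sits on its own line; the function names, the entry key and the example.org URL are hypothetical and not part of Zotero or BibTeX.

```javascript
// Hypothetical helper, not part of Zotero or BibTeX: backslash-escape
// every "%" that is not already escaped.
function escapePercents(value) {
  // The lookbehind skips percent signs already preceded by a backslash,
  // so running the script twice on the same file changes nothing more.
  return value.replace(/(?<!\\)%/g, "\\%");
}

// Apply the escaping only to lines that look like a url field.
// Assumption: one field per line, e.g.  url = {http://...},
function escapeUrlFields(bibText) {
  return bibText
    .split("\n")
    .map((line) => (/^\s*url\s*=/i.test(line) ? escapePercents(line) : line))
    .join("\n");
}

// Made-up entry for illustration only.
const sample = [
  "@article{example,",
  "  title = {Trust online},",
  "  url = {http://example.org/toc.cfm?title=Communications%20of%20the%20ACM},",
  "}",
].join("\n");

console.log(escapeUrlFields(sample));
```

Only the url line is rewritten; every other field is passed through untouched.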
As you say, the effect is style-dependent, but only in the sense that it affects styles which use the URL field, and doesn't affect those that don't. So I can at least keep making progress by switching styles while I investigate. However, it affects the natbib package's plainnat style, which I should imagine is fairly widely used, so I'm surprised it hasn't come up before.
I've found BibTeX.js, taken a look and it didn't scare me to death, so I'm going to hack on it for a while to see if I can come up with a fix.
Thanks for the pointers.
I changed this line in BibTeX.js:
if(!((field == "url") || (field == "doi") | (field == "file"))) {
to
if(!((field == "doi") | (field == "file"))) {
so that the URL field drops through the regex replacement code that follows.
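To illustrate the kind of replacement the url field now goes through, here is a rough sketch. It is not the code in BibTeX.js: the function name is made up, the URL is a shortened form of the one from my library, and only the two characters I saw escaped in the output (ampersand and percent) are handled.

```javascript
// Illustrative only: this is NOT the replacement code in Zotero's BibTeX.js.
// Backslash-escape "&" and "%", the two characters that show up escaped
// in the exported url field.
function escapeForLatex(value) {
  return value.replace(/[&%]/g, (ch) => "\\" + ch);
}

// Shortened version of the troublesome URL, for illustration.
const url =
  "http://portal.acm.org/toc.cfm?id=355112&coll=portal&title=Communications%20of%20the%20ACM";

console.log(escapeForLatex(url));
```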
After re-exporting the library, I can successfully PDFTeXify the thesis with the natbib plainnat style. URLs appear in the generated bibliography and work correctly in the generated PDF.
Looking at the generated thesis.bbl file, the previously-troublesome URL comes through as
\url{{http://portal.acm.org/toc.cfm?id=355112\&coll=portal\&dl=ACM\&type=issue\&idx=J79\&part=magazine\&WantType=Magazines\&title=Communications\%20of\%20the\%20ACM\&CFID=1436618\&CFTOKEN=5260756}}.
so that ampersands and percent symbols are being escaped, but the URL in the generated PDF is
http://portal.acm.org/toc.cfm?id=355112&coll=portal&dl=ACM&type=issue&idx=J79&part=magazine&WantType=Magazines&title=Communications%20of%20the%20ACM&CFID=1436618&CFTOKEN=5260756
As expected, pdflatex has correctly generated the escaped characters - just as I would have had to escape them in body text. I can't see why the included .bbl file would be any different.
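The relationship between the two forms can be checked mechanically. This is just a hypothetical sanity check, not something LaTeX itself runs: it models the typeset result by dropping the backslash in front of each escaped ampersand or percent sign, using shortened versions of the two URLs above.

```javascript
// Hypothetical check: remove the backslash in front of each escaped
// "&" or "%" to model how those escapes come out when typeset.
function unescapeLatex(value) {
  return value.replace(/\\([&%])/g, "$1");
}

// Shortened forms of the .bbl entry and the URL seen in the PDF.
const fromBbl =
  "http://portal.acm.org/toc.cfm?id=355112\\&coll=portal\\&title=Communications\\%20of\\%20the\\%20ACM";
const inPdf =
  "http://portal.acm.org/toc.cfm?id=355112&coll=portal&title=Communications%20of%20the%20ACM";

console.log(unescapeLatex(fromBbl) === inPdf); // prints true
```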
So now I'm questioning the notion that URLs shouldn't be escaped. I'm going to try this version of the exported .bib file with the other styles that were giving trouble. But are there other styles for which escaping *does* cause problems?
So this fixes my problem. I wonder if this fix affects anything or anybody else?
I tested the ieeetran.bst style in a test document, with and without hyperref, and with both escaped-URL and non-escaped-URL bib files. The escaped URLs always worked correctly, while the non-escaped URLs only worked correctly with the hyperref package. I got similar results with the plainnat style.
Which leaves me with the question: since I'm using hyperref, why is it not correctly fixing unescaped URLs? It may be down to one of the options the template uses with hyperref:
\usepackage[ pdftex, plainpages = false, pdfpagelabels,
pdfpagelayout = useoutlines,
bookmarks,
bookmarksopen = true,
bookmarksnumbered = true,
breaklinks = true,
linktocpage,
pagebackref,
colorlinks = false, % was true
linkcolor = blue,
urlcolor = blue,
citecolor = red,
anchorcolor = green,
hyperindex = true,
hyperfigures
]{hyperref}
So that's what I'll investigate next.
Given that, on the test cases I've tried so far, escaping URLs has worked correctly in every case and caused no problems, while non-escaped URLs have broken several styles, I think I'm going to stick with my little mod for the foreseeable future.
So at this point, I'm reasonably certain that's where the problem lies. I've looked at hyperref.sty and quickly realised my LaTeX-fu isn't up to the task of fixing this, so I'll pursue it with the hyperref maintainers.
At least I have a good handle on this behaviour and several workarounds. Thanks for nudging me in the right direction!