BibTeX, percent symbols in URLs, and curly braces

I'm running into trouble using bibtex on my exported bibliographic database. When running bibtex (from MiKTeX and within WinEdt 6.0), compilation halts with the error message:

("C:\Documents and Settings\les\My Documents\Writing\Thesis\thesis.bbl"
("C:\Documents and Settings\les\My Documents\Writing\Thesis\thesis.brf")
! Argument of \BR@@bibitem has an extra }.
<inserted text>
\par
l.7 Trust online}}
.
?

Looking at the related entry in the .bib file, I can see that WinEdt - and, by extension, bibtex - is unable to match up the braces which surround the url field, because of several percent symbols in the URL (e.g. title=Communications%20of%20the%20ACM). Everything between the first percent symbol and the end of the line is treated as a comment - and of course, that includes the closing brace for the URL entry.

I can hand-edit and work around in two ways:

a) Escape the percent signs with a backslash, or
b) Remove the curly braces around the url value

Both are tedious, as there are lots of entries with percent symbols - most with multiple. And having to do this each time I export the library is going to be *really* painful, as I'm gradually converting to Zotero from my old home-grown database, hence frequent updates as I convert the keys in my thesis source.
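
Workaround (a) lends itself to automation after each export. The following is a minimal sketch, assuming each url field sits on a single line of the form `url = {...}` in the exported .bib file; the function name escapePercentInUrlFields is made up for illustration and is not part of Zotero or BibTeX.

```javascript
// Hypothetical helper: escape bare percent signs in url fields of .bib text.
// Assumes every url field is written on one line, e.g.  url = {http://...},
function escapePercentInUrlFields(bibText) {
  // Match whole lines that start with "url =" (case-insensitive, per line).
  return bibText.replace(/^([ \t]*url[ \t]*=[ \t]*)(.*)$/gim, (line, prefix, value) =>
    // Only touch "%" that is not already preceded by a backslash,
    // so running the function twice does not double-escape.
    prefix + value.replace(/(?<!\\)%/g, "\\%")
  );
}
```

In Node.js this could be applied to the contents of the exported file (read with fs.readFileSync and written back with fs.writeFileSync). Percent signs in other fields, such as titles, are left alone.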

Any suggestions?
  • This is very likely to be BibTeX-style dependent. The URL field is not escaped, because that breaks URLs in some styles. You should be able to either edit BibTeX.js or use a search/replace to use either work-around with very little work. Depending on the style, other alternatives might be to enclose the url in \url or \href or similar or to change the name of the field.
  • Thanks, noksagt - that's pointed me in a couple of directions I can follow.

    As you say, the effect is style-dependent, but only in the sense that it affects styles which use the URL field, and doesn't affect those that don't. So I can at least keep making progress by switching styles while I investigate. However, it affects the natbib package's plainnat style, which I should imagine is fairly widely used, so I'm surprised it hasn't come up before.

    I've found BibTeX.js, taken a look and it didn't scare me to death, so I'm going to hack on it for a while to see if I can come up with a fix.

    Thanks for the pointers.
  • OK, I'm making progress. I edited the writeField function in BibTeX.js, to allow escaping of the URL field, by changing the line

    if(!((field == "url") || (field == "doi") | (field == "file"))) {

    to

    if(!((field == "doi") | (field == "file"))) {

    so that the URL field drops through the regex replacement code that follows.
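
    To illustrate the effect of that change, here is a rough stand-in for the check, not the actual BibTeX.js code. The names UNESCAPED_FIELDS, escapeTeXSpecials and formatField are hypothetical, and the set of escaped characters (# $ % & _) is an assumption; the real translator's regex may cover more.

    ```javascript
    // Hypothetical stand-in for the field check in writeField (not the real BibTeX.js).
    // "url" is no longer listed, so URL values now get escaped like any other field.
    const UNESCAPED_FIELDS = ["doi", "file"];

    // Assumed escaping rule: put a backslash in front of the TeX specials # $ % & _
    function escapeTeXSpecials(value) {
      return value.replace(/([#$%&_])/g, "\\$1");
    }

    // Build one "field = {value}" line, escaping unless the field is exempt.
    function formatField(field, value) {
      const text = UNESCAPED_FIELDS.includes(field) ? value : escapeTeXSpecials(value);
      return "\t" + field + " = {" + text + "}";
    }
    ```

    With this sketch, formatField("url", ...) turns each & into \& and each % into \%, while doi and file values pass through untouched.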

    After re-exporting the library, I can successfully PDFTexify the thesis with the natbib plainnat style. URLs appear in the generated bibliography and work correctly in the generated PDF.

    Looking at the generated thesis.bbl file, the previously-troublesome URL comes through as

    \url{{http://portal.acm.org/toc.cfm?id=355112\&coll=portal\&dl=ACM\&type=issue\&idx=J79\&part=magazine\&WantType=Magazines\&title=Communications\%20of\%20the\%20ACM\&CFID=1436618\&CFTOKEN=5260756}}.

    so that ampersands and percent symbols are being escaped, but the URL in the generated PDF is

    http://portal.acm.org/toc.cfm?id=355112&coll=portal&dl=ACM&type=issue&idx=J79&part=magazine&WantType=Magazines&title=Communications%20of%20the%20ACM&CFID=1436618&CFTOKEN=5260756

    As expected, pdflatex has correctly generated the escaped characters - just as I would have had to escape them in body text. I can't see why the included .bbl file would be any different.

    So now I'm questioning the notion that URLs shouldn't be escaped. I'm going to try this version of the exported .bib file with the other styles that were giving trouble. But are there other styles for which escaping *does* cause problems?
  • And... no problems. The escaped URLs work with the other styles that use URLs - abbrvnat and unsrtnat, as well as the PhDbiblio-url2 style that is included in the template I'm working with.

    So this fixes my problem. I wonder if this fix affects anything or anybody else?
  • A deliberate decision was made not to escape URLs. Most modern BibTeX styles use url.sty or other URL-aware packages that don't require escaping (and escaping may actually break URLs there). See, e.g., http://www.michaelshell.org/tex/ieeetran/bibtex/
  • I checked the PhD thesis template that I was using, and it, in turn, uses the hyperref package. However, even with hyperref in use, the unescaped URL export breaks natbib's plainnat style, as well as all the URL-using styles provided with the template.

    I tested the ieeetran.bst style in a test document, with and without hyperref, and with both escaped-URL and non-escaped-URL bib files. The escaped URLs always worked correctly, while the non-escaped URLs only worked correctly with the hyperref package. I got similar results with the plainnat style.

    Which leaves me with the question: since I'm using hyperref, why is it not correctly handling unescaped URLs? It may be down to one of the options the template uses with hyperref:

    \usepackage[ pdftex, plainpages = false, pdfpagelabels,
    pdfpagelayout = useoutlines,
    bookmarks,
    bookmarksopen = true,
    bookmarksnumbered = true,
    breaklinks = true,
    linktocpage,
    pagebackref,
    colorlinks = false, % was true
    linkcolor = blue,
    urlcolor = blue,
    citecolor = red,
    anchorcolor = green,
    hyperindex = true,
    hyperfigures
    ]{hyperref}

    So that's what I'll investigate next.

    Given that, on the test cases I've tried so far, escaping URLs has worked correctly in every case and caused no problems, while non-escaped URLs have broken several styles, I think I'm going to stick with my little mod for the foreseeable future, though.
  • OK, I've narrowed it down to the "pagebackref" option of the hyperref package. Set that to true, and unescaped URLs will cause compilation to fail. With it false, everything is fine. I've also reproduced the same behaviour in a small test case using the IEEEtran style.

    So at this point, I'm reasonably certain that's where the problem lies. I've looked at hyperref.sty and quickly realised my LaTeX-fu isn't up to the task of fixing this, so I'll pursue it with the hyperref maintainers.

    At least I have a good handle on this behaviour and several workarounds. Thanks for nudging me in the right direction!