Validated styles suddenly deemed 'not valid'

marionsd · February 15, 2012

I created two custom styles for our institute in CSL 1.0, which I'd carefully validated on Validator.nu and had successfully used as recently as yesterday.

Today, after a Firefox update (but not, as far as I know, a Zotero one), I went to continue editing a document with one of these styles, and Zotero gave me a pop-up window telling me the style appears not to be valid. Ditto with my other custom style.

The built-in Chicago style worked fine.

I just went back and re-validated the style on Validator.nu. Still good. Any idea what's wrong and, more important, how to fix it?

marionsd · February 15, 2012

The only way Validator.nu shows an error, by the way, is if I uncheck the "be lax about HTML-Content-Type" box. Then I get this:

IO Error: Non-XML Content-Type: text/plain.

adamsmith · February 15, 2012

Have you tried re-installing the styles?
Tried in a new document?
Validation sounds fine - the error is expected when you uncheck the "be lax..." box.

marionsd · February 15, 2012

Did try re-installing, and just tried a new document after you suggested it. Nope, still bad.

Has there been any change to the Zotero code in the last 24 hours that could've triggered this?

Rintze · February 15, 2012

If you share your style (e.g. via gist.github.com, as described at https://github.com/citation-style-language/styles/wiki/Submitting-Styles ), we can take a look.

adamsmith · February 15, 2012

Potentially - there was a 3.0.2 update yesterday, though this seems like an unlikely consequence of an update.
Could you post the style online as a public gist at gist.github.com and provide a link here (or make it accessible some other way - don't paste it here, though)?

marionsd · February 15, 2012

Here you go:
git://gist.github.com/1839131.git

Rintze · February 15, 2012

I can reproduce this with the trimmed down style https://gist.github.com/1839241 . It gives an "Error parsing style:
TypeError: child.getAttribute is not a function" error in the Zotero Reference Test pane. Strangely, the error goes away when I put a line break between "</choose></group>".

marionsd · February 15, 2012

Odd; not sure what that means. But I did go and make sure there was a line break before every </group> and that didn't make my style valid, per Zotero.

adamsmith · February 15, 2012

I replicated this, too, same error. On the other hand, the style works fine after I open it with emacs and do nothing but pretty-print it.
Here's a copy that should work
https://gist.github.com/1839326

That still means there is some odd bug, but this should get you working again.

Rintze · February 15, 2012

Also, http://www.shell-tools.net/index.php?op=xml_format reports:

/var/tmp/FOOZXQbv9:18: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0x91 0x3C 0x2F 0x74
<term name="open-quote">‘</term>
^

So it might be an encoding issue.

marionsd · February 15, 2012

Yay! It works. Thank you! What did you do, so I can replicate it with my other two (also now-rejected) styles?

adamsmith · February 15, 2012

I just opened them with my text editor (emacs) and used an XML pretty-print function that does nothing except fix indenting. I have no idea why that worked, but it takes me 30 seconds each, so I'm happy to just do that for you if you upload the two other styles.
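
For anyone without emacs, a rough equivalent of that re-indenting step can be sketched with Python's standard library. This is an assumption-laden stand-in, not the emacs function used above: the helper name pretty_print_xml is made up for illustration, and xml.dom.minidom may format some details differently (for example, it adds blank lines if the input is already indented).

```python
from xml.dom import minidom


def pretty_print_xml(source: str) -> str:
    """Re-serialize XML so each nested element starts on its own line."""
    return minidom.parseString(source).toprettyxml(indent="  ")


# A compact fragment with adjacent closing tags, like the trimmed-down style.
compact = '<group><choose><if type="book"/></choose></group>'
pretty = pretty_print_xml(compact)
print(pretty)
```

After this step the closing tags no longer sit on the same line, which is the condition that made the error go away in the tests above.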

marionsd · February 15, 2012

I think I'm going to download emacs and try it myself. If I can't figure it out quickly enough, I'll be back in the morning.

Thanks again, so much!

Simon · February 15, 2012

We switched the way citeproc-js parses XML in 3.0.2 from using E4X (since Firefox's E4X support is increasingly buggy and no longer actively maintained) to DOM XML. It looks like this is a bug in citeproc-js's DOM XML support.

fbennett · February 15, 2012

"So it might be an encoding issue."

@rintze: This looks like a red herring. It's complaining of a string "ESC</t", but the escape is not in the input. Looking at the characters in a text editor, the open single quote is properly formed UTF-8, and converts correctly to Unicode in Python.
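
A check along these lines can be sketched in Python. The helper name decodes_as_utf8 and the two sample byte strings are illustrative assumptions, not the actual contents of the style file: the first sample is the term line with the quote encoded as UTF-8, the second uses the lone 0x91 byte named in the tool's report.

```python
def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the bytes form valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# U+2018 (left single quotation mark) encodes to the bytes E2 80 98 in UTF-8.
good = '<term name="open-quote">\u2018</term>'.encode("utf-8")

# A bare 0x91 byte is a stray continuation byte and is not valid UTF-8.
bad = b'<term name="open-quote">\x91</term>'

print(decodes_as_utf8(good), decodes_as_utf8(bad))
```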

fbennett · February 15, 2012

Okay, I've identified the source of the problem, and will check in a fix soon, after final testing. A few words on the technical joys of this issue ...

Simon's intuition of a bug in the citeproc-js DOM parsing module was right, and Rintze's trial with and without the newline was the key that pinpointed the fault. Apparently the Firefox DOM parser places an empty Text node between the closing tags if (and only if) they are on the same line. A function in the citeproc-js DOM parsing module relies on counting the number of child nodes, and it assumed that every child it saw was a meaningful DOM node (and not Text). It crashed because nodes of type Text were in there, and they don't have a getAttribute() method.
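
The same failure mode can be reproduced outside Firefox. The sketch below is not the citeproc-js code: it uses Python's xml.dom.minidom as a stand-in for the browser DOM, and the helper names naive_attrs and element_attrs are hypothetical. minidom does not insert the empty Text node Firefox did, so whitespace between tags is used here to get Text children.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

# Whitespace between the tags yields Text children around the Element child.
SOURCE = '<group>\n  <text variable="title"/>\n</group>'


def naive_attrs(parent, name):
    """Assume every child is an element; fails on Text children."""
    return [child.getAttribute(name) for child in parent.childNodes]


def element_attrs(parent, name):
    """Read the attribute only from Element children, skipping the rest."""
    return [
        child.getAttribute(name)
        for child in parent.childNodes
        if child.nodeType == Node.ELEMENT_NODE
    ]


group = minidom.parseString(SOURCE).documentElement
print(len(group.childNodes))             # Text, Element, Text
print(element_attrs(group, "variable"))  # only the Element child is consulted
```

Calling naive_attrs(group, "variable") raises AttributeError in minidom, the Python counterpart of the "child.getAttribute is not a function" TypeError.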

The fix I've applied is a little wordy in the code. The simpler approach would seem to be just to use node.normalize() to eliminate empty Text nodes ... but that didn't work. The presence of adjacent closing tags apparently wakes up some general Text node recognition machinery, so we get nodes with hard returns in them as well (which are not empty, and so are not dropped by normalize()). (Into the bargain, the fix applied will cut out Comment nodes in the target, which would also have broken things.)
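
Why normalize() is not enough can be illustrated with a small sketch, again using Python's xml.dom.minidom as an assumed stand-in for the browser DOM rather than the actual citeproc-js fix. The helper child_type_names is hypothetical. Whitespace-only Text nodes and Comment nodes survive normalize(), while filtering on node type drops both.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

SOURCE = '<group>\n  <!-- note -->\n  <text variable="title"/>\n</group>'

TYPE_NAMES = {
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.COMMENT_NODE: "comment",
}


def child_type_names(parent):
    """List the node type of each direct child, in document order."""
    return [TYPE_NAMES.get(c.nodeType, "other") for c in parent.childNodes]


group = minidom.parseString(SOURCE).documentElement
before = child_type_names(group)

# normalize() merges adjacent Text nodes and drops empty ones,
# but Text nodes holding only line breaks and spaces are not empty.
group.normalize()
after = child_type_names(group)

# Keeping only Element children removes Text and Comment nodes alike.
elements_only = [c for c in group.childNodes if c.nodeType == Node.ELEMENT_NODE]
print(before, after, len(elements_only))
```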

With version 1.0.282 (just checked in), things should be handled correctly in this function regardless of line breaking in the CSL source.

Simon · February 16, 2012

fbennett: you could use an XPath to find empty text nodes (//text()[normalize-space(.)=""], I think) and remove them before inspecting the document. This won't work in IE, but if I remember correctly, IE's parser ignores whitespace anyway. I'm not sure if this is a more desirable solution than the above, but it's an option.
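
The idea of stripping blank text nodes up front can be sketched as follows. This is a Python approximation under stated assumptions: xml.dom.minidom has no XPath support, so a recursive walk stands in for the XPath expression, and the function name strip_blank_text is made up for illustration.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

XML_WHITESPACE = " \t\r\n"  # the characters normalize-space() treats as whitespace


def strip_blank_text(node):
    """Recursively remove Text children that are empty or whitespace-only."""
    for child in list(node.childNodes):  # iterate over a copy while removing
        if child.nodeType == Node.TEXT_NODE and child.data.strip(XML_WHITESPACE) == "":
            node.removeChild(child)
        elif child.nodeType == Node.ELEMENT_NODE:
            strip_blank_text(child)


doc = minidom.parseString(
    '<group>\n  <choose>\n    <if type="book"/>\n  </choose>\n</group>'
)
strip_blank_text(doc.documentElement)
cleaned = doc.documentElement.toxml()
print(cleaned)
```

After the cleanup every remaining child of group and choose is an element, so code that counts children or calls getAttribute() on them no longer trips over whitespace.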