Bug: Snapshot of page produces XML Parsing Error

Geoffrey Kantaris · July 20, 2011

When I save the following web page as a snapshot, either by "Creating new item from current page" or by "Attaching snapshot of current page", the resulting archived copy of the page is unreadable in Firefox:

http://www.newtonproject.sussex.ac.uk/view/texts/normalized/THEM00234

Viewing the snapshot produces the yellow screen of death with the message "XML Parsing Error: not well-formed". The problem seems to be that the original page is encoded in ISO-8859-1 (or Firefox thinks it is), but the snapshot becomes UTF-8 (again, according to Firefox's character encoding menu). Hence some non-permitted characters crop up in the file and confuse Firefox's XML parser. Come to think of it, it’s not at all clear to me why Zotero is storing the file as an XHT file instead of XHTML. Any clues?
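
To illustrate the suspected mismatch, here is a minimal sketch in Python, assuming the cause really is ISO-8859-1 bytes being read as UTF-8. It uses Python's strict XML parser as a stand-in for Firefox's; the sample markup and the helper name `is_well_formed` are made up for the illustration and are not taken from the page or from Zotero.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_bytes: bytes) -> bool:
    """Return True if a strict XML parser accepts the bytes (hypothetical helper)."""
    try:
        ET.fromstring(xml_bytes)
        return True
    except ET.ParseError:
        return False


# Made-up sample content containing one non-ASCII character (e with acute accent).
body = "<p>caf\u00e9</p>"
latin1_bytes = body.encode("iso-8859-1")  # the accented letter becomes the single byte 0xE9

# Same bytes, two different encoding declarations.
declared_latin1 = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + latin1_bytes
declared_utf8 = b'<?xml version="1.0" encoding="UTF-8"?>' + latin1_bytes

print(is_well_formed(declared_latin1))  # declaration matches the bytes
print(is_well_formed(declared_utf8))    # 0xE9 followed by "<" is not valid UTF-8
```

The bytes are identical in both cases; only the declared encoding changes, and that alone decides whether a strict parser accepts the document.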

adamsmith · July 20, 2011

Simon just fixed that on the trunk - it will find its way into the next Zotero version.

Simon · July 20, 2011

This only became a problem because the original page gets served as application/xhtml+xml, which makes Firefox very strict about parsing, and also causes Zotero not to use WebPageDump to save the page. As a result, you get a parse error and the CSS and images on the page don't get saved. This has been fixed for the next release.
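
The difference between strict and lenient parsing can be sketched roughly as follows. This is only an analogy in Python, not Firefox's or Zotero's actual code: `html.parser` stands in for a forgiving text/html parser and `xml.etree.ElementTree` for the strict parser used with application/xhtml+xml. The sample markup and the names `TagCollector`, `parse_leniently` and `parse_strictly` are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records start tags; tolerates markup that is not well-formed XML."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parse_leniently(markup: str) -> list:
    """Parse the way a forgiving HTML parser would and return the start tags seen."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def parse_strictly(markup: str) -> bool:
    """Return True only if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Made-up sample: <br> is never closed, which HTML allows but XML does not.
sloppy = "<p>line one<br>line two</p>"

print(parse_leniently(sloppy))  # the lenient parser still reports both tags
print(parse_strictly(sloppy))   # the strict parser rejects the whole document
```

A single error of this kind is enough for the strict parser to refuse the whole document, which is why the same content can display fine as HTML and fail outright as XHTML.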

Geoffrey Kantaris · July 20, 2011

Thank you very much indeed. This must be the fastest fix ever for a bug report!!