Bug: Snapshot of page produces XML Parsing Error

When I save the following web page as a snapshot, either by "Creating new item from current page" or by "Attaching snapshot of current page", the resulting archived copy of the page is unreadable in Firefox:

http://www.newtonproject.sussex.ac.uk/view/texts/normalized/THEM00234

Viewing the snapshot produces the yellow screen of death with the message "XML Parsing Error: not well-formed". The problem seems to be that the original page is encoded in ISO-8859-1 (or Firefox thinks it is), but the snapshot becomes UTF-8 (again, according to Firefox's character encoding menu). Hence some non-permitted characters crop up in the file and confuse Firefox's XML parser. Come to think of it, it’s not at all clear to me why Zotero is storing the file as an XHT file instead of XHTML. Any clues?
  • Simon just fixed that on the trunk, so the fix will find its way into the next Zotero version.
  • edited July 20, 2011
    This only became a problem because the original page is served as application/xhtml+xml, which makes Firefox very strict about parsing and also causes Zotero not to use WebPageDump to save the page. As a result, you get a parse error, and the CSS and images on the page don't get saved. This has been fixed for the next release.
  • Thank you very much indeed. This must be the fastest fix ever for a bug report!!
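
For anyone who wants to reproduce the kind of failure described in the original report, here is a minimal sketch using Python's standard-library XML parser rather than Firefox or Zotero. It assumes the snapshot's bytes were ISO-8859-1 while the parser was told (or defaulted to) UTF-8; the sample markup and the helper name is_well_formed are made up for illustration and are not taken from the page or from Zotero's code.

```python
import xml.etree.ElementTree as ET


def is_well_formed(data: bytes) -> bool:
    """Return True if a strict XML parser accepts the bytes (hypothetical helper)."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# Illustrative markup containing one non-ASCII character (e with acute accent).
text = "<p>caf\u00e9</p>"
latin1_body = text.encode("iso-8859-1")  # the accent is the single byte 0xE9

# ISO-8859-1 bytes labelled as UTF-8: 0xE9 starts a multi-byte UTF-8
# sequence that is never completed, so a strict parser rejects the file.
mislabelled = b'<?xml version="1.0" encoding="UTF-8"?>' + latin1_body

# The same bytes with a matching declaration parse without error.
labelled = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + latin1_body

# Re-encoding the text so that bytes and declaration agree also works.
reencoded = b'<?xml version="1.0" encoding="UTF-8"?>' + text.encode("utf-8")

for name, doc in [("mislabelled", mislabelled),
                  ("labelled", labelled),
                  ("reencoded", reencoded)]:
    print(name, "well-formed" if is_well_formed(doc) else "not well-formed")
```

An HTML parser would tolerate the mislabelled bytes and show a replacement character, which is why the problem only becomes fatal when the content is handled as XML.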