"malformed URI sequence" when capturing a web page
When capturing some web pages (such as ),
I got the following error:
Error: malformed URI sequence
Source file: chrome://zotero/content/xpcom/attachments.js
Line: 1169
(Submitted as report ID 878181380)
Line 1169 of chrome://zotero/content/xpcom/attachments.js reads:
function _getFileNameFromURL(url, mimeType){
/* ... */
// Pass unencoded name to getValidFileName() so that '%20' isn't stripped to '20'
nsIURL.fileBaseName = Zotero.File.getValidFileName(decodeURIComponent(nsIURL.fileBaseName)); // <-- HERE
I think the problem is that the code assumes "%A4%CF%A4%C6%A4%CA" is a UTF-8-encoded string.
(Actually, "%A4%CF%A4%C6%A4%CA" is an EUC-JP-encoded string.)
Perhaps some more work is needed on deriving the file name from the URI when saving.
Thanks in advance.
This is certainly due to EUC-JP, though, not a general UTF-8 issue.
But I have some concerns about getting a file name by decoding the URI's final component:
- what if we get an error while decoding the URI's final component?
- what if the decoded final component is not appropriate for a file name?
(something like accented letters in a Japanese locale on Windows,
or CJKV characters in a European locale on Windows)
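The first concern can be reproduced outside Zotero: decodeURIComponent() throws a URIError when the percent-encoded bytes are not valid UTF-8. A minimal sketch of a guarded decode follows; safeDecodeComponent is a hypothetical helper name, not something in the Zotero source, and falling back to the raw text is only one possible choice.

```javascript
// Hypothetical helper (not part of Zotero): decode a URI component,
// falling back to the raw, still percent-encoded text when the bytes
// are not valid UTF-8.
function safeDecodeComponent(component) {
  try {
    return decodeURIComponent(component);
  } catch (e) {
    if (e instanceof URIError) {
      // Malformed for UTF-8 (e.g. EUC-JP bytes): keep the raw form.
      return component;
    }
    throw e; // anything else is unexpected, so re-throw it
  }
}
```

With this guard, the EUC-JP sequence from the report comes back unchanged instead of aborting the save, while an ordinary name such as "a%20b" is still decoded normally.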
[Edit: I know that there are people using operating systems that don't qualify as "modern" by the definition, but even XP is pretty much OK with Unicode filenames in most cases.]
There is a problem to fix here, but it's that we're failing to account for non-UTF-8 content in URIs. This is probably somewhat related to an issue that we have with non-UTF-8 COinS (which are also URL-encoded). In both cases, it's hard to know what character encoding is in use (since URIs don't have any way to mark it explicitly, of course).
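Since the URI carries no encoding label, one heuristic is to percent-decode to raw bytes and try a list of candidate encodings in order, accepting the first that decodes without error. The sketch below assumes a runtime with the standard TextEncoder/TextDecoder globals; both function names are hypothetical, and the candidate list is a guess supplied by the caller, not something the URI can confirm.

```javascript
// Hypothetical helper: turn "%XX" escapes into raw bytes; any other
// character is copied as its UTF-8 bytes.
function percentDecodeToBytes(component) {
  const bytes = [];
  const encoder = new TextEncoder();
  for (let i = 0; i < component.length; ) {
    const hex = component.slice(i + 1, i + 3);
    if (component[i] === "%" && /^[0-9A-Fa-f]{2}$/.test(hex)) {
      bytes.push(parseInt(hex, 16));
      i += 3;
    } else {
      const ch = String.fromCodePoint(component.codePointAt(i));
      bytes.push(...encoder.encode(ch));
      i += ch.length;
    }
  }
  return Uint8Array.from(bytes);
}

// Hypothetical helper: return the text from the first candidate
// encoding that decodes the bytes cleanly, or null if none does.
function decodeWithCandidates(component, encodings) {
  const bytes = percentDecodeToBytes(component);
  for (const label of encodings) {
    try {
      // fatal: true makes decode() throw on invalid byte sequences.
      return new TextDecoder(label, { fatal: true }).decode(bytes);
    } catch (e) {
      // Invalid bytes for this encoding, or a label the runtime
      // does not support: move on to the next candidate.
    }
  }
  return null;
}
```

A caller might pass something like ["utf-8", "euc-jp"], provided the runtime's TextDecoder supports the second label. The order matters, and many legacy encodings accept almost any byte sequence, so a clean decode is a guess and not proof of the real encoding.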
> so long as it doesn't include any explicitly reserved characters
> (and I think Mozilla will handle those automatically).
I see.
(Some time ago, I got an error with accented letters in a file name,
but that was not with a Mozilla product, so it is not the problem here.)
I would like some fixes here.
- At least, if saving a file fails, notify me about it.
(Currently Zotero finishes its work **quietly** even if it fails to save a file.)
- If possible, when saving a file under a certain name fails, try an alternate name.
The alternate name could be a SHA-1 hash of the URI's path,
or even the raw final component of the URI, I think.
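The alternate-name idea could look roughly like the sketch below. It is not the linked patch and not Zotero code: it assumes a Node.js-style environment (the crypto module and the URL global), the name fallbackFileName is hypothetical, and it triggers the fallback on a decoding failure, whereas the suggestion above is about a failed save.

```javascript
const crypto = require("crypto");

// Hypothetical helper: derive a file name from a URL. If the final
// path component cannot be decoded as UTF-8, use the SHA-1 hash of
// the URL path instead, and report that a fallback was used.
function fallbackFileName(url) {
  const path = new URL(url).pathname;
  const rawName = path.slice(path.lastIndexOf("/") + 1);
  try {
    return { name: decodeURIComponent(rawName), usedFallback: false };
  } catch (e) {
    if (!(e instanceof URIError)) {
      throw e;
    }
    const hash = crypto.createHash("sha1").update(path).digest("hex");
    return { name: hash, usedFallback: true };
  }
}
```

The usedFallback flag gives the caller a hook for telling the user that the original name could not be used, so the substitution does not happen silently.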
In general, there are a few cases where we could use more user notification. A similar case is when PDF attachments fail to attach; people would often want to know that, especially if they're used to saves always succeeding.
I've made a quick (and dirty) patch for this <URL:https://gist.github.com/938927>.
I posted it to zotero-dev, but it needs some time for approval.