URL-encoding in COinS

noksagt · March 4, 2008

From:
https://sourceforge.net/forum/message.php?msg_id=4815286

Three records on the page:
http://tom.vercauteren.googlepages.com/test
cause import to fail. These records contain the author name 'Cavé' encoded as
'Cav%E9'. I don't have time to immediately test other encoded characters or to see why this one fails.

AFAIK, COinS entities SHOULD be URL encoded, as some tools merely append them to an OpenURL resolver & don't reformat them in any way. In any case, Zotero should not fail when it encounters them.

If someone can confirm this on other pages, please reply with a ticket number or with the failing page & I'll at least make a ticket & may look into this more closely.

Thanks!

Jeremy · October 28, 2008

I'm having a similar problem using the COinS plugin on Omeka. We're getting a Zotero translator error, and we suspect that the presence of the string %27 in the COinS title attribute is causing the error. When %27 is removed, the error doesn't occur.

After further testing, we find that the error occurs when the string %27 is present in the subject field of the COinS span, which we think maps to tags for an item in Zotero. There's no error when %27 is present in the title or description of an item.

dstillman · October 28, 2008

The problem on the original test page above appears to be that the characters are being encoded as Latin-1 characters rather than UTF-8 characters. JavaScript's encodeURIComponent() and decodeURIComponent(), which we use, are UTF-8-compliant, so "é" is encoded as "%C3%A9". The COinS guide itself gives "é"=>"%C3%A9" as an example, so Zotero appears to be behaving correctly here.

Jeremy, do you have a test page where I can reproduce your error? decodeURIComponent("%27") should work fine. Are you getting an error in Report Errors?
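
For anyone wanting to check the behavior described above, here is a minimal sketch using only the standard JavaScript functions named in this post. The variable names are illustrative only and are not taken from Zotero's code.

```javascript
// encodeURIComponent() percent-encodes the UTF-8 bytes of each character.
const encoded = encodeURIComponent("Cavé");    // "Cav%C3%A9"

// A plain ASCII escape such as %27 (apostrophe) decodes without trouble.
const apostrophe = decodeURIComponent("%27");  // "'"

// "%E9" is the Latin-1 byte for "é". On its own it is not a valid UTF-8
// sequence, so decodeURIComponent() throws a URIError.
let latin1Error = null;
try {
  decodeURIComponent("Cav%E9");
} catch (e) {
  latin1Error = e;
}

console.log(encoded, apostrophe, latin1Error instanceof URIError);
```

The thrown URIError is consistent with the import failure reported for the 'Cav%E9' records, although this snippet does not show where in the import path the failure actually surfaces.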

noksagt · May 14, 2009

refbase has been modified to emit UTF-8 COinS. However, I wouldn't consider the example given in the COinS guide to be a mandate that no other escaping system should be used. Pragmatically, I know that refbase wasn't the only application that used the old escape/unescape functions. I've reopened ticket 669 with a suggested patch.

ajlyon · June 7, 2010

Is this still a live issue? I feel like the patch could land, but if there haven't been any non-UTF8 COinS issues since 2008 or so, then I understand the lack of movement.

In any case, if this could be resolved one way or the other, we'd have one less ticket in the queue.

noksagt · June 7, 2010

I don't know how pervasive this issue is. It shouldn't be pervasive for refbase: unAPI+MODS is enabled by default & has a greater priority than COinS. But it might be present elsewhere, and I'd say the patch is a simple case of Postel's robustness principle. LibX ships with similar detection code. Not high priority, but I don't know of a reason not to make this land.
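
To make the robustness idea concrete, here is a rough sketch of what such a fallback could look like. This is an assumption for illustration, not the patch attached to ticket 669 and not LibX's detection code; the function name decodeCoinsValue is hypothetical.

```javascript
// Hypothetical helper: try strict UTF-8 decoding first, and fall back to
// treating each %XX escape as a single Latin-1 byte if that fails.
function decodeCoinsValue(value) {
  try {
    return decodeURIComponent(value);
  } catch (e) {
    // Only handle malformed-escape errors; rethrow anything else.
    if (!(e instanceof URIError)) {
      throw e;
    }
    // Latin-1 code points 0x00-0xFF map directly onto Unicode code points.
    return value.replace(/%([0-9A-Fa-f]{2})/g, (match, hex) =>
      String.fromCharCode(parseInt(hex, 16))
    );
  }
}

console.log(decodeCoinsValue("Cav%C3%A9")); // UTF-8 input
console.log(decodeCoinsValue("Cav%E9"));    // Latin-1 input
```

One caveat with this approach: a Latin-1 value whose bytes happen to form valid UTF-8 would be decoded as UTF-8, so the detection is a heuristic rather than a guarantee.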