Koha translator encoding problem?
I was demonstrating Zotero's fantastic features to my students yesterday, and I imported something from our library in Vietnamese. We use Koha. The title came through in jumbled characters. I assume it is an encoding issue.
To isolate the problem I looked up an item in the Yale library using yufind. You can see it at http://yufind.library.yale.edu/yufind/Record/6244910. The vufind translator preserves encoding just fine. The title comes through as: "Miền nam Việt Nam : đất nước, con người".
Then I created a record in our library using text copied from the Yale record. You can see it at http://koha.theology-vietnam.org/cgi-bin/koha/opac-detail.pl?biblionumber=625&query_desc=kw,wrdl: Miền Nam. Or, if you prefer, go to koha.theology-vietnam.org and search for "Miền nam". The title comes through as "MieÌ‚Ì€n nam Việt Nam: Ä‘aÌ‚Ìt nuÌ›oÌ›Ìc, con nguÌ›ời". It is that way for all titles, but it was not that way a few days ago, if I am not mistaken.
I looked at two other libraries using Koha (one in France and one in Taiwan), and the characters came over just fine. That makes me wonder if it has something to do with my library's settings. But when I look at the page source, nothing seems amiss (even the declared character encoding, utf-8, is correct). Does anyone have any ideas about this one?
When I open the MARC export at
koha.theology-vietnam.org/cgi-bin/koha/opac-export.pl?format=utf8&op=export&bib=625&save=Go
in Firefox, I see the wrongly encoded letters.
However, when I look at it in my text editor, it simply looks like non-normalized Unicode: Miền nam Việt Nam. It is not the jumbled mess that comes into Zotero. Is there a step I am missing?
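For anyone who wants to check what "non-normalized" means here, below is a minimal Python sketch. It assumes the export stores accented letters in decomposed form (a base letter followed by combining marks), which is what the text above looks like; the helper name `describe` is just for illustration.

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name) pairs so combining marks become visible."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]

# Decomposed form: "e" + combining circumflex + combining grave (three code points).
decomposed = "e\u0302\u0300"

# NFC normalization composes them into one precomposed code point.
composed = unicodedata.normalize("NFC", decomposed)

print(describe(decomposed))
print(describe(composed))
```

Both forms are valid Unicode and should render identically, so decomposition alone would not explain the jumbled title.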
Experimenting with encodings in my text editor, it seems that Firefox is interpreting the file as "Western (ISO Latin 1)" encoding, rather than UTF8. When I try to save it as Western, my text editor BBEdit complains that the text is actually UTF8.
Just to confirm this with Firefox, I pasted the offending text into a properly formatted HTML page and opened it in Firefox. The text displays correctly.
From what I can tell, the exported text is fine. But Firefox and Zotero are incorrectly interpreting the encoding of the MARC export as Western, not UTF8. I want to blame Firefox and Zotero, but other libraries do not have this problem.
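The misreading described above can be reproduced outside the browser. This is only a sketch of the suspected mechanism, not of what Firefox or Zotero do internally: it takes the title in decomposed form (an assumption based on how the export looks), encodes it as UTF-8, and then decodes the bytes as Latin-1. The function names are hypothetical.

```python
import unicodedata

def misread_as_latin1(text):
    """Encode as UTF-8, then decode the same bytes as ISO-8859-1 (Latin-1)."""
    return text.encode("utf-8").decode("latin-1")

def undo_misread(garbled):
    """Reverse the mistake: recover the bytes via Latin-1, then decode as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")

# Assumed input: the title with accents stored as combining marks (NFD).
title = unicodedata.normalize("NFD", "Miền nam Việt Nam")

garbled = misread_as_latin1(title)
print(repr(garbled))

# The round trip is lossless, so the underlying bytes were fine all along.
assert undo_misread(garbled) == title
```

Browsers generally treat the ISO-8859-1 label as windows-1252, so the characters shown on screen can differ slightly from Python's strict Latin-1 output for bytes in the 0x80 to 0x9F range, but the pattern of "Ì" and "Ä" followed by stray symbols is the same.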
Content-Type: application/octet-stream; charset=ISO-8859-1
Some other Koha catalogs that I looked at either don't include a character encoding in the header or specify UTF-8 (the latter would be the right thing to do).
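To compare catalogs, the declared charset can be pulled out of a Content-Type value with Python's standard library. This is a small sketch with a hypothetical helper name, `declared_charset`; it only parses a header string and does not contact any server.

```python
from email.message import Message

def declared_charset(content_type):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()

# The header quoted above declares Latin-1 for what is actually UTF-8 data.
print(declared_charset("application/octet-stream; charset=ISO-8859-1"))

# The two variants seen on the other catalogs: no charset, or UTF-8.
print(declared_charset("application/octet-stream"))
print(declared_charset("application/octet-stream; charset=UTF-8"))
```

The header value itself could be obtained from a live response with something like `urllib.request.urlopen(url).headers.get("Content-Type")`. If it reports `iso-8859-1` for the export URL, the fix belongs in the server or Koha configuration rather than in the record.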