Koha translator encoding problem?

I was demonstrating Zotero's fantastic features to my students yesterday, and I imported something from our library in Vietnamese. We use Koha. The title came through in jumbled characters. I assume it is an encoding issue.

To isolate the problem I looked up an item in the Yale library using yufind. You can see it at http://yufind.library.yale.edu/yufind/Record/6244910. The vufind translator preserves encoding just fine. The title comes through as: "Miền nam Việt Nam : đất nước, con người".

Then I created a record in our library using text copied from the Yale record. You can see it at http://koha.theology-vietnam.org/cgi-bin/koha/opac-detail.pl?biblionumber=625&query_desc=kw,wrdl: Miền Nam. Or, if you prefer, go to koha.theology-vietnam.org and search for "Miền nam". The title comes through as "MieÌ‚Ì€n nam Việt Nam: Ä‘ất nuÌ›ớc, con nguÌ›ời". It is that way for all titles, but it was not that way a few days ago, if I am not mistaken.

I looked at two other libraries using Koha (one in France and one in Taiwan), and the characters came over just fine. That makes me wonder if it has something to do with my library's settings. But then I look at the page source, and nothing seems amiss (even the character encoding is correct, utf-8). Does anyone have any ideas about this one?
  • I couldn't spot anything obvious, but note that the Koha translator is not getting the info from the page, but from the MARC export, and it looks like that may have a problem. At least when I open the downloaded file
    koha.theology-vietnam.org/cgi-bin/koha/opac-export.pl?format=utf8&op=export&bib=625&save=Go
    in Firefox I see the wrongly encoded letters.
  • Thanks for looking at this. It is good to know that the MARC export is what Zotero uses.

    However, when I look at it, it simply looks like non-normalized unicode: Miền nam Việt Nam. It is not the jumbled mess that comes into Zotero. Is there a step I am missing?
  • Save it as a text file and open it in Firefox to see what Zotero is getting.
  • Oh, that's weird. I see what you mean.

    Experimenting with encodings in my text editor, it seems that Firefox is interpreting the file as "Western (ISO Latin 1)" encoding, rather than UTF8. When I try to save it as Western, my text editor BBEdit complains that the text is actually UTF8.

    Just to confirm this with Firefox, I pasted the offending text into a properly formatted HTML page and opened it in Firefox. The text displays correctly.

    From what I can tell, the exported text is fine. But Firefox and Zotero are incorrectly interpreting the encoding of the MARC export as Western, not UTF8. I want to blame Firefox and Zotero, but other libraries do not have this problem.
  • I would add that Chrome and Safari also interpret it as Western, so it is not simply a Firefox issue. There is something about how the file is created. Unless anyone has any bright ideas, I think I need to take this issue to a Koha forum.
  • koha.theology-vietnam.org serves MARC with the following header:
    Content-Type: application/octet-stream; charset=ISO-8859-1
    Some other Koha catalogs that I looked at either don't include a character encoding in the header or specify UTF-8 (the latter would be the right thing to do).
  • That makes sense, thanks for pointing that out. Any idea where to look to change this? I have pored over the system preferences without finding anything related to this.
  • Look in the opac-export.pl script. That's the Perl script that generates the entire export, so it should also be setting the headers. Other than that, this is probably a question for the Koha community.
  • Great, thanks. I will look into that. I have reached out to the Koha community and will pursue this further there. Thanks to adamsmith and aurimas for helping me troubleshoot.
  • In case anyone comes looking for a solution, the patch at http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=12174, comment 1, was successful. Basically you have to modify opac-export.pl to export as UTF8. Thanks to aurimas and adamsmith for helping me find the solution.
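
For anyone who wants to reproduce the diagnosis from this thread, here is a minimal Python sketch. It is not Zotero's or Koha's code; it only assumes that a client decodes the export using whatever charset the Content-Type header declares. The helper names `charset_from_content_type` and `decode_export` are made up for illustration, and the title is typed in by hand rather than fetched from the catalog.

```python
from email.message import Message


def charset_from_content_type(header_value, default="utf-8"):
    """Return the charset parameter of a Content-Type value (lowercased),
    or `default` when the header does not declare one."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset() or default


def decode_export(raw_bytes, content_type):
    """Decode bytes the way a client that trusts the header would (assumption)."""
    return raw_bytes.decode(charset_from_content_type(content_type))


# The record is stored and exported as UTF-8 bytes.
title = "Miền nam Việt Nam : đất nước, con người"
raw = title.encode("utf-8")

# Header as served by the misconfigured catalog: each UTF-8 byte becomes
# its own Latin-1 character, so the title turns into mojibake.
bad = decode_export(raw, "application/octet-stream; charset=ISO-8859-1")

# Header declaring UTF-8 (or no charset, with UTF-8 as the fallback).
good = decode_export(raw, "application/octet-stream; charset=UTF-8")

print(repr(bad))
print(good)

# The bytes themselves were never damaged, so the garbling is reversible.
repaired = bad.encode("iso-8859-1").decode("utf-8")
print(repaired == title)
```

This matches what was observed above: the exported bytes are valid UTF-8, and only the declared charset is wrong. As far as I know, browsers treat the ISO-8859-1 label as windows-1252, so the exact garbage characters they show can differ slightly from what Python's iso-8859-1 codec produces.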