MARC Import and encoding

Hello,
I wanted to add a translator for l'harmatheque and was happy to see that they provide MARC files, which meant I could reuse the MARC import translator. Unfortunately, there seem to be some encoding issues that are beyond my JS skills.

Here is a preview of the translator:
https://github.com/symac/translators/blob/master/Harmatheque.js

And you can try it with this kind of page:
http://www.harmatheque.com/ebook/les-jours-heureux-roman-47782

As you can see, characters with diacritics are wrongly encoded in the saved item, and some fields are truncated (the author name, for example). I am fairly confident that it has something to do with the following lines in MARC.js, but I would like to know if anyone has a clue:
https://github.com/zotero/translators/blob/master/MARC.js#L130-L143

Thanks in advance,
Sylvain
  • Looks like the data is encoded as either Windows-1252 or ISO-8859-1 (the two are very similar), but is being served as UTF-8. You can fetch the data in the correct encoding via ZU.doGet by specifying the charset as the 4th parameter (the first is the URL, the second is the callback, and the third is a callback to be executed when all URLs are done processing). So, you want ZU.doGet("http://www.example.com", function() {...}, null, "windows-1252");
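To see why the charset matters, here is a small sketch that runs outside Zotero. It decodes the same bytes twice, once as UTF-8 and once as Windows-1252. The sample name "Hélène" and the helper fetchAsWindows1252 are made up for illustration and are not part of Zotero; the helper takes the fetch function as an argument so the sketch can run on its own, and inside a translator you would pass ZU.doGet. It assumes a runtime whose TextDecoder supports the "windows-1252" label (browsers and recent Node.js builds).

```javascript
// Bytes for "Hélène" as a Windows-1252 (or ISO-8859-1) source would send them.
const legacyBytes = new Uint8Array([0x48, 0xe9, 0x6c, 0xe8, 0x6e, 0x65]);

// Wrong: reading the bytes as UTF-8. 0xE9 and 0xE8 each start a multi-byte
// sequence that is never completed, so each one becomes U+FFFD.
const misread = new TextDecoder("utf-8").decode(legacyBytes);

// Right: decoding with the charset the data was actually written in.
const correct = new TextDecoder("windows-1252").decode(legacyBytes);

// Hypothetical helper (an assumption, not a Zotero API): `doGet` is injected
// and is expected to take (url, callback, doneCallback, charset), matching
// the parameter order described above.
function fetchAsWindows1252(doGet, url, onText) {
  doGet(url, onText, null, "windows-1252");
}

console.log(JSON.stringify(misread));
console.log(JSON.stringify(correct));
```

The misread string has lost the accented letters entirely. MARC directory entries give field positions and lengths in bytes, so a decoder that mangles or swallows bytes could plausibly also account for the truncated fields, though I have not verified that against this particular file.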