MARC Import and encoding

Hello,
I wanted to add a translator for L'Harmathèque and was happy to see that they provide MARC files, which meant I could use the MARC import translator. Unfortunately, there seem to be some encoding issues that are beyond my JS skills.

Here is a preview of the translator:
https://github.com/symac/translators/blob/master/Harmatheque.js

And you can try it with this kind of page:
http://www.harmatheque.com/ebook/les-jours-heureux-roman-47782

As you can see, characters with diacritics are wrongly encoded in the saved item, and some fields are truncated (the author name, for example). I am fairly confident that it has something to do with the following lines in MARC.js, but I would like to know if anyone has a clue:
https://github.com/zotero/translators/blob/master/MARC.js#L130-L143

Thanks in advance,
Sylvain
  • Looks like the data is encoded as either Windows-1252 or ISO-8859-1 (the two are very similar), but is being served as UTF-8. You can fetch the data in the correct encoding via ZU.doGet by specifying the charset as the 4th parameter (the first is the URL, the second is the callback, and the third is a callback to be executed when all URLs are done processing). So you want ZU.doGet("http://www.example.com", function() {...}, null, "windows-1252");
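
To illustrate the diagnosis above, here is a minimal sketch of what goes wrong when Windows-1252 bytes are decoded as UTF-8. It is not part of the translator. It assumes a runtime with the standard TextDecoder API and support for the "windows-1252" label, and the sample name "Hélène" is made up for the example.

```javascript
// Bytes for "Hélène" as Windows-1252 / ISO-8859-1 would encode them:
// "é" is the single byte 0xE9 and "è" is the single byte 0xE8.
const bytes = new Uint8Array([0x48, 0xe9, 0x6c, 0xe8, 0x6e, 0x65]);

// Decoding as UTF-8 (what happens when the declared charset is trusted):
// 0xE9 and 0xE8 are not valid UTF-8 on their own, so each one is
// replaced by U+FFFD, the replacement character.
const asUtf8 = new TextDecoder("utf-8").decode(bytes);

// Decoding with the charset the data was actually written in.
const asCp1252 = new TextDecoder("windows-1252").decode(bytes);

console.log(asUtf8);
console.log(asCp1252);
```

The first line printed shows replacement characters where the accented letters should be, while the second shows the name intact. Passing "windows-1252" as the 4th parameter of ZU.doGet is meant to get the second behaviour for the fetched MARC data.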
