Worldcat translator losing macrons

Recently, when importing romanized Japanese titles from WorldCat, the macrons over certain letters have been getting lost. This was not previously a problem. Has there been some change in the translator? Is there a setting I can change to make sure the macrons are imported properly?
  • Nothing has changed in the translator for at least half a year; the last substantial change was almost a year ago.
    Do you have an example you could link to?
  • Thank you for your reply. Here's one example, but I'm having the same problem with all Japanese titles:

    http://www.worldcat.org/title/yumemiru-shumi-no-taisho-jidai-sakkatachi-no-sanbun-fukei/oclc/587117570&referer=brief_results

    The macrons over the "o" and "u" in the book's romanized title are getting lost.
  • Are you using Firefox? If so, you can get macrons by right-clicking the translator icon and selecting "COinS" instead of "Open WorldCat" as the translator.
    Unfortunately, COinS has other downsides (e.g., we don't get multiple authors), so I don't think we'd want to use it more generally.
    It's unfortunate that WorldCat doesn't include macrons and other diacritics in the RIS export we're currently using. You could ask them why; maybe they could fix that or offer an alternate version. There's no reason RIS can't be UTF-8.
  • Thanks for the workaround! The odd thing is that I was using the Chrome connector and wasn't having any problems before... I wonder if the default translator in Chrome had previously been COinS. In any event, there's indeed no reason not to be using UTF-8 at this late date.
  • When I first noticed this, I spoke with a couple of folks at WorldCat. I was told that they had received many complaints about RIS files containing certain decorated characters. They said they are considering my request to embed unAPI metadata in UTF-8. For a long time, Japanese, Chinese, and Russian characters came through garbled because of an unusual encoding. I could make the characters display correctly by copying the words into a good text editor and converting them to UTF-8. That was OK once in a while, but not acceptable on a regular basis.

    Maybe others could request embedded metadata with proper character encoding. This might balance out the complaints that led to vanilla RIS.
  • Thanks for the information. I'll also request it.
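To illustrate why the encoding of the export matters, here is a minimal Python sketch. The sample string and the helper names (`fits_in`, `ascii_fold`, `repair_mojibake`) are hypothetical, and it is only an assumption, not anything WorldCat has confirmed, that their RIS export either folds text to plain ASCII or serves UTF-8 bytes that get read as Latin-1. It shows that vowels with macrons such as ō and ū cannot be represented in Latin-1 at all, that ASCII folding produces exactly the "lost macron" symptom, and how the copy-into-an-editor-and-convert-to-UTF-8 fix can be done in code when the garbling is UTF-8 misread as Latin-1.

```python
import unicodedata

# Hypothetical sample, not the actual WorldCat record:
# "ō" is U+014D and "ū" is U+016B, both outside the Latin-1 range (U+0000-U+00FF).
SAMPLE = "Taishō shūmi"


def fits_in(text: str, encoding: str) -> bool:
    """True if every character of `text` is representable in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def ascii_fold(text: str) -> str:
    """Strip diacritics: decompose (NFKD), then drop the non-ASCII combining marks.

    This reproduces the symptom in the thread (macrons vanish, base letters stay);
    whether WorldCat's exporter does exactly this is an assumption.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", errors="ignore").decode("ascii")


def repair_mojibake(garbled: str) -> str:
    """Reverse text whose UTF-8 bytes were mistakenly decoded as Latin-1."""
    return garbled.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    print(fits_in(SAMPLE, "latin-1"))  # False
    print(fits_in(SAMPLE, "utf-8"))    # True
    print(ascii_fold(SAMPLE))          # Taisho shumi
    # Simulate the garbling: UTF-8 bytes read back as Latin-1.
    garbled = SAMPLE.encode("utf-8").decode("latin-1")
    print(repair_mojibake(garbled) == SAMPLE)  # True
```

The repair only works for that one specific mix-up; text that was ASCII-folded has already lost its macrons, which is why an export that is UTF-8 from the start is the real fix.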