HTML encoded ä, ö, ü

I am quite new to Zotero and am trying the 2.0 beta.

There seems to be an issue reading the HTML-encoded German umlauts ä, ö, ü, as they can be found on many sites. For example, importing from this page is not working:
http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6TFS-44YVRRV-7&_user=10&_rdoc=1&_fmt=&_orig=search&_sort=d&view=c&_acct=C000050221&_version=1&_urlVersion=0&_userid=10&md5=6b052e04caa26015370ba753a7082493

In the author list I get output like "Höök"

Is this a known bug, and is there a current workaround or fix?

Thanks!
  • What is the complaint?
    In the Author list I get output like "Höök"
    That is exactly what is written on the page, what the author's name is, and what Zotero imports.
  • I think FrauHolle actually pasted in encoded characters and the forum is just displaying them properly. I get the HTML-encoded form of "Höök" in Zotero after importing from that page. We'll have to look into it.
  • Strange. That page uses the ScienceDirect translator for me, which just uses RIS (it doesn't contain HTML-encoded entities, and I don't know why it would). When I save the page, it uses the CrossRef DOI translator, and that also does not use HTML encoding.
  • Ouch, sorry, I didn't check how my final post would look. Dan Stillman is right: I get the version with the &s and all the other special characters. Is there a way to tell Zotero to treat these HTML encodings as normal letters?

    And how can you see / change the translator it uses?

    Thanks already for the comments!
  • And how can you see / change the translator it uses?
    It will often be stored in the "repository" field, but you should really look at the debug output.
  • You should be able to just look at the translator name in parentheses when you hover over the address bar icon.
  • There are two modes to the ScienceDirect translator: one for guest access and one for authenticated (e.g., proxied) access. It looks like the guest access mode, which uses screen scraping rather than RIS, is saving the authors incorrectly. The problem, I suspect, is the doGet() call, which returns the HTML page as plain text, so the encoded characters aren't being converted. Ticket created.
  • Dear all,

    Thanks for the helpful hints. As Dan Stillman mentioned, it is indeed possible to obtain the correct version when logged in. The faulty version with the encoded HTML characters appears only in guest access mode.
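
For anyone wanting to see the suspected cause in isolation: when a page's HTML source is read as plain text, entities such as `&ouml;` stay as literal characters until something decodes them. The snippet below is only a rough illustration in Python using the standard library's `html.unescape`; it is not the Zotero translator code (translators are written in JavaScript), and the name `decode_entities` and the sample strings are made up for the example.

```python
import html


def decode_entities(text: str) -> str:
    """Convert HTML character references (named or numeric) to real characters."""
    return html.unescape(text)


# Hypothetical scraped values: an author name as it might appear in raw HTML source.
scraped_named = "H&ouml;&ouml;k"      # named entities
scraped_numeric = "H&#246;&#246;k"    # decimal numeric references

print(decode_entities(scraped_named))
print(decode_entities(scraped_numeric))

# Text that contains no entities passes through unchanged.
print(decode_entities("Höök"))
```

A scraper that stores the string without a decoding step like this would save the entity-encoded name, which matches what shows up in guest access mode.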
