Accented characters not captured correctly

Accented characters in author names and article titles are sometimes not captured correctly. It looks like an encoding problem.

Here's an example page:

http://prd.aps.org/abstract/PRD/v46/i4/p1379_1

Note the name of the first author.

Also a suggestion: it would be very useful to be able to search while ignoring accents. E.g. searching for "a" should match "á" as well. The Windows API call that converts other encodings to ASCII does this (e.g. it converts all of ó, ö, ő to o when doing a UTF16 -> ASCII conversion). Perhaps Firefox has something similar available that you could use?
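To illustrate the kind of matching meant here, below is a minimal Python sketch, not anything Zotero or Firefox actually does. It assumes that Unicode canonical decomposition (NFD) followed by dropping combining marks is an acceptable way to ignore accents; the function names `fold_accents` and `accent_insensitive_contains` are made up for this example.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Decompose characters (NFD) and drop the combining marks.

    Hypothetical helper: 'ó', 'ö' and 'ő' all become a plain 'o'.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def accent_insensitive_contains(haystack: str, needle: str) -> bool:
    """Return True if needle occurs in haystack, ignoring accents and case."""
    return fold_accents(needle).casefold() in fold_accents(haystack).casefold()


if __name__ == "__main__":
    print(fold_accents("óöő"))
    print(accent_insensitive_contains("Erdős, Pál", "erdos"))
```

Letters that have no canonical decomposition, such as "ø" or "ł", pass through unchanged with this approach, so a real implementation would probably need an extra mapping table for them.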
  • A developer from APS recently submitted an updated translator to work with all APS sites, and he changed the charset for RIS retrieval from Latin-1 (which our PROLA translator used previously) to UTF-8. That page has a Latin-1 RIS file, so it looks like they might use a mixture of the two charsets. I'll post a note on zotero-dev asking him about this.

    Ticket created for accent-agnostic searching, though I don't know whether or not this is possible in Firefox. Given that Firefox's built-in search doesn't do it, I suspect not.
  • "Ticket created for accent-agnostic searching, though I don't know whether or not this is possible in Firefox. Given that Firefox's built-in search doesn't do it, I suspect not."
    It appears that SQLite can handle this, although I'm not quite sure how to do it. There's some discussion of related SQLite issues here and here, and on methods for doing this in languages with strong Unicode libraries here.

    This is one of the bibliographic gotchas that would be great to get right, but which a great many systems get wrong.
  • Yeah, it's actually trivial. Support was added in Firefox 3.6.
  • Ignoring accents is critical. I have many references in my database (>5000 entries) with accents, mainly in authors' names. Since I capture from the web, the same authors appear accented in different ways, and within the same collection I have different combinations of accented and unaccented names. That makes it very difficult to search for entries with, say, two specific authors when both names carry accents. I have to search for 4 combinations: both unaccented, accented/unaccented, unaccented/accented, both accented.
    Even when I search for just 1 author, it is bothersome to have to carry out two searches. It gets more complicated if I combine the name with other terms.
  • I'd like to add my opinion here: it would be great to be able to search my Zotero database with a string of unaccented words and get results that include all possible matches, including those with umlauts, accents and other marks commonly used in languages other than English. It can be difficult to find an author whose name includes such features if that name is often imported from online searches without its correct spelling. And, as I write in English, I find it a little cumbersome to search for those names from the Word plugin, where I am often at a loss to remember the keyboard shortcut for accented text. If anyone has figured out a workaround or has some advice on this, let me know.

    Thanks!
  • Everyone, including the devs, agrees this should happen; it's just a question of someone doing it.
    But until then the only advice is to set up your keyboard for better accent typing (e.g. switching keyboards or using an international layout on Windows) and to fix author names on import (which you should do in any case, lest you get wrong bibliographies).
  • "it's just a question of someone doing it"
    It actually may not be. My comment above that this was trivial may have been wrong. As far as I've been able to tell the couple of times I've looked into this, the Mozilla feature that's supposed to allow this doesn't work correctly and may need to be rewritten with the new post-29 collation support. I'll report back with any new info.
  • Thanks! I appreciate it.
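
For anyone curious about the SQLite route mentioned earlier in the thread, here is a rough sketch of one way it could work, using Python's standard `sqlite3` module rather than anything in Zotero itself. It registers a user-defined SQL function that strips accents and compares the folded values. The `creators` table, its `last_name` column, and the `fold_accents` and `search` names are all hypothetical and do not reflect Zotero's actual schema or code.

```python
import sqlite3
import unicodedata


def fold_accents(text):
    """Strip combining marks after NFD decomposition, then casefold."""
    if text is None:
        return None
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


conn = sqlite3.connect(":memory:")
# Expose the Python helper to SQL under the (made-up) name fold_accents.
conn.create_function("fold_accents", 1, fold_accents)

# Hypothetical table standing in for wherever author names are stored.
conn.execute("CREATE TABLE creators (last_name TEXT)")
conn.executemany(
    "INSERT INTO creators VALUES (?)",
    [("Erdős",), ("Erdos",), ("Gödel",), ("Turing",)],
)


def search(term):
    """Return last names containing term, ignoring accents and case."""
    rows = conn.execute(
        "SELECT last_name FROM creators "
        "WHERE instr(fold_accents(last_name), ?) > 0 ORDER BY rowid",
        (fold_accents(term),),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(search("erdos"))
    print(search("Gödel"))
```

A query like this calls the function on every row, so it cannot use an index. On a large library it would probably be better to store a pre-folded copy of each name in its own column and search that instead.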