Diacritical marks

hfancy · October 12, 2006

Importing French accents or entries with macrons doesn't work.

sean · October 13, 2006

I am one of the Zotero developers and regularly import items with French accents without any problems. Please provide a specific example of a record that you could not import. Thanks.

hfancy · October 13, 2006

Hi Sean,

That's a relief to know! Well, I'm importing citations from the Orbis catalog at Yale University into Zotero. A question mark inside a black diamond appears in place of every accented letter. For instance:

"Masi� i de Ros, Angel. La Corona De Arag�n Y Los Estados Del Norte De Africa: Pol�tica De Jaime II Y Alfonso IV En Egipto, Ifriqu�a Y Tremec�n. Barcelona: Instituto Espa�ol de Estudios Mediter."

Note that it also cuts off the citation.

Any help would be greatly appreciated.

sean · October 14, 2006

Firefox currently has a strange problem rendering Orbis's Unicode diacritics: they appear offset from the characters they are meant to sit over, and they generate bogus characters on import. At the moment this appears to be a Firefox issue, but we are looking into a workaround.
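One known way diacritics end up offset by one letter, offered here only as a hypothesis about the Orbis records: MARC-8, the legacy character set of many MARC library records, stores a combining diacritic before the letter it modifies, whereas Unicode places it after. If such data is passed through unchanged, each accent lands on the preceding letter. The Python sketch below assumes the input uses MARC-8 ordering throughout; the function name `marc8_order_to_unicode` is hypothetical and is not Zotero code.

```python
import unicodedata


def marc8_order_to_unicode(text: str) -> str:
    """Move combining marks that precede their base letter (MARC-8 order)
    to after it (Unicode order), then compose the result to NFC.

    Assumes every combining mark in `text` was written MARC-8 style,
    i.e. before the letter it belongs to.
    """
    out = []
    pending = []  # combining marks waiting for their base letter
    for ch in text:
        if unicodedata.combining(ch):
            pending.append(ch)
        else:
            out.append(ch)        # base letter first...
            out.extend(pending)   # ...then the marks that belong to it
            pending = []
    out.extend(pending)  # trailing marks with no base letter are kept as-is
    return unicodedata.normalize("NFC", "".join(out))


# "Aragón" with the acute accent stored before the "o", MARC-8 style.
# Read naively as Unicode, the accent would sit on the "g" instead.
raw = "Arag\u0301on"
print(marc8_order_to_unicode(raw))  # Aragón
```

Reordering is only safe when the whole record is known to use MARC-8 ordering; applied to text that is already in Unicode order, it would shift correct accents onto the wrong letter.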

cartesian · November 16, 2006

Sean, you might want to look at the demo available on the Zotero home page. One of the titles imported from the UC Boulder library website contains French accent marks and appears in the Zotero library with ? marks instead! (You might want to change the demo too!)

dstillman · November 16, 2006

Yeah, we're aware of that—we didn't create the demo, and it's somewhat unfortunate that it showcases a bug in Zotero so prominently, but it's otherwise a really nice demonstration of Zotero's features, so we decided to make it available.

Incidentally, on the library page featured in the demo, Zotero handles the accents just fine if you select a single item, either by clicking through to the individual page or by selecting a single item from the multiple selection window. It only has a problem when you select multiple items at once. We're looking into it, and hopefully once we fix it Steve will update the video.

grzes · March 29, 2007

I'm having a similar problem with diacritical marks. Importing Polish diacritical marks works really well (which is a big surprise for me, thanks!), but creating a footnote in Word doesn't work properly.
e.g.
J. Kracina, hrsg., Boles.aw kard. Kominek. W s.u.bie Ziem Zachodnych (Wroc.aw, 1977).

it should be like this:
J. Kracina, hrsg., Bolesław kard. Kominek. W służbie Ziem Zachodnych (Wrocław, 1977).

Is that an issue with Zotero (or Firefox), or is there something wrong with my Word?

BTW: I tried to create a footnote in a Central European font (Geneva CE in Word 2004 for Mac), but without any success. German diacritics work properly.

Thanks for your help.

raf · April 14, 2007

I don't want to sound like a whiner but I thought I should mention this:

When sorting authors alphabetically in the central pane, authors with an "é" in their name (as in the French "Prévost") are not sorted alphabetically.

For instance, "Prévost" appears after "Priestley."

I might be wrong but I would expect diacritical marks to follow the regular alphabetical order.

Raf

bdarcus · April 15, 2007

I might be wrong but I would expect diacritical marks to follow the regular alphabetical order.

So would I, and it's a real annoyance for me as I have a few author names that start with diacritics (and so they get sorted horrendously wrong).

Unfortunately there are some limitations in the underlying database code that Zotero is using. IIRC, it can be addressed, but is likely more a Mozilla problem/responsibility (at least at the database level; maybe there are workarounds?).

FWIW, Unicode sorting is a tricky issue. For example, not all languages and cultures use the same sorting rules for extended characters, so it's not surprising that proper support is not pervasive.

dstillman · April 16, 2007

Just checked in two patches on the dev branch for improved sorting. (Item rows are sorted in JS, not SQLite, and Mozilla does offer collation support via XPCOM.)

Collections and items should now sort using a collation appropriate for the application and/or system locale, at least in theory. In practice I haven't tested how good the collation support is or how well it addresses the issues raised in the previous thread, but it should at least generally fix sorting of diacritics.
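To see why plain code-point order misplaces "Prévost": "é" (U+00E9) compares greater than every unaccented ASCII letter, so "Pré..." sorts after "Pri...". The Python sketch below is neither Mozilla's collation service nor a full Unicode Collation Algorithm; it swaps in a simpler technique, a sort key that strips accents (NFD decomposition plus removal of combining marks) and ignores case, with the original string as a tie-breaker. It fixes the "Prévost"/"Priestley" case but does not capture language-specific rules, such as Swedish placing "ä" after "z".

```python
import unicodedata


def accent_insensitive_key(name: str) -> tuple[str, str]:
    """Sort key that ignores accents and case, falling back to the raw
    string so that e.g. "Prevost" and "Prévost" still order deterministically."""
    decomposed = unicodedata.normalize("NFD", name)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), name)


authors = ["Priestley", "Prévost", "Pratt"]

print(sorted(authors))
# ['Pratt', 'Priestley', 'Prévost']  (code-point order)

print(sorted(authors, key=accent_insensitive_key))
# ['Pratt', 'Prévost', 'Priestley']
```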

TessaC · April 17, 2007

I have also seen diacritical marks mangled when importing from the ACM DL. One example: http://portal.acm.org/citation.cfm?id=1005363&coll=Portal&dl=GUIDE&CFID=16704869&CFTOKEN=32644822

Schär becomes Sch&#228;r

Incidentally, this happens both when I am in the list of results and on the article page itself (where it is the only item that can be captured).

I've been doing a find & replace to fix this and reports I generate look right.
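The "Sch&#228;r" form is an HTML numeric character reference: 228 is the decimal code point of "ä". Instead of a find-and-replace per character, such references can be decoded in bulk. A minimal Python sketch, assuming the mangled fields are available as strings (the sample values are illustrative), using the standard library's `html.unescape`:

```python
import html

# Field values as they came out of the import (examples only).
fields = ["Sch&#228;r", "Pr&#xE9;vost", "Caf&eacute;", "Plain title"]

for raw in fields:
    # html.unescape decodes decimal (&#228;), hex (&#xE9;) and named (&eacute;) references
    print(raw, "->", html.unescape(raw))
```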

carwilj · August 16, 2007

At least one site that is still problematic with diacritics is FirstSearch's WorldCat, which just produced this strange result from Spanish:

Caf�, Caballo Y Hamaca: Visi�n Hist�rica Del Llano / Zucchi, Alberta.
which should be
Café, caballo y hamaca : visión histórica del Llano

While we're at it, the second author is included in the title from WorldCat.

mkupfer · November 24, 2007

1) When using OCLC FirstSearch/WorldCat or ArticleFirst, sorry, but the diacritical marks of European languages (Spanish, French, Italian, German) do NOT translate properly into the citation. The little black diamond with a question mark still appears instead of the proper diacritical mark.

2) NOR is the capitalization of foreign titles imported correctly into Zotero: the first letter of each word in a title ends up capitalized even when the library catalog or WorldCat displays the title correctly.

3) When entering citations manually (typing in titles of material not in databases; much European scholarly literature is not indexed), it is impossible to insert diacritical marks. I have to go into Word, type out the title, insert the proper diacritics from the pull-down menu, then cut and paste into the Zotero author or title fields.

4) In the Word plug-in for citations, the add-page-number feature still does NOT work for any output style connected with the humanities or Chicago. The page number always comes out as the number of pages in the book itself, not the page number one specifies. PLEASE FIX

sybille · November 24, 2007

Hi, here are some ideas about several of the issues you mentioned:

What character encoding is your Firefox set to use? I think it needs to be set to Unicode (View -> Character Encoding -> Unicode (UTF-8)). If I have the encoding set to something else on the WorldCat pages, I see placeholder marks rather than diacritical marks. With the Unicode setting, however, I see the correct characters, the characters import correctly into Zotero, and they export correctly using the word processor plugin (for OpenOffice.org, in my case).

As far as the capitalization of titles is concerned, have you used the "Transform Text" function? After entering a title, highlight the title field with your cursor and right click (don't left click to open the field for editing, though). You should see a "Transform Text" menu that will allow you to change to Title Case or lower case as needed, depending on the language of the title.

I believe that entering accented characters is something to set up in your operating system or in Firefox rather than in Zotero itself. For example, on Linux I use key combinations involving the "Compose" key to type such characters in all applications, including Firefox and Zotero. If you don't want to set something up for all applications, you might consider the Firefox extension "Zombie Keys":
https://addons.mozilla.org/en-US/firefox/addon/2335

For the problems with the page numbers, it might be better to post in this thread:
http://forums.zotero.org/discussion/1680/wrong-page-numbers-in-word/
It sounds like you're seeing the same thing that was reported there.

dstillman · November 24, 2007

You can turn off automatic capitalization of titles by going to about:config in the address bar and setting extensions.zotero.capitalizeTitles to false.
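Why forced capitalization hurts non-English titles: Spanish and French catalog records are usually in sentence case, so upper-casing the first letter of every word rewrites them. The Python function below is a hypothetical illustration of that kind of per-word capitalization, not Zotero's actual routine; applied to the WorldCat title quoted earlier in this thread, it produces the same "Caballo Y Hamaca" pattern.

```python
def naive_title_case(title: str) -> str:
    """Upper-case the first character of every space-separated word.
    Hypothetical illustration only, not Zotero's capitalization code."""
    return " ".join(word[:1].upper() + word[1:] for word in title.split(" "))


catalog = "Café, caballo y hamaca : visión histórica del Llano"
print(naive_title_case(catalog))
# Café, Caballo Y Hamaca : Visión Histórica Del Llano
```

The original casing cannot be recovered automatically afterwards (nothing distinguishes "Del" from the proper noun "Llano"), which is why switching the preference off before importing is the simpler route.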

Joke589 · November 27, 2007

I encounter the same problem when I import an Endnote library.
1. In the .txt file I created, the references are perfect, exactly as they are in Endnote
2. In Zotero, all the accented characters appear differently.
Here is an example of a book title :
1274 : Année charnière. Mutations et continuités. Lyon-Paris, 30 septembre-5 octobre 1974

The result in Zotero :
1274 : AnnÚe charniÞre. Mutations et continuitÚs. Lyon-Paris, 30 septembre-5 octobre 1974

I should add that the character encoding in Firefox is set to Unicode (UTF-8).

Is there any explanation?
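One plausible explanation, inferred only from the pattern of substitutions: "é" became "Ú" and "è" became "Þ", which is exactly what happens when text saved in the Windows Western encoding (cp1252, where byte 0xE9 is "é") is read back as the old DOS code page cp850 (where 0xE9 is "Ú" and 0xE8 is "Þ"). If that is the cause, the mix-up is reversible. A Python sketch under that assumption; `undo_cp850_misread` is a hypothetical helper name:

```python
def undo_cp850_misread(text: str) -> str:
    """Reverse text that was encoded as cp1252 but decoded as cp850.
    Assumes that specific mix-up; other encoding pairs need other codecs."""
    return text.encode("cp850").decode("cp1252")


original = "1274 : Année charnière. Mutations et continuités."
garbled = original.encode("cp1252").decode("cp850")
print(garbled)                      # 1274 : AnnÚe charniÞre. Mutations et continuitÚs.
print(undo_cp850_misread(garbled))  # 1274 : Année charnière. Mutations et continuités.
```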

mkupfer · December 4, 2007

Hello Dan Stillman,
This sounds dumb, I know, but where exactly is the address bar for Zotero ... and where is the about:config menu so that I can fix the automatic capitalization problem??

Also, I've tried changing the character encoding setting in Firefox, but this has no effect on the problem of diacritics coming out as little diamonds. I don't know if Zotero programmers are reading this thread, but if so, please do fix this problem. Why can't the text from WorldCat or other libraries come into the Zotero record exactly as it appears in the source database?

noksagt · December 4, 2007

This sounds dumb, I know, but where exactly is the address bar for Zotero ... and where is the about:config menu so that I can fix the automatic capitalization problem??

Use Firefox's address bar (the one that contains the URL of the current page). See MozillaZine's Knowledge Base entry on about:config.

Why can't the text from WorldCat or other libraries come into the Zotero record exactly as it appears in the source database?

Many sites are scraped fine with UTF-8 characters.

Sometimes the data that Zotero scrapes is not identical to what you see on the screen. For many translators, an RIS or MARC record (or similar) is downloaded, and these formats sometimes can't reflect nuances of the data (incomplete information, or no or poor support for non-Latin characters).

Please list actual URLs that fail...

mkupfer · December 9, 2007

Hello noksagt,
The site that consistently fails is WorldCat/FirstSearch, which I access via my university VPN (Hopkins). Here's the URL of an example: http://firstsearch.oclc.org/WebZ/FSFETCH?fetchtype=fullrecord:sessionid=fsapp6-39580-f9zzb0f4-c1tolm:entitypagenum=3:0:recno=3:resultset=1:format=FI:next=html/record.html:bad=error/badfetch.html:entitytoprecno=3:entitycurrecno=3:numrecs=1

Indeed, one can only use OCLC WorldCat/FirstSearch through a portal (e.g. a university library or research institute) that subscribes to this database. Other users have experienced problems with Zotero and WorldCat, so the problem has nothing to do with the venue through which we get onto FirstSearch.

Apart from WorldCat, the JHU on-line catalog itself works fine with Zotero (except for Zotero's problem with capitalization of the first letter of every word in a title).

Interestingly, the website of a French journal that I accessed via my connection to FirstSearch/WorldCat also worked in Zotero, diacritics and capitalization included: http://msh.revues.org/document2859.html

If anyone from the Zotero organization is reading this: OCLC FirstSearch/WorldCat is the single most important site with which Zotero needs to work. If the program can't capture the diacritics and the correct capitalization of words in titles, then I can promise that humanists working in languages other than English will not use this program.

bdarcus · December 9, 2007

If anyone from the Zotero organization is reading this: OCLC FirstSearch/WorldCat is the single most important site with which Zotero needs to work. If the program can't capture the diacritics and the correct capitalization of words in titles, then I can promise that humanists working in languages other than English will not use this program.

You're not the first person to do this, so I don't mean to pick on you, but all of this threatening "I'll take my marbles and go home if you don't do X" hand-waving is not the most productive way to make your point.

For one thing, do you even KNOW that the fault is with Zotero?

For another, it's quite enough to say that you consider this a high-priority problem. And then be a little patient: the developers likely have hundreds of other such problems to deal with.

lemur · December 11, 2007

For one thing, do you even KNOW that the fault is with Zotero?

bdarcus, stop being so defensive. The reality is that Zotero does not handle accented characters correctly. I've used it for two days and found two bugs that prevent me from using Zotero for serious work. I've used Refworks for months and found 0 bugs. End result: I've returned to Refworks. Zotero looks promising but as long as it does not support accented characters 100% correctly, it is a toy, not a tool. The fact that Unicode is a bitch to code for (and I know it from experience) is no excuse.

I've already reported the first problem here:

http://groups.google.com/group/zotero-dev/t/61a52b5012bccd3c

And I'm about to report the second one.

bdarcus · December 11, 2007

lemur: I'm not being defensive, if you mean an irrational rejection of criticism. I'm not even contesting that Zotero might have some problems in this area, since I've previously seen them myself (though encoding problems are all over the place on the web it seems to me, which no doubt presents some problems for Zotero). I'm merely pointing out that this is a free software project, where people can and should contribute to their own solution. Your bug report is an excellent example of how to do that.

dstillman · December 11, 2007

OK, I've committed a patch that should hopefully fix WorldCat FirstSearch diacritics. (It works for me now.) If you're running a dev build, please update to the latest version, update translators, and try importing records with diacritics from WorldCat, and let us know if you're still seeing mangled characters.

For the curious, FirstSearch uses ISO-8859-1 encoding rather than UTF-8, so I had to add support for overriding the XMLHttpRequest channel's content character set from a translator and change the FirstSearch translator accordingly. (In other words, it's a site-specific issue that has nothing to do with the handling of accented characters elsewhere in Zotero. There might be ways to detect the charset programmatically, but in the meantime we're stuck fixing these on a site-by-site basis.)
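The black diamond with a question mark reported throughout this thread is U+FFFD, the Unicode replacement character: a UTF-8 decoder emits it when it meets bytes that are not valid UTF-8, as standalone ISO-8859-1 accented letters are. The Python sketch below reproduces that and shows, in simplified form, the two remedies mentioned above: an explicit per-site override, and a crude detection heuristic (try strict UTF-8, fall back to ISO-8859-1). The function `decode_page` and its `charset_override` parameter are hypothetical names, not part of Zotero's translator API.

```python
from typing import Optional


def decode_page(body: bytes, charset_override: Optional[str] = None) -> str:
    """Decode a fetched page body.

    charset_override stands in for a per-site setting (hypothetical).
    Without it, try strict UTF-8 and fall back to ISO-8859-1, which can
    decode any byte sequence (though not always correctly).
    """
    if charset_override:
        return body.decode(charset_override)
    try:
        return body.decode("utf-8")
    except UnicodeDecodeError:
        return body.decode("iso-8859-1")


body = "Visión histórica".encode("iso-8859-1")  # what a Latin-1 site sends

# Treating the bytes as UTF-8 regardless yields U+FFFD for each accented letter
print(body.decode("utf-8", errors="replace"))            # Visi�n hist�rica
print(decode_page(body, charset_override="iso-8859-1"))  # Visión histórica
print(decode_page(body))                                 # Visión histórica
```

The fallback is only a heuristic: ISO-8859-1 text that happens to form valid UTF-8 would be misread, which is one reason a per-site override is the more predictable fix.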

erazlogo · December 12, 2007

The dev XPI doesn't seem to have been updated since version 1988.

dstillman · December 12, 2007

The dev XPI doesn't seem to have been updated since version 1988.

Should be fixed now. Sorry about that.

erazlogo · December 12, 2007

When I try to download I get this:

"Firefox could not install the file at:
...
because: Invalid file hash (possible download corruption)
-261"

I tried several times.

erazlogo · December 12, 2007

okay--got the solution from the dev list.

singlechannel · December 16, 2007

First of all, Zotero looks like the most promising piece of bibliographic software I've seen in a long while. Make that ever. But, like other users above, I've had problems with the recognition of encodings and consequent mangling of diacriticals. I'm using Zotero on Kubuntu 7.10, default locale en_GB.UTF-8, same also in Firefox. When importing my main bibtex file (which is also in utf-8), all diacriticals are mangled: Zotero refuses to treat it as anything but ISO8859-1...

Zotero can be appeased by doing this:

user$ iconv -f utf-8 -t iso8859-1 original.bib > iso8859-1.bib

and then importing iso8859-1.bib into Zotero. This obviously requires one to first edit original.bib so that it only contains records that work in ISO8859-1.

It works, though it took me a good hour to import 3200 records from a bibtex file.
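The manual step above, finding which records will not survive conversion to ISO-8859-1, can be automated. A Python sketch, assuming the .bib file is UTF-8; `non_latin1_lines` is a hypothetical helper, and the sample lines reuse titles from this thread. It reports each line containing characters with no ISO-8859-1 equivalent (code points above U+00FF, such as Polish "ł" or a macron vowel like "ō").

```python
from typing import Iterable, Iterator, Tuple


def non_latin1_lines(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, offending_characters) for every line containing
    characters outside ISO-8859-1 (code points above U+00FF)."""
    for lineno, line in enumerate(lines, start=1):
        bad = sorted({ch for ch in line if ord(ch) > 0xFF})
        if bad:
            yield lineno, "".join(bad)


sample = [
    "@book{kominek,",
    "  title = {Bolesław kard. Kominek},",
    "  address = {Wrocław},",
    "  note = {Année charnière},",  # é and è exist in ISO-8859-1, so not reported
    "}",
]
for lineno, chars in non_latin1_lines(sample):
    print(lineno, chars)
```

For a real file, pass `open("original.bib", encoding="utf-8")` in place of `sample`; the reported line numbers then point at the records to edit before running `iconv`.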

mkupfer · December 28, 2007

For Dan Stillman:

"OK, I've committed a patch that should hopefully fix WorldCat FirstSearch diacritics. (It works for me now.) If you're running a dev build, please update to the latest version, update translators..."

How exactly do I access and install this patch? I just have the regular Zotero. Finding the fix for the WorldCat First Search diacritics from the "for developers" section of the web page is not obvious.

Would greatly appreciate step by step instructions........
Thanks

nathan.hopson · July 22, 2008

Love Zotero, and respect those working on it. Keep up the great work!

That said, Zotero 1.07 still doesn't import macrons (I can't vouch for other diacritics) properly from WorldCat. I am using FF3 (fully updated) and viewing WorldCat in UTF-8, but the macrons are simply dropped from my citations. That's better than garbling, but it's still not perfect.

SalishSea · November 26, 2008

I attempted to capture the following reference, and saw the previously documented substitution of diacritical marks with question marks printed against a black diamond background. Both author and title show similar errors:

The proper citation is: Les incidents du prélèvement. Incidents of blood donation. B. Danic, H. Gouézec, E. Bigant, T. Thomas. Transfusion Clinique et Biologique 12 (2005) 153–159

The reference was captured from ScienceDirect's single-article page (as I also wanted the PDF) using Zotero's "single article" scraping icon (the single page with lines on it). It was stored as, and rendered out to OO3 as:

Danic B, Gou�zec H, Bigant E, Thomas T. Les incidents du pr�l�vement. Transfusion Clinique et Biologique. 2005 Jun ;12(2):153-159.

I am using OO3, Firefox 3 and Zotero 1.5a1.r3821

As always kudos and thanks for a great product. Also, compliments on handling the forum and the much increased traffic so well.