Ignore special character ʿ when sorting alphabetically

dakaplo · November 28, 2018

Some of the authors' names in my bibliography are transliterated from Arabic and include special characters. Zotero is usually fairly good about this, but I would like to tell it to ignore the character ʿ at the start of a word. So "Ḥasan al-ʿAṭṭār" should be at the beginning with the A's, not at the very end where Zotero is putting him. In other words, I want to treat "ʿAṭṭār, Ḥasan al-" as simply "Aṭṭār, Ḥasan al-" for the purposes of sorting. I'm using Chicago Manual of Style 17th Edition (full note).

Thanks!

djross3 · November 28, 2018

https://en.wikipedia.org/wiki/Ayin
This is a real letter in Arabic, and should arguably be distinguished from the glottal stop found before words 'starting with' a vowel.

From what I understand of how Zotero works, there is no way to specify anything about specific characters in individual styles. Ignoring a character like this would require a special rule in Zotero's processing to always/in general ignore this character.

Do you have any references to style conventions that give an explanation for whether or not this character is always supposed to be ignored for transliterated Arabic?* This is a good question, and I don't know the answer.

*Note: I see no reason that ع would be ignored in sorting in Arabic-script bibliographies, so my default assumption would be to not ignore transliterated ʿ either.

If this is a general convention, then Zotero could specifically add a rule to do this for everyone. But for you personally, for now, the answer is just to fix these (hopefully few!) entries after generating your bibliography (save a backup of your document, then unlink citations and edit the text-only version).

bwiernik · November 28, 2018

Is this generally a rule regarding sorting of Arabic names?

@fbennett @adamsmith I think this would entail a tweak to the citeproc-js sorting rules?

dakaplo · November 28, 2018

Thanks for the quick responses!

I'm well aware that ع and أ are different letters in Arabic—the trouble always comes when transliterating and typesetting in English, where word-initial glottal stop is usually not marked, and word-initial ʿayin is marked inconsistently.

In non-academic English text, the ʿayin is frequently omitted altogether (e.g. names like Abdullah, Ali, Abdel Nasser - عبد الله، علي، عبد الناصر). In the style of the International Journal of Middle East Studies (IJMES), the diacritic for ع is supposed to be retained at all times, but it's ignored in alphabetical sorting. That way an Anglicized spelling like Abdel Nasser will appear next to a transliterated spelling like ʿAbd al-Nasir. See: https://ijmes.chass.ncsu.edu/ijmes_translation_and_transliteration_guide.htm
(On their word list, they apparently couldn't find ʿ and just used a superscript 'c'... https://ijmes.chass.ncsu.edu/docs/WordList.pdf)

Most recently-published academic books use some modification of IJMES, and they use the sorting order I've described. For example, the index of the book I have at hand goes "ʿAbbas al-ʿAbd, ʿAbd al-Latif, Abdel Aal, Abdel-Messih, Abu Golayyel, Abyad, Adonis, [...] aḥdāth, ʿAlaʾ al-Din, Alaidy, etc."
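
To make the convention concrete, here is a minimal sketch in Python of a sort key that drops word-initial ʿ and folds diacritics before comparing. This is only an illustration: the function name `sort_key` is made up, it is not how Zotero or citeproc-js sorts, and it deliberately leaves other marks such as ʾ alone.

```python
import unicodedata

AYIN = "\u02bf"  # ʿ, MODIFIER LETTER LEFT HALF RING


def sort_key(name: str) -> str:
    """Hypothetical sort key: ignore word-initial ʿ and fold diacritics."""
    # Drop ʿ only at the start of each word; the displayed name is untouched.
    words = [word.lstrip(AYIN) for word in name.split(" ")]
    stripped = " ".join(words)
    # Decompose letters like ṭ into "t" plus a combining mark, then drop the marks.
    decomposed = unicodedata.normalize("NFD", stripped)
    unmarked = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unmarked.casefold()


names = [
    "Adonis",
    "ʿAṭṭār, Ḥasan al-",
    "Abu Golayyel",
    "ʿAbd al-Latif",
    "Abdel Aal",
]

# Plain code-point order pushes every ʿ-initial name to the end.
print(sorted(names))
# With the key, ʿAbd al-Latif files next to Abdel Aal, as in the index above.
print(sorted(names, key=sort_key))
```

The key is only used for comparison, so the ʿ still appears in the printed bibliography entry.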

For now I will probably just move things around myself instead of wading into the style definitions...

djross3 · November 28, 2018

That sounds convincing enough to me.

bwiernik · November 28, 2018

@dakaplo Okay. This isn't something that could be addressed through the style definitions, but has to do with the citation processor Zotero uses to interpret citation styles and data. It would need to be addressed by the maintainer of that processor.

dakaplo · November 28, 2018

@djross3 It's a chronic issue in scholarship about Arabic—add to all this the fact that there are multiple other transliteration schemes floating around.

@bwiernik Got it. Ideally for my purposes, since people tend to be inconsistent, Zotero would do the same thing for word-initial ʿ, `, ', and ‘. But I suspect that would mess up its behavior for other languages. I'm grateful that Zotero at least knows how to recognize "al-" at the beginning of a name.
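
A rough sketch of what "the same thing for all four marks" could look like, in Python, and of how it might misfire elsewhere. The set `AYIN_VARIANTS` and the function `loose_sort_key` are hypothetical names for illustration, not anything Zotero provides.

```python
import unicodedata

# Assumption: the four marks listed above, as typed for word-initial ʿayin.
AYIN_VARIANTS = "\u02bf`'\u2018"  # ʿ, backtick, apostrophe, left single quote


def loose_sort_key(name: str, ignorable: str = AYIN_VARIANTS) -> str:
    """Hypothetical key: drop any ignorable mark at the start of each word."""
    words = [word.lstrip(ignorable) for word in name.split()]
    joined = " ".join(word for word in words if word)
    decomposed = unicodedata.normalize("NFD", joined)
    unmarked = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unmarked.casefold()


# All four inconsistent spellings collapse to the same key.
for spelling in ["\u02bfAli", "`Ali", "'Ali", "\u2018Ali"]:
    print(repr(spelling), "->", loose_sort_key(spelling))

# The same rule also rewrites the Dutch apostrophe, so this name would be
# keyed as "t hart, ..." whether or not that is the wanted order.
print(loose_sort_key("'t Hart, Bernhard Marius"))
```

Because the rule is blind to language, it cannot tell an ʿayin typed as an apostrophe from an apostrophe that is just punctuation.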

For now, a little manual moving around isn't so bad. At least writers no longer have to go back through type-written manuscripts and add diacritics with a pen...

djross3 · November 28, 2018

Right. I'm aware of the issue, I just wasn't sure about how general the convention was for ignoring that character while sorting transliterated names, but what you wrote makes sense.
I'm not sure about the other characters: they're basically punctuation, which is probably ignored in general (?), but might behave differently in some languages. I wonder about Hawaiian, for example, with ‘, which looks the same as a fancy single quote (but is actually encoded differently in Unicode: https://en.wikipedia.org/wiki/ʻOkina). Probably an unlikely issue for the other characters in names at least, but I'm not sure. (And for titles maybe they'd be ignored anyway to sort based on the first letter?)
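
For what it's worth, the encoding difference is easy to inspect with Python's standard `unicodedata` module. This is just a quick sketch (the helper names `describe` and `is_letter` are made up): it prints the code point, Unicode name, and general category of each look-alike mark.

```python
import unicodedata

# Look-alike marks discussed in this thread.
marks = {
    "ayin (transliteration)": "\u02bf",
    "okina (Hawaiian)": "\u02bb",
    "left single quote": "\u2018",
    "apostrophe": "'",
    "backtick": "`",
}


def describe(ch: str) -> tuple:
    """Return (code point, Unicode name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


def is_letter(ch: str) -> bool:
    """True if Unicode classifies the character as some kind of letter (L*)."""
    return unicodedata.category(ch).startswith("L")


for label, ch in marks.items():
    print(label, describe(ch), "letter" if is_letter(ch) else "not a letter")
```

Unicode puts ʿ and the ʻokina in category Lm (modifier letter), while the quotation mark and apostrophe are punctuation, so a processor could in principle tell them apart when the data is entered with the right character.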

@bwiernik out of curiosity, could this behavior be associated with the Language field value for Zotero entries? (I imagine something similar could potentially work out the difference between Dutch and German "von" for example.)

DWL-SDCA · November 29, 2018

This is also an issue for author names of other nationalities:

'Ofanoa, Malakai Mahu

't Hart, Bernhard Marius

't Mannetje, Andrea

(for the purposes of this forum I used the apostrophe character)

This is a very interesting topic for me, as I've been in communication with authors and indexers of several different languages to determine whether there is a single best way to drop or keep the "punctuation" mark, whether or not the name has been transliterated. This, as noted above, has an important impact on how alphabetical lists of author names should be ordered. The same issue occurs with "decorated" ASCII characters (the Swedish Å is sorted differently when automated by the computer than it is when placed by a human).

djross3 · November 29, 2018

There is a real difference between the apostrophe as punctuation in Dutch and the 'Okina as a letter of Hawaiian. (Still, by convention either might or might not be taken into account when sorting.)

@DWL-SDCA wrote: "The same issue occurs with 'decorated' ascii characters (the Swedish Å is sorted differently when automated by the computer than it is when placed by a human)."

Crucially, modified letters are treated either as variants of a base letter (sorted the same) or as distinct letters, and the treatment sometimes varies by language. At times this is inconsistent as well.

For example, most European languages treat diacritics as ignored modifiers for sorting, just different ways of writing the same letter. But in Norwegian, there are three additional letters, even though their forms are modified Latin letters: æ, ø, å. [I believe Swedish is similar.] And those three letters are strictly sorted at the end of the alphabet, because that's exactly what they are: three letters following Z. In Spanish, ñ is a different letter, but accented vowels are just variants: á, é, etc. (However, that's not so important for sorting because ñ only begins a few borrowed words, so it would only change sorting in the middle of names, if at all.)
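
Here is a small Python sketch of that difference, assuming a hand-written Norwegian alphabet and a language tag passed in by the caller. The names `norwegian_key`, `generic_key`, and `sort_names` are hypothetical; a real implementation would more likely lean on a proper collation library than on a table like this.

```python
import unicodedata

# Assumption: Norwegian order is a-z followed by æ, ø, å.
NORWEGIAN_ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"
NORWEGIAN_RANK = {ch: i for i, ch in enumerate(NORWEGIAN_ALPHABET)}


def strip_marks(text: str) -> str:
    """Remove combining diacritics, e.g. å -> a, é -> e."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def generic_key(name: str) -> str:
    """Diacritics are ignored: modified letters sort with their base letter."""
    return strip_marks(name).casefold()


def norwegian_key(name: str) -> list:
    """æ, ø, å are letters of their own, ranked after z."""
    key = []
    for ch in unicodedata.normalize("NFC", name).casefold():
        # Keep æ, ø, å as they are; fold other accented letters to their base.
        base = ch if ch in NORWEGIAN_RANK else strip_marks(ch)
        if base in NORWEGIAN_RANK:
            key.append((1, NORWEGIAN_RANK[base]))
        else:
            key.append((0, ord(ch)))  # spaces and punctuation sort before letters
    return key


def sort_names(names, language="en"):
    """Pick the key from a language tag such as 'en' or 'nb-NO'."""
    norwegian = language.split("-")[0].lower() in {"no", "nb", "nn"}
    return sorted(names, key=norwegian_key if norwegian else generic_key)


names = ["Zola", "Åberg", "Andersen"]  # made-up sample list
print(sort_names(names, "en"))     # Åberg files with the A's
print(sort_names(names, "nb-NO"))  # Åberg files after Zola
```

The same list comes out in two different orders depending only on the language tag, which is the kind of per-language switch a processor would need.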

For the cases of punctuation, I imagine that the same characters (single quote, etc.) in a title should also be ignored, and that would apply similarly for Dutch. But for Hawaiian or Arabic (or Norwegian), where these are actually letters of the alphabet, in principle they should be sorted as such, but due to essentially misapplying English-oriented intuitive sorting, they may be "borrowed" into the "ignore it because it looks like punctuation" category.

Very interesting topic indeed!
[Note: I'm a linguist and have some working familiarity with dozens of languages, though I can't claim any specific expertise in how bibliographies are sorted in most of them. Most of my experience there just comes from paging through dictionaries written in many languages and trying to figure out why it's not in the order I expected.]

fbennett · July 14, 2019

Solutions for this will need some time, study, and coordination. A first step would be to have one or more processor test fixtures that illustrate the sorting issues, as a basis for discussion and as a development resource. There is some documentation on test fixtures in the CSL test suite repository. The layout and logic of CSL tests is a bit daunting at first, but that's what we would need.