Any idea why an "A" author comes last in the bibliography?
The CSL Chicago styles have the following setting in the cs:style node:
demote-non-dropping-particle="never"
Under the CSL specification, that means that a non-dropping particle should be included as part of the surname when sorting names (see the example under the subheading "Sort order A: non-dropping-particle not demoted").
Setting the Arabic particles as non-dropping (in the CSL sense) would place all "al-" prefixed names under "A" in the bibliography, in the standard CMOS style. There seems to be general agreement that that would be wrong.
Arabic particles could be classified as non-dropping with a cs:style attribute of:
demote-non-dropping-particle="sort-only"
This would sort Arabic names correctly, but the sorting of European names with non-dropping particles would then be broken under CMOS rules. So that won't work either.
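A minimal JavaScript sketch may make the sorting difference concrete. It builds sort keys for CSL-JSON name objects following the two name-part sort orders in the CSL specification as I understand them: with "never" the non-dropping particle stays attached to the family name, while with "sort-only" (and "display-and-sort") the family name alone leads. The helpers sortKeyParts(), compareNames() and attach() are hypothetical, not citeproc-js internals, and the name suffix is ignored.

```javascript
// Assumption: particles ending in a hyphen or apostrophe attach without a space.
function attach(particle, word) {
  if (!particle) return word;
  return /[-'\u2019]$/.test(particle) ? particle + word : `${particle} ${word}`;
}

// Build the ordered sort-key parts for one CSL-JSON name.
function sortKeyParts(name, demote) {
  const family = name.family || "";
  const given = name.given || "";
  const dp = name["dropping-particle"] || "";
  const ndp = name["non-dropping-particle"] || "";
  if (demote === "never") {
    // Sort order A: the non-dropping particle stays in front of the family name.
    return [attach(ndp, family), dp, given];
  }
  // Sort order B ("sort-only", "display-and-sort"): family name first, particles demoted.
  return [family, [dp, ndp].filter(Boolean).join(" "), given];
}

// Compare two names part by part, ignoring case and accents.
function compareNames(a, b, demote) {
  const ka = sortKeyParts(a, demote);
  const kb = sortKeyParts(b, demote);
  for (let i = 0; i < ka.length; i++) {
    const c = ka[i].localeCompare(kb[i], "en", { sensitivity: "base" });
    if (c !== 0) return c;
  }
  return 0;
}

const names = [
  { family: "Hakim", "non-dropping-particle": "al-", given: "Tawfiq" },
  { family: "Gibb", given: "H. A. R." },
  { family: "Baker", given: "Ann" },
];
for (const mode of ["never", "sort-only"]) {
  const order = [...names].sort((a, b) => compareNames(a, b, mode)).map((n) => n.family);
  console.log(mode, order.join(", "));
}
```

With "never", al-Hakim files under "A" ahead of Baker; with "sort-only" it files under "H", after Gibb.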
When Arabic names are set as dropping (as at present), they are always demoted in the bibliography (which appears to be correct, or at least acceptable, under CMOS); but as you and joehill have pointed out, that also drops the particle from the surname when it appears in citations, which is not acceptable.
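The citation side of the problem can be sketched the same way. Assuming the short name form renders only the non-dropping particle plus the family name, a particle classified as dropping vanishes entirely, which matches the "Azmeh" result reported in this thread. shortForm() and attach() are illustrative helpers, not processor code.

```javascript
// Assumption: particles ending in a hyphen or apostrophe attach without a space.
function attach(particle, word) {
  if (!particle) return word;
  return /[-'\u2019]$/.test(particle) ? particle + word : `${particle} ${word}`;
}

// Short (family-only) form: the non-dropping particle is kept,
// a dropping particle is omitted.
function shortForm(name) {
  return attach(name["non-dropping-particle"], name.family);
}

const asDropping = { family: "Azmeh", given: "Aziz", "dropping-particle": "al-" };
const asNonDropping = { family: "Azmeh", given: "Aziz", "non-dropping-particle": "al-" };
console.log(shortForm(asDropping)); // Azmeh
console.log(shortForm(asNonDropping)); // al-Azmeh
```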
So at present, there appears to be no setting in CSL that will produce the desired behaviour without breaking things somewhere. I think that's telling us that we have a third category of particle on our hands. That is a CSL issue in the first instance. Once the handling of these particles is specified in CSL, processors will be able to conform to the spec.
The new parsing and classification mechanism in citeproc-js may be in the wrong place, the code behind it may be too opaque, and it may need better documentation, but it is nonetheless our friend, because it will permit immediate support of a third category of particle without the need for special markup or elaborate UI support. But before we can adapt it to solve this problem, the proper handling of Arabic name particles and the parameters for discretionary adjustments, if any, will need to be specified in CSL.
You can trial the revised processor with either of the Propachi plugins.
(It does still sound like the Arabic name particles require slightly different handling than those on European names. If there is a clear picture of the requirements, and if they are at odds with other formatting requirements, it would be worth raising the issue on the CSL list [xbiblio-devel]).
@fbennett: thanks for all your efforts on this! For me, "al-" is a particle that can be treated the same as the Dutch "van" or "van der" with regard to capitalization, non-droppingness and alphabetization. But I am not a coding specialist, so I might not be aware of the intricacies behind it, nor am I aware of all the linguistic intricacies of other European languages and their name particles. I don't know if it's useful, but I'll illustrate with an example why I think they deserve equal treatment:
Example (all Chicago) of citations with names involving particles "van der" and "al-":
1) as a full note (no capitalization of particle and currently correct in Zotero Chicago Style)
- Peter van der Veer, Imperial Encounters: Religion and Modernity in India and Britain (Princeton: Princeton University Press, 2001).
- Aziz al-Azmeh, Islams and Modernities, 3rd ed. (1991; repr., London: Verso, 1999).
2) shortened citation (both non-droppingness + capitalization, but currently inconsistent)
- Van der Veer, Imperial Encounters. [currently: van der Veer, Imperial Encounters, 160 (which is also acceptable to me)]
- Al-Azmeh, Islams and Modernities [currently: Azmeh, Islams and Modernities (which is kind of weird, al- should never drop completely)]
3) bibliography
- Veer, Peter van der, Imperial Encounters: Religion and Modernity in India and Britain (Princeton: Princeton University Press, 2001). (which I changed through editing the style, as Gracile suggested)
- Azmeh, Aziz al-, Islams and Modernities, 3rd ed. (1991; repr., London: Verso, 1999).
Try installing the Processor Patch plugin, and see what results you get (you may need to switch away from your style, then back for the plugin to take effect). With the plugin installed, and "al-" and its friends set at the start of the surname field, they will be treated as CSL "non-dropping".
I tested a style with demote-non-dropping-particle="display-and-sort", and it seems to produce the effect that you and other Arabic contributors are signalling: the particle is retained in author-date forms for citations, and demoted to a position after the given name when the name-part order is reversed in the bibliography—but the results I see are not the ones that matter.
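For the bibliography, the display-and-sort behaviour can be sketched like this: in the inverted (family-first) long form, both particles move after the given name, while with the other two settings the non-dropping particle stays in front of the family name. This is a simplified reading of the CSL specification, using a hypothetical longInverted() helper; delimiters other than the comma and name suffixes are left out.

```javascript
// Assumption: particles ending in a hyphen or apostrophe attach without a space.
function attach(particle, word) {
  if (!particle) return word;
  return /[-'\u2019]$/.test(particle) ? particle + word : `${particle} ${word}`;
}

// Inverted long form used in a bibliography.
function longInverted(name, demote) {
  const given = name.given || "";
  const dp = name["dropping-particle"];
  const ndp = name["non-dropping-particle"];
  if (demote === "display-and-sort") {
    // Family name first; both particles follow the given name.
    return `${name.family}, ${[given, dp, ndp].filter(Boolean).join(" ")}`;
  }
  // "never" / "sort-only": the non-dropping particle stays with the family name.
  return `${attach(ndp, name.family)}, ${[given, dp].filter(Boolean).join(" ")}`;
}

const azmeh = { family: "Azmeh", given: "Aziz", "non-dropping-particle": "al-" };
const veer = { family: "Veer", given: "Peter", "non-dropping-particle": "van der" };
console.log(longInverted(azmeh, "display-and-sort")); // Azmeh, Aziz al-
console.log(longInverted(veer, "display-and-sort")); // Veer, Peter van der
console.log(longInverted(azmeh, "never")); // al-Azmeh, Aziz
```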
As for your last comment, I don't use author-date forms (I use shortened citations, or short notes), so I can't say. The demotion of the particle in the bibliography is fine as it is according to the Chicago Manual of Style, although some Arabists might disagree (but there'll always be disagreement, I guess...).
Thanks again, and thanks a lot!
The total dropping of "al-" in short-form names happens because, as nickbart says, the particle is set as "dropping" in the old processor release.
If the new processor produces satisfactory results, the next question will be whether "non-dropping" behaviour should be forced for this category of particles. If "forced," they would be treated as non-dropping even if entered after the given name in the Zotero entry:
first field: Azmeh
second field: Aziz al-
In the current "Processor Patch" version, the form above would be treated as "dropping," but if that is never desired for these particles, it could be made to work as "non-dropping" in this form as well.
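The field-position rule, and the "forced" alternative, can be sketched as a small classifier. The position labels, classifyParticle(), toCslName() and the FORCED set are all hypothetical; the actual splitting of the particle out of the field is not shown here.

```javascript
// position: "family-start" = typed at the start of the last-name field,
//           "given-end"    = typed at the end of the first-name field.
function classifyParticle(particle, position, forced = new Set()) {
  if (position === "family-start") return "non-dropping-particle";
  // Hypothetical "forced" option: listed particles never become dropping.
  if (forced.has(particle.toLowerCase())) return "non-dropping-particle";
  return "dropping-particle";
}

// Build a CSL-JSON name from an already separated particle.
function toCslName(family, given, particle, position, forced) {
  return { family, given, [classifyParticle(particle, position, forced)]: particle };
}

const FORCED = new Set(["al-"]); // hypothetical list of "never-dropping" particles
console.log(classifyParticle("al-", "family-start")); // non-dropping-particle
console.log(classifyParticle("al-", "given-end")); // dropping-particle
console.log(classifyParticle("al-", "given-end", FORCED)); // non-dropping-particle
```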
So there is still a possibility that Arabists might prefer this category of particles to be treated as a third category. That can be done, but it would need to be agreed on the CSL list (xbiblio-devel).
I'll move my "al-"s to the first field, but that doesn't mean that others would necessarily all want the same result, or that publishers have the same rules for it. The open questions are:
a) whether they want "al-" to drop in shortened citations (not as likely to be agreed on, I guess) [the same applies to author-date form, I guess]
b) whether they want "al-" to be demoted in bibliographies (more likely to be disagreed on, I guess) [but in any case, its demotion or non-demotion should not affect its alphabetization]
@fbennett: and thanks for your explanation that the non-capitalization conforms to CSL. I'll just accept it then!
For documentation (to respond to nickbart, above) I think the thing to do will be to construct a dynamic page that accepts a list of particles or names, and returns a detailed explanation of how those specific items are treated in CSL. Somehow I don't think a technical doc that explains the parsing rules would attract much readership.
I don't have time to work on it now, and probably won't until early next year, but (as a note and a nudge to folks out there with coding skills and an interest in these things) the processor code to copy into such a page is here (the critical bits are the createCategorizer() function and the anonymous return function at the end).
I tried following the discussion, but it's still not entirely clear to me whether current CSL can properly cover the Arabic names. Is it correct that a name like "Tawfiq al-Hakim" should only ever appear as either "Tawfiq al-Hakim", "al-Hakim, Tawfiq", or "al-Hakim", and sort under "H"? And that we're fine with regard to prefixes like "Abu-" as long as we just treat these as part of the family name, and not as a particle at all?
The story so far seems to be that opinions differ. These three forms are apparently all seen in name-as-sort-order entries (the first two may have some strong adherents, and the last appears in the CMOS examples):
Al-Hakim, Tawfiq
al-Hakim, Tawfiq
Hakim, Tawfiq al-
The guidance seems to be that names prefixed with al- and its friends must always sort under the root (so under "H" for this name). So if they are non-dropping particles, demote-non-dropping-particle="never" yields a bad sort.
In the short form, this category of particle should never be dropped, ever (which is the current behaviour for non-dropping particles with all three settings):
(al-Hakim, 1929)
Treating Abu- as a non-particle seems to be correct.
So we're close. The two main issues appear to be:
When there can be so many different representations of the same author's name in the Zotero library, how is one to recognize that the authors are the same person and that the owner of the Zotero database needs to edit the names to be consistent? (The database owner is typically an expert in the professional discipline at hand, not in the conventions of name indexing.)
If Zotero could help identify same-author name variants and suggest how best to consolidate the names into a standard form, that would be wonderful. Perhaps this could be an add-on utility that could be run when needed. If the standard Zotero configuration adjusts names to conform to a citation style, it is all the more important that author names in the Zotero database conform to a standard. Otherwise, attempts to help with name disambiguation could have the opposite effect.
This brings up another issue. Should the author's name be presented exactly as it appears on a publication, or should corrections be made so that the names conform to a standard (say, CMOS 14.72)? What about cases where an author's name was printed incorrectly in a publication? Sometimes this is corrected via a notice in a later issue of the journal. Sometimes (with electronic publishing) the original PDF is changed retroactively, sometimes not. Sometimes changes are made to publisher-supplied metadata, sometimes not.
A related issue is the spelling of author names that have been altered to avoid non-ASCII characters: German, Scandinavian, and other names with umlauts (sometimes the umlauts are simply and incorrectly dropped; other times a two-letter form is used). French, Portuguese, and Spanish names with simple accents may appear with or without the diacritics. Attempts at name standards (ORCID, VIAF) are of minimal help, at best. I've encountered several authors with two or more ORCID iDs. It is common for authors to have several VIAF identifiers with the same and different spellings. If Zotero can help with this, it would be astoundingly wonderful.
As curator of SafetyLit, a multidisciplinary bibliographic database, I struggle with this daily.
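One way a helper could flag the diacritic variants described above is to compare names through accent-folded keys. This is a sketch under the assumption that umlauts are either dropped (ü to u) or written as two letters (ü to ue), so each name gets a key for both conventions; nameKeys() and mayBeSameName() are hypothetical, and a match only means "worth checking", not "same person".

```javascript
// Two conventions for letters that do not reduce to base letter + accent.
const TWO_LETTER = { "ä": "ae", "ö": "oe", "ü": "ue", "ø": "oe", "æ": "ae", "ß": "ss" };
const ONE_LETTER = { "ä": "a", "ö": "o", "ü": "u", "ø": "o", "æ": "ae", "ß": "ss" };

// Remove combining marks left after canonical decomposition (é -> e, å -> a).
function stripMarks(s) {
  return s.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// Every plausible ASCII spelling key for one name.
function nameKeys(name) {
  const lower = name.normalize("NFC").toLowerCase();
  const two = lower.replace(/[äöüøæß]/g, (ch) => TWO_LETTER[ch]);
  const one = lower.replace(/[äöüøæß]/g, (ch) => ONE_LETTER[ch]);
  return new Set([stripMarks(two), stripMarks(one)]);
}

// Two spellings may be the same name if they share at least one key.
function mayBeSameName(a, b) {
  const kb = nameKeys(b);
  return [...nameKeys(a)].some((k) => kb.has(k));
}

console.log(mayBeSameName("Müller", "Mueller")); // true
console.log(mayBeSameName("Müller", "Muller")); // true
console.log(mayBeSameName("Martínez", "Martinez")); // true
console.log(mayBeSameName("Mueller", "Muller")); // false
```

The last case stays false on purpose: without the original umlaut spelling there is nothing linking "Mueller" and "Muller".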
Here you go.
citeproc.opt.development_extensions.parse_names = false;
The first invocation of apostropheNormalizer() sets apostrophes to the straight-single-quote form for parsing, and the second forces them to right-single-quote for rendering. (The end result is that you get consistent output even if the name as entered has a mixture of the two encodings for apostrophe.)
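The two-pass idea can be illustrated with a minimal stand-in. normalizeApostrophes() below is a hypothetical re-implementation, not the apostropheNormalizer() from citeproc-js, and it only handles U+0027 and U+2019: it folds both encodings to the straight quote before matching a particle such as "d'", and to the right single quote for output.

```javascript
// "parse": both encodings -> straight quote U+0027, so a particle table needs one spelling.
// "render": both encodings -> right single quote U+2019 for consistent output.
function normalizeApostrophes(str, mode) {
  const target = mode === "render" ? "\u2019" : "'";
  return str.replace(/['\u2019]/g, target);
}

const entered = ["d'Alembert", "d\u2019Alembert"]; // mixed encodings as typed
for (const family of entered) {
  const forParsing = normalizeApostrophes(family, "parse");
  const hasParticle = forParsing.toLowerCase().startsWith("d'"); // one table entry suffices
  const forOutput = normalizeApostrophes(family, "render");
  console.log(hasParticle, forOutput); // true d’Alembert (for both inputs)
}
```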
I continue to think that the CSL schema in its current form is sufficient, at least for European and Arabic names. To recap:
Certain names start with non-dropping particles, where “non-dropping” means these particles have to appear in in-text citations (“van den Keere”, “al-Hakim”) but may or may not be dropped in a bibliography for sorting (“al-Hakim, Tawfiq” [sort under “H”], “van den Keere, Pieter” [sort under “K”]), or sorting and display (“Hakim, Tawfiq al-”, “Keere, Pieter van den”).
The Chicago Manual clearly recommends the sort-and-display variant (16e: 8.10, 8.14, 16.71, 16.76); that’s why I would argue that all CSL Chicago styles should switch to `demote-non-dropping-particle="display-and-sort"`.
By contrast, any last name that does not function this way, i.e., where elements are never removed from the front for purposes of sorting or display, or in other words, where the last name is always used in one and the same form only throughout a document, both in text and in a bibliography, should be parsed as one multipart last name.
For example, I would argue that “La Fontaine” should be understood, contra the examples given in http://docs.citationstyles.org/en/stable/specification.html, as one single multipart last name, since “Fontaine” never seems to be used alone, either for sorting or for display (I’ve sometimes seen “Fontaine” used as a cross-reference pointing to “La Fontaine”, but that’s nothing currently implemented in CSL anyway).
Parsing such “immutable” last names as multipart last names will most likely take care of all “potential objections to demoting the particle when demote-non-dropping-particle="display-and-sort" is applied for European name formatting” you referred to earlier in this thread.
If this seems acceptable so far, it would also mean that some of citeproc-js’s parsing rules need to be reviewed, e.g., the one on “La”. Protecting such names by wrapping them in double quotation marks would serve as a workaround, of course.
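A sketch of how such quote protection might work in a parser: a last-name field wrapped in double quotation marks is returned whole as the family name, while an unprotected “La Fontaine” is split into particle and family. The three-entry particle list and parseFamilyField() are hypothetical.

```javascript
// Hypothetical, deliberately tiny list of leading particles.
const LEADING_PARTICLES = ["la", "van", "de"];

function parseFamilyField(raw) {
  const text = raw.trim();
  // Protected: a field wrapped in double quotes stays one multipart family name.
  const quoted = /^"(.+)"$/.exec(text);
  if (quoted) return { family: quoted[1] };
  // Unprotected: a known first word is split off as a non-dropping particle.
  const [first, ...rest] = text.split(/\s+/);
  if (rest.length > 0 && LEADING_PARTICLES.includes(first.toLowerCase())) {
    return { "non-dropping-particle": first, family: rest.join(" ") };
  }
  return { family: text };
}

console.log(parseFamilyField("La Fontaine").family); // Fontaine
console.log(parseFamilyField('"La Fontaine"').family); // La Fontaine
```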
On the other hand, if a genuine need is felt to have more flexibility, e.g., allowing different settings for demoting various individual groups of non-dropping-particles (e.g., “al-” vs. “van den” vs. “La”) we’d have to discuss an extension of the CSL schema – but currently I don’t really think that’s necessary.
@aurimas: “I'll move the parsing into csl JSON generator in Zotero, so it's exported via translator.” – That’s fantastic, for overall clarity, for easier debugging, and in particular for using Zotero with other citeprocs such as pandoc. Thank you.
If it is, we could use this, minus the code, as a basis for documentation. The meaning of my shorthands “DP”, “NDP”, “split into DP + NDP” should be obvious; “NDP / DP” (several variants?) = parse as NDP if found in the last-name field, parse as DP if found in the first-name field.
A few apparent bugs, all seen with Zotero 4.0.27.5, Propachi: monkey-patch for Zotero CSL processor (standard version) 1.1.11, and a patched chicago-author-date.csl with “demote-non-dropping-particle="display-and-sort"”:
(1) The algorithm seems to be case-insensitive, but at least “Van” with a capital “V” would usually indicate a Belgian or American last name where “Van” should not be parsed as a particle but as part of a multi-part last name. This assumes of course that all Dutch “van”s are entered in lower case.
(2) “ter” is parsed as a dropping-particle but is clearly a non-dropping-particle (CMOS 16e 8.10). I would guess that “ten”, “uit de”, “uit den”, “in 't”, “in de”, “in der”, “in het”, “'s-”, “'t”, “op de” should be non-dropping, too.
(3) “von und zu Author” / “Al” in Zotero’s two-part name field is rendered as:
“von und zu Author 2015” [in-text] and
“und zu Author, Al von. 2015. …” in the bibliography.
This should be “Author, Al von und zu. 2015. …”
Same with “vom und zum”.
(4) “da” should be “NDP / DP”: CMOS 16e 8.8 has
“Agostinho da Silva; Silva” but “Vasco da Gama; da Gama”
EDIT: … or possibly “MPLN / DP”, i.e. “parse as multi-part last name (= leave it as is) if found in the last-name field, parse as DP if found in the first-name field”.
Regarding (2), “ten”, “uit de”, “uit den”, “in 't”, “in de”, “in der”, “in het”, “'s-”, “'t”, and “op de” are all common Dutch particles (probably exclusively, too), and as such, non-dropping.
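Item (3) looks like a longest-match problem: if “von” is taken before “von und zu” is considered, the remaining “und zu” ends up in the wrong place. A sketch of longest-match lookup, using a hypothetical table built only from particles mentioned in this thread (with straight apostrophes), might look like this; splitLeading() and splitTrailing() are illustrative, not the citeproc-js categorizer.

```javascript
// Hypothetical excerpt of a particle table, from the particles named above.
const NON_DROPPING = ["van", "van der", "van den", "ter", "ten", "uit de", "uit den",
  "in 't", "in de", "in der", "in het", "'s-", "'t", "op de"];
const DROPPING = ["von", "von und zu", "vom", "vom und zum"];

// Longest entries first, so "uit den" beats "uit de" and "von und zu" beats "von".
const byLength = (list) => [...list].sort((a, b) => b.length - a.length);

// Leading particle in the last-name field; entries ending in "-" attach directly.
function splitLeading(family, table = NON_DROPPING) {
  for (const p of byLength(table)) {
    const head = p.endsWith("-") ? p : `${p} `;
    if (family.toLowerCase().startsWith(head)) {
      return { particle: family.slice(0, p.length), family: family.slice(head.length) };
    }
  }
  return { particle: "", family };
}

// Trailing particle in the first-name field.
function splitTrailing(given, table = DROPPING) {
  for (const p of byLength(table)) {
    if (given.toLowerCase().endsWith(` ${p}`)) {
      return { particle: given.slice(-p.length), given: given.slice(0, -p.length - 1) };
    }
  }
  return { particle: "", given };
}

console.log(splitLeading("ter Borch")); // particle "ter", family "Borch"
console.log(splitLeading("uit den Bogaard")); // particle "uit den", family "Bogaard"
console.log(splitTrailing("Al von und zu")); // particle "von und zu", given "Al"
```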
The entries are indeed non-case-sensitive. Pull requests welcome - I can move this module to GitHub with direct write privileges if that will make workflows easier.
Why are there different NDP / DP variants?
What do the numbers mean – do these stand for the number of words a particle consists of? Should “von und zu” and “vom und zum” have a “3” then?
What would be the code for “MPLN / DP”?
Could you consider adding support for case-sensitive parsing?
[[0,1],null], [null,[0,1]] vs. [null,[0,1]], [[0,1],null]: it would make sense to normalize those to a single form. (It would also make sense to abstract away all those nested arrays, and replace them with named constants for readability…)
Yes, that's it. Those should be 0,3.
I think you could express it with something like [null,null], [null,[0,1]]. I don't know how the existing code would react to that, but it could be adjusted.
(This would need an extension to the categories. We currently have "dropping" and "non-dropping". To that we might want to add "part-of-name" and possibly [but not certainly] "never-dropping" [for Arabic name particles]). I'll wait for other feedback before moving on it, but it sounds like that would be a good idea.
(We would need to step carefully on this, though. If case sensitivity were turned on with the current code, every possible combination of upper- and lower-case would need to be specified, which might make things a little hard to manage.)
EDIT: Strikeout and text in parens added.
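One possible way around the combinatorial problem, sketched here as an assumption rather than a proposal for the actual code: store only lower-case entries and give each a flag saying whether an initial-capital form should also count as a particle. That would let “van” stay a particle while “Van” (as in a Belgian or American surname) is left as part of the surname. PARTICLE_TABLE and isParticle() are hypothetical.

```javascript
// Hypothetical table: lower-case keys only, plus a per-entry capitalization flag.
const PARTICLE_TABLE = new Map([
  ["van", { allowInitialCap: false }], // capital "Van": treat as part of the surname
  ["de", { allowInitialCap: true }],
  ["ter", { allowInitialCap: true }],
]);

function isParticle(word) {
  const lower = word.toLowerCase();
  const entry = PARTICLE_TABLE.get(lower);
  if (!entry) return false;
  if (word === lower) return true; // lower-case forms always match
  // Only one other spelling is considered, so no combinations need listing.
  const initialCap = lower.charAt(0).toUpperCase() + lower.slice(1);
  return entry.allowInitialCap && word === initialCap;
}

console.log(isParticle("van")); // true
console.log(isParticle("Van")); // false
console.log(isParticle("De")); // true
console.log(isParticle("DE")); // false
```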
For example:
Dios, J.R.M.-d.
http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=1570388
Martínez-de Dios, J. Ramiro
http://www.publish.csiro.au/?paper=WF05004
Martínez-de-Dios, J. R.
http://link.springer.com/chapter/10.1007/978-3-540-73958-6_8
Martínez-De-Dios, J. R.
http://www.sciencedirect.com/science/article/pii/S0262885607001096
Ramiro Martínez-de Dios, José
http://www.mdpi.com/1424-8220/11/6/6328
Martinez-de-Dios, J.R.
http://onlinelibrary.wiley.com/doi/10.1002/rob.20383/abstract
Martinez-de Dios, J.R.
http://onlinelibrary.wiley.com/doi/10.1002/rob.20108/abstract
Ramiro-Martínez-de Dios, José
In print, not online (edit)
de Dios, José R.M.
In print, not online (edit)
Anyone who is not familiar with his work and cites almost any two of the above names might not realize that they all represent the same person. Is it beyond the reach of Zotero to offer a hint that more information may be needed to disambiguate or merge these authors? I cannot imagine a way for a translator to parse these examples so as to indicate that they are the same author.
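A translator probably can't resolve this, but a rough hint is possible. As a hypothetical heuristic (not an existing Zotero feature): strip accents, drop particle tokens such as "de", and key each name by its last surname token plus the set of initials of all remaining tokens. For the nine forms above, every one reduces to "dios|djmr". It would also lump together different people who share a surname and initials, so it could only suggest candidates for a human to review.

```javascript
// Hypothetical list of tokens ignored when building the key.
const PARTICLE_TOKENS = new Set(["de", "d", "del", "la"]);

// Lower-case, accent-free word tokens, split on spaces, hyphens and periods.
function tokens(s) {
  return s
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "") // Martínez -> Martinez
    .toLowerCase()
    .split(/[^a-z]+/)
    .filter((t) => t && !PARTICLE_TOKENS.has(t));
}

// Key = last surname token + sorted set of initials of all tokens.
function candidateKey({ family, given }) {
  const fam = tokens(family);
  const initials = [...new Set([...fam, ...tokens(given)].map((t) => t[0]))].sort().join("");
  return `${fam[fam.length - 1] ?? ""}|${initials}`;
}

function groupCandidates(names) {
  const groups = new Map();
  for (const n of names) {
    const key = candidateKey(n);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(`${n.family}, ${n.given}`);
  }
  return groups;
}

const variants = [
  { family: "Dios", given: "J.R.M.-d." },
  { family: "Martínez-de Dios", given: "J. Ramiro" },
  { family: "Martínez-de-Dios", given: "J. R." },
  { family: "Martínez-De-Dios", given: "J. R." },
  { family: "Ramiro Martínez-de Dios", given: "José" },
  { family: "Martinez-de-Dios", given: "J.R." },
  { family: "Martinez-de Dios", given: "J.R." },
  { family: "Ramiro-Martínez-de Dios", given: "José" },
  { family: "de Dios", given: "José R.M." },
];
console.log(groupCandidates(variants)); // one group, key "dios|djmr", all nine names
```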
There is also the issue of the proper way to cite articles by this author. Should his name be standardized or cited as-is? I typically have seen his work cited with his name standardized -- however, differently by different publishers. (edit: Elsevier cites him the same way even when the name form from a citation to another publisher's journal presents the name differently -- someone must realize that these name variants refer to the same person.)
Ouch! The decisions involved in properly including these kinds of names in my database are painful. I fear that there is no "right" way.