Any idea why an "A" author comes last in the bibliography?

  • edited July 19, 2015
    @nickbart: I've started looking this over, and I seem to be receiving conflicting guidance on the treatment of the Arabic particles.

    The CSL Chicago styles have the following setting in the cs:style node:
    Under the CSL specification, that means that a non-dropping particle should be included as part of the surname when sorting names (see the subheading "Sort order A: non-dropping-particle not demoted" in the specification for an example).

    Setting the Arabic particles as non-dropping (in the CSL sense) would place all "al-" prefixed names under "A" in the bibliography, in the standard CMOS style. There seems to be general agreement that that would be wrong.

    Arabic particles could be classified as non-dropping with a cs:style attribute of:
    This would sort Arabic names correctly, but the sorting of European names with non-dropping particles would then be broken under CMOS rules. So that won't work either.
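    To make the sorting problem concrete, here is a minimal JavaScript sketch of the two CSL sort orders, assuming the names are already split into CSL JSON name-parts. `joinParticle()`, `sortKeys()`, `compareNames()` and the comparison name "Jan Bakker" are hypothetical illustrations, not citeproc-js code, and the key order (root first, particles as a secondary key) is my reading of the specification's sort order B.

```javascript
// Sketch of CSL "Sort order A" (particle not demoted) vs "Sort order B"
// (particle demoted). All helpers here are hypothetical illustrations.

// Particles ending in a hyphen or apostrophe attach without a space.
function joinParticle(particle, family) {
  if (!particle) return family;
  return /['\u2019-]$/.test(particle) ? particle + family : `${particle} ${family}`;
}

function sortKeys(name, demote) {
  const ndp = name["non-dropping-particle"] || "";
  const dp = name["dropping-particle"] || "";
  const given = name.given || "";
  if (demote === "never") {
    // Sort order A: "al-Hakim" is filed as one unit, i.e. under "A".
    return [joinParticle(ndp, name.family), dp, given];
  }
  // Sort order B ("sort-only", "display-and-sort"): file under the root,
  // using the particles only as a secondary key.
  return [name.family, [dp, ndp].filter(Boolean).join(" "), given];
}

// Compare key by key, ignoring case and accents.
function compareNames(a, b, demote) {
  const ka = sortKeys(a, demote);
  const kb = sortKeys(b, demote);
  for (let i = 0; i < ka.length; i++) {
    const c = ka[i].localeCompare(kb[i], "en", { sensitivity: "base" });
    if (c !== 0) return c;
  }
  return 0;
}

const names = [
  { family: "Hakim", given: "Tawfiq", "non-dropping-particle": "al-" },
  { family: "Bakker", given: "Jan" }, // hypothetical comparison name
];

function sortedFamilies(list, demote) {
  return [...list].sort((a, b) => compareNames(a, b, demote)).map((n) => n.family);
}

console.log(sortedFamilies(names, "never"));            // [ 'Hakim', 'Bakker' ]
console.log(sortedFamilies(names, "display-and-sort")); // [ 'Bakker', 'Hakim' ]
```

    With "never", "al-Hakim" is filed ahead of "Bakker" under "A"; with either demoting setting it moves to "H".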

    When Arabic particles are set as dropping (as at present), they are always demoted in the bibliography (which appears to be correct, or at least acceptable, under CMOS); but as you and joehill have pointed out, that also drops the particle from the surname when it appears in citations, which is not acceptable.
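    The citation side of the problem can be sketched the same way. In CSL, the short (family-only) form of a name keeps a non-dropping particle but omits a dropping one, which is why "al-" classified as dropping yields a bare "Azmeh". `shortForm()` is a hypothetical helper, not citeproc-js code.

```javascript
// Hypothetical helper: CSL-style short form of a personal name.
function shortForm(name) {
  const ndp = name["non-dropping-particle"];
  if (!ndp) return name.family; // a dropping particle never appears here
  // Hyphen/apostrophe particles attach directly; others take a space.
  return /['\u2019-]$/.test(ndp) ? ndp + name.family : `${ndp} ${name.family}`;
}

const asDropping = { family: "Azmeh", given: "Aziz", "dropping-particle": "al-" };
const asNonDropping = { family: "Azmeh", given: "Aziz", "non-dropping-particle": "al-" };
const dutch = { family: "Veer", given: "Peter", "non-dropping-particle": "van der" };

console.log(shortForm(asDropping));    // Azmeh
console.log(shortForm(asNonDropping)); // al-Azmeh
console.log(shortForm(dutch));         // van der Veer
```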

    So at present, there appears to be no setting in CSL that will produce the desired behaviour without breaking things somewhere. I think that's telling us that we have a third category of particle on our hands. That is a CSL issue in the first instance. Once the handling of these particles is specified in CSL, processors will be able to conform to the spec.

    The new parsing and classification mechanism in citeproc-js may be in the wrong place, the code behind it may be too opaque, and it may need better documentation, but it is nonetheless our friend, because it will permit immediate support of a third category of particle without the need for special markup or elaborate UI support. But before we can adapt it to solve this problem, the proper handling of Arabic name particles and the parameters for discretionary adjustments, if any, will need to be specified in CSL.
  • edited July 19, 2015
    Meanwhile, thanks for providing those particles. I have added them to the parser struct—and there was indeed a flaw in the "al-" entry, and in the entry for "von" as well. When entered at the front of the surname field, "al-" and the other particles you have listed should come out as standard non-dropping particles now. Mea culpa.

    You can trial the revised processor with either of the Propachi plugins.

    (It does still sound like the Arabic name particles require slightly different handling than those on European names. If there is a clear picture of the requirements, and if they are at odds with other formatting requirements, it would be worth raising the issue on the CSL list [xbiblio-devel]).
  • @Gracile: Thanks a lot, it worked when I changed it to display-and-sort (though not with sort-only)! Now "Van der Veer" is alphabetized under "Veer, Peter van der". I am MUCH more pleased with how my bibliography came out now. The only remaining problems are with al-, but these are more minor than the "van der" problem.

    @fbennett: thanks for all your efforts on this! For me, "al-" is a particle that can be treated the same as the Dutch "van" or "van der" with regard to capitalization, non-droppingness and alphabetization. But I am not a coding specialist, so I might not be aware of the intricacies behind it, nor am I aware of all the linguistic intricacies of other European languages and their name particles. I don't know if it's useful, but here is an example to illustrate why I think they are eligible for equal treatment:

    Example (all Chicago) of citations with names involving particles "van der" and "al-":
    1) as a full note (no capitalization of particle and currently correct in Zotero Chicago Style)
    - Peter van der Veer, Imperial Encounters: Religion and Modernity in India and Britain (Princeton: Princeton University Press, 2001).
    - Aziz al-Azmeh, Islams and Modernities, 3rd ed. (1991; repr., London: Verso, 1999).
    2) shortened citation (both non-droppingness + capitalization, but currently inconsistent)
    - Van der Veer, Imperial Encounters. [currently: van der Veer, Imperial Encounters, 160 (which is also acceptable to me)]
    - Al-Azmeh, Islams and Modernities [currently: Azmeh, Islams and Modernities (which is kind of weird, al- should never drop completely)]
    3) bibliography
    - Veer, Peter van der, Imperial Encounters: Religion and Modernity in India and Britain (Princeton: Princeton University Press, 2001). (which I changed through editing the style, as Gracile suggested)
    - Azmeh, Aziz al-, Islams and Modernities, 3rd ed. (1991; repr., London: Verso, 1999).
  • edited July 20, 2015
    akateman: This is helpful; maybe my uneasiness about our model was premature.

    Try installing the Processor Patch plugin, and see what results you get (you may need to switch away from your style, then back for the plugin to take effect). With the plugin installed, and "al-" and its friends set at the start of the surname field, they will be treated as CSL "non-dropping".

    I tested a style with demote-non-dropping-particle="display-and-sort", and it seems to produce the effect that you and other Arabic contributors are signalling: the particle is retained in author-date forms for citations, and demoted to a position after the given name when the name-part order is reversed in the bibliography—but the results I see are not the ones that matter.
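    For the bibliography, the effect of display-and-sort on the inverted (name-as-sort-order) form can be sketched as follows. The part order (family, then given, dropping particle, non-dropping particle) follows my reading of the CSL 1.0.1 specification; `invertedForm()` is a hypothetical helper, and suffixes and delimiter options are ignored.

```javascript
// Particles ending in a hyphen or apostrophe attach without a space.
function joinParticle(particle, family) {
  if (!particle) return family;
  return /['\u2019-]$/.test(particle) ? particle + family : `${particle} ${family}`;
}

// Hypothetical helper: inverted ("name-as-sort-order") display of a name.
function invertedForm(name, demote) {
  const ndp = name["non-dropping-particle"] || "";
  const dp = name["dropping-particle"] || "";
  const demoteDisplay = demote === "display-and-sort";
  // With "never" or "sort-only" the non-dropping particle stays in front.
  const head = demoteDisplay ? name.family : joinParticle(ndp, name.family);
  // Dropping particles always trail the given name; a demoted
  // non-dropping particle follows them.
  const tail = [name.given, dp, demoteDisplay ? ndp : ""].filter(Boolean).join(" ");
  return tail ? `${head}, ${tail}` : head;
}

const azmeh = { family: "Azmeh", given: "Aziz", "non-dropping-particle": "al-" };
console.log(invertedForm(azmeh, "sort-only"));        // al-Azmeh, Aziz
console.log(invertedForm(azmeh, "display-and-sort")); // Azmeh, Aziz al-
```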
  • @fbennett: thanks! It worked! Now, in shortened citations, it gives "al-Azmeh" as the author (though not capitalized, which is really a minor problem) AND it's still alphabetized correctly in the bibliography. My "van der" citations are also still handled correctly.

    As for your last comment, I don't use author-date forms (only shortened citations, or short notes), so I don't know about that. The demotion of the particle in the bibliography is fine as it is according to the Chicago Manual of Style, although some Arabists might disagree (but there'll always be disagreement, I guess...).

    Thanks again, and thanks a lot!
  • edited July 20, 2015
    Further notes:

    The total dropping of "al-" in short-form names happens because, as nickbart says, the particle is set as "dropping" in the old processor release.

    If the new processor produces satisfactory results, the next question will be whether "non-dropping" behaviour should be forced for this category of particles. If "forced," they would be treated as non-dropping even if entered after the given name in the Zotero entry:

    first field: Azmeh
    second field: Aziz al-

    In the current "Processor Patch" version, the form above would be treated as "dropping," but if that is never desired for these particles, it could be made to work as "non-dropping" in this form as well.
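    A much-simplified sketch of the position rule described above, with the proposed "forced" behaviour as an option. The particle list is a tiny excerpt, and `parseTwoFieldName()` and its `forced` parameter are hypothetical; the real citeproc-js parser is table-driven and handles far more cases.

```javascript
// Hypothetical, much-simplified parser for Zotero's two-field names.
// Longest particles first, so "van der" wins over "van".
const PARTICLES = ["van der", "al-", "van"];

// Hyphen/apostrophe particles attach directly to the following word.
const attaches = (p) => /['\u2019-]$/.test(p);

function parseTwoFieldName(family, given, forced = new Set()) {
  // 1. Particle at the front of the family field -> non-dropping.
  for (const p of PARTICLES) {
    const prefix = attaches(p) ? p : p + " ";
    if (family.toLowerCase().startsWith(prefix) && family.length > prefix.length) {
      return {
        family: family.slice(prefix.length),
        given,
        "non-dropping-particle": family.slice(0, p.length),
      };
    }
  }
  // 2. Particle at the end of the given field -> dropping, unless forced.
  for (const p of PARTICLES) {
    const suffix = " " + p;
    if (given.toLowerCase().endsWith(suffix) && given.length > suffix.length) {
      const role = forced.has(p) ? "non-dropping-particle" : "dropping-particle";
      return {
        family,
        given: given.slice(0, given.length - suffix.length),
        [role]: given.slice(given.length - p.length),
      };
    }
  }
  return { family, given };
}

console.log(parseTwoFieldName("al-Azmeh", "Aziz"));
console.log(parseTwoFieldName("Azmeh", "Aziz al-"));
console.log(parseTwoFieldName("Azmeh", "Aziz al-", new Set(["al-"])));
```

    The third call shows what "forcing" would change: the same two-field entry comes out as non-dropping instead of dropping.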
  • akateman: Good to see your post above, this sounds like progress.

    So there is still a possibility that Arabists might prefer this category of particles to be treated as a third category. That can be done, but it would need to be agreed on the CSL list (xbiblio-devel).
  • (The non-capitalization conforms to the CSL specification. If it is an annoyance, that would also be an issue for the CSL list—I just work here. :)
  • @fbennett: to me it seems that this model (depending on whether you enter the particle in the first or in the second field) allows user-friendly flexibility (although it would be quite a hidden feature, so some effort might have to go into making the two options explicit and public).

    I'll move my "al-"s to the first field, but that doesn't mean that others would necessarily all want the same result or that publishers have the same rules for it.
  • @fbennett: yes, there might be disagreement among Arabists about
    a) whether they want "al-" to drop in shortened citations (less likely to be disagreed on, I guess) [the same applies to author-date form, I guess]
    b) whether they want "al-" to be demoted in bibliographies (more likely to be disagreed on, I guess) [but in any case, its demotion or non-demotion should not affect its alphabetization]

    @fbennett: and thanks for your explanation that the non-capitalization conforms to CSL. I'll just accept it then!
  • edited July 20, 2015
    Okay, I'll leave things as they are.

    For documentation (to respond to nickbart, above) I think the thing to do will be to construct a dynamic page that accepts a list of particles or names, and returns a detailed explanation of how those specific items are treated in CSL. Somehow I don't think a technical doc that explains the parsing rules would attract much readership.

    I don't have time to work on it now, and probably won't until early next year, but (as a note and a nudge to folks out there with coding skills and an interest in these things) the processor code to copy into such a page is here. (the critical bits are the createCategorizer() function, and the anonymous return function at the end)
  • It would be very helpful to have a discussion of Arabic name issues that involves the CSL designers. There do seem to be some special requirements here, and now that we are classifying particles explicitly, there is a good prospect of getting options in place that will satisfy everyone, if there is a willingness to invest a bit of time in discussion.
  • @fbennett, remind me, can citeproc-js spit out self-parsed CSL JSON? (i.e. where two-field names have been chopped up in the individual name-parts)

    I tried following the discussion, but it's still not entirely clear to me whether current CSL can properly cover the Arabic names. Is it correct that a name like "Tawfiq al-Hakim" should only ever appear as either "Tawfiq al-Hakim", "al-Hakim, Tawfiq", or "al-Hakim", and sort under "H"? And that we're fine with regard to prefixes like "Abu-" as long as we just treat these as part of the family name, and not as a particle at all?
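    For reference, a self-parsed name in CSL JSON carries each particle in its own name-part field. The snippet below is a hand-written illustration (placeholder id, names taken from this thread), not actual citeproc-js output.

```javascript
// Illustrative only: an item with names already split into CSL JSON
// name-parts ("family", "given", "dropping-particle", "non-dropping-particle").
const item = {
  id: "example-1", // placeholder
  type: "book",
  author: [
    { family: "Hakim", given: "Tawfiq", "non-dropping-particle": "al-" },
    { family: "Gogh", given: "Vincent", "non-dropping-particle": "van" },
    { family: "Beethoven", given: "Ludwig", "dropping-particle": "van" },
  ],
};

console.log(JSON.stringify(item, null, 2));
```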
  • edited July 20, 2015
    Re getting at the name parts, there is no option for it, but it could be set up to dump the serialized JSON into a log file.

    The story so far seems to be that opinions differ. All three of these forms are apparently seen in name-as-sort-order entries (the first two may have some strong adherents, and the last appears in CMOS examples):

    Al-Hakim, Tawfiq
    al-Hakim, Tawfiq
    Hakim, Tawfiq al-

    The guidance seems to be that names prefixed with al- and its friends must always sort under the root (so under "H" for this name). So if they are non-dropping particles, demote-non-dropping-particle="never" yields a bad sort.

    In the short form, this category of particle should never be dropped, ever (which is the current behaviour for non-dropping particles with all three settings):

    (al-Hakim, 1929)

    Treating Abu- as a non-particle seems to be correct.

    So we're close. The two main issues appear to be:

    • the sort behaviour with demote-non-dropping-particle="never"; and

    • potential objections to demoting the particle when demote-non-dropping-particle="display-and-sort" is applied for European name formatting.
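    One way to read these two issues is as a request to decouple Arabic articles from the settings used for European particles. The table below is a hypothetical sketch of such a third category (labelled "never-dropping" here purely for illustration; it is not part of CSL 1.0.1): always shown in the short form, always filed under the root, and demoted in display only when the style asks for display-and-sort.

```javascript
// Hypothetical behaviour table. "dropping" and "non-dropping" summarise
// CSL behaviour as discussed in this thread; "never-dropping" is only a
// possible third category, not part of any CSL release.
const BEHAVIOUR = {
  "dropping": {
    inShortForm: () => false,
    demotedForSort: () => true,
    demotedForDisplay: () => true,
  },
  "non-dropping": {
    inShortForm: () => true,
    demotedForSort: (demote) => demote !== "never",
    demotedForDisplay: (demote) => demote === "display-and-sort",
  },
  "never-dropping": {
    inShortForm: () => true,
    // Always file "al-Hakim" under "H", whatever the style says...
    demotedForSort: () => true,
    // ...while leaving the display choice to the style.
    demotedForDisplay: (demote) => demote === "display-and-sort",
  },
};

for (const demote of ["never", "sort-only", "display-and-sort"]) {
  for (const [category, b] of Object.entries(BEHAVIOUR)) {
    console.log(
      demote.padEnd(16),
      category.padEnd(14),
      `short:${b.inShortForm()} sort:${b.demotedForSort(demote)} display:${b.demotedForDisplay(demote)}`
    );
  }
}
```

    Under "never", such a category would give "al-Hakim, Tawfiq" filed under "H"; under display-and-sort, it would give the CMOS form "Hakim, Tawfiq al-".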

  • Frank, I assume citeproc has a function that takes a CSL JSON author and parses the name into individual parts. Could you point me to that function? Also, could you put the automatic name parsing behind a flag that Zotero could toggle? I'll move the parsing into the CSL JSON generator in Zotero, so it's exported via translator.
  • edited July 20, 2015
    To some extent, this is a nearly insurmountable problem in that we are at the mercy of journal editors and publishers and of database curators. The metadata provided will have many different representations of the same author's name: with and without hyphens, with differences in casing, and sometimes with the particle preceding the last name, other times following the first name, with or without a separating comma.

    When there can be so many different representations of the same author's name in the Zotero library, how is one to recognize that the authors are the same person and that the owner of the Zotero database needs to edit the names to be consistent? (The database owner is typically an expert not in the conventions of name indexing but in the professional discipline at hand.)

    If Zotero could help identify same-author name variants and further suggest how best to consolidate the names into a standard form, that would be wonderful. Perhaps this could be an add-on utility that could be run when needed. If the standard Zotero configuration adjusts names to conform to a citation style, it is all the more important that author names in the Zotero database conform to a standard. Otherwise, attempts to help with name disambiguation could have the opposite effect.

    This brings up another issue. Should the author's name be presented exactly as it appears on a publication, or should corrections be made to make the names conform to a standard (say, CMoS 14.72)? What about cases where an author's name was incorrectly printed in a publication? Sometimes this is corrected via a notice in a later issue of the journal. Sometimes (with electronic publishing) the original PDF version is retroactively changed, sometimes not. Sometimes changes are made to publisher-supplied metadata, sometimes not.

    A related issue is the spelling of author names that have been changed to avoid extended ASCII characters: German, Scandinavian, and other names with umlauts (sometimes the umlauts are simply and incorrectly ignored, other times a two-letter form is used). French, Portuguese, and Spanish names with simple accents may appear with or without the diacritics. Attempts at name standards (ORCID, VIAF) are of minimal help, at best. I've encountered several authors with two or more ORCID IDs. It is common for authors to have several VIAF identifiers with the same and different spellings. If Zotero can help with this, it would be astoundingly wonderful.

    As curator of SafetyLit, a multidisciplinary bibliographic database, I struggle with this daily.
  • @aurimas: Opportunities like this don't come along every day.

    Here you go.
  • edited July 20, 2015
    The processor option to disable name parsing is:
    citeproc.opt.development_extensions.parse_names = false;
  • Thanks! Any hint as to what normalizeApostrophe does? When do we want to use that?
  • edited July 20, 2015
    It looks extraneous, and I can't imagine a case where it would be useful as coded. An option to disable the second invocation of apostropheNormalizer() to get monospaced plain-text output would make some sense, although there's probably not much need for it.
  • Wait, that was backwards. As coded, you would always want to set normalizeApostrophe to "true," of course.

    The first invocation of apostropheNormalizer() sets apostrophes to the straight-single-quote form for parsing, and the second forces them to right-single-quote for rendering. (The end result is that you get consistent output even if the name as entered has a mixture of the two encodings for apostrophe.)
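    A minimal sketch of that two-pass idea. `forParsing()` and `forRendering()` are hypothetical names standing in for the two invocations; this is not the processor's apostropheNormalizer() itself.

```javascript
// Sketch of the two-pass apostrophe handling described above.
const STRAIGHT = "'";
const RIGHT_SINGLE = "\u2019";

// Pass 1: unify apostrophes so "d’" and "d'" match the same table entry.
const forParsing = (s) => s.replace(/\u2019/g, STRAIGHT);

// Pass 2: render every apostrophe as a typographic right single quote.
const forRendering = (s) => s.replace(/'/g, RIGHT_SINGLE);

const entered = ["d'Alembert", "d\u2019Alembert"]; // mixed encodings
const parsed = entered.map(forParsing);
console.log(parsed.every((s) => s.startsWith("d'"))); // true
console.log(new Set(parsed.map(forRendering)).size);  // 1
```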
  • @fbennett: thank you for the “al-” fixes and all your helpful comments.

    I continue to think that the CSL schema in its current form is sufficient, at least for European and Arabic names. To recap:

    Certain names start with non-dropping particles, where “non-dropping” means these particles have to appear in in-text citations (“van den Keere”, “al-Hakim”) but may or may not be dropped in a bibliography for sorting (“al-Hakim, Tawfiq” [sort under “H”], “van den Keere, Pieter” [sort under “K”]), or sorting and display (“Hakim, Tawfiq al-”, “Keere, Pieter van den”).

    The Chicago Manual clearly recommends the sort-and-display variant (16e: 8.10, 8.14, 16.71, 16.76); that’s why I would argue that all CSL Chicago styles should switch to `demote-non-dropping-particle="display-and-sort"`.

    By contrast, any last name that does not function this way, i.e., where elements are never removed from the front for purposes of sorting or display, or in other words, where the last name is always used in one and the same form only throughout a document, both in text and in a bibliography, should be parsed as one multipart last name.

    For example, I would argue that “La Fontaine” should be understood, contra the examples given in, as one single multipart last name, since “Fontaine” never seems to be used alone, neither for sorting nor display (I’ve sometimes seen “Fontaine” used as a crossreference pointing to “La Fontaine”, but that’s nothing currently implemented in CSL anyway).

    Parsing such “immutable” last names as multipart last names will most likely take care of all “potential objections to demoting the particle when demote-non-dropping-particle="display-and-sort" is applied for European name formatting” you referred to earlier in this thread.

    If this seems acceptable so far, it would also mean that some of citeproc-js’s parsing rules need to be reviewed, e.g., the one on “La”. Protecting such names by wrapping them in double quotation marks would serve as a workaround, of course.

    On the other hand, if a genuine need is felt to have more flexibility, e.g., allowing different settings for demoting various individual groups of non-dropping-particles (e.g., “al-” vs. “van den” vs. “La”) we’d have to discuss an extension of the CSL schema – but currently I don’t really think that’s necessary.

    @aurimas: “I'll move the parsing into csl JSON generator in Zotero, so it's exported via translator.” – That’s fantastic, for overall clarity, for easier debugging, and in particular for using Zotero with other citeprocs such as pandoc. Thank you.
  • @nickbart: If you would like the classification of the "La" particle to be changed, please raise the issue on the CSL list (xbiblio-devel), so that other project maintainers can participate in the discussion.
  • edited July 24, 2015
    I tried to figure out what the code next to the various particles could possibly mean. This is what I came up with – is this anywhere near correct?

    If it is, we could use this, minus the code, as a basis for documentation.
    ["al-", [[[0,1], null],[null,[0,1]]]], –  NDP / DP 2

    ["'s-", [[[0,1], null]]], – DP
    ["'t", [[[0,1], null]]], – DP
    ["af", [[[0,1], null]]], – DP
    ["al", [[[0,1], null]]], – DP
    ["auf den", [[[0,2], null]]], – DP
    ["auf der", [[[0,1], null]]], – DP
    ["aus der", [[[0,1], null]]], – DP
    ["aus'm", [[null, [0,1]]]], – NDP
    ["ben", [[null, [0,1]]]], – NDP
    ["bin", [[null, [0,1]]]], – NDP
    ["d'", [[[0,1], null]],[[null,[0,1]]]], – NDP / DP
    ["da", [[null, [0,1]]]], – NDP
    ["dall'", [[null, [0,1]]]], – NDP
    ["das", [[[0,1], null]]], – DP
    ["de", [[null, [0,1]],[[0,1], null]]], – NDP / DP 3
    ["de la", [[[0,1], [1,2]]]], – split into DP + NDP
    ["de las", [[[0,1], [1,2]]]], – split into DP + NDP
    ["de li", [[[0,1], null]]], – DP
    ["de'", [[[0,1], null]]], – DP
    ["degli", [[[0,1], null]]], – DP
    ["dei", [[[0,1], null]]], – DP
    ["del", [[null, [0,1]]]], – NDP
    ["dela", [[[0,1], null]]], – DP
    ["della", [[[0,1], null]]], – DP
    ["dello", [[[0,1], null]]], – DP
    ["den", [[[0,1], null]]], – DP
    ["der", [[[0,1], null]]], – DP
    ["des", [[null, [0,1]],[[0,1], null]]], – NDP / DP 3
    ["di", [[null, [0,1]]]], – NDP
    ["do", [[null, [0,1]]]], – NDP
    ["dos", [[[0,1], null]]], – DP
    ["du", [[[0,1], null]]], – DP
    ["el", [[[0,1], null]]], – DP
    ["il", [[[0,1], null]]], – DP
    ["in 't", [[[0,2], null]]], – DP
    ["in de", [[[0,2], null]]], – DP
    ["in der", [[[0,1], null]]], – DP
    ["in het", [[[0,2], null]]], – DP
    ["lo", [[[0,1], null]]], – DP
    ["les", [[[0,1], null]]], – DP
    ["l'", [[null, [0,1]]]], – NDP
    ["la", [[null, [0,1]]]], – NDP
    ["le", [[null, [0,1]]]], – NDP
    ["lou", [[null, [0,1]]]], – NDP
    ["mac", [[null, [0,1]]]], – NDP
    ["op de", [[[0,2], null]]], – DP
    ["pietro", [[null, [0,1]]]], – NDP
    ["saint", [[null, [0,1]]]], – NDP
    ["sainte", [[null, [0,1]]]], – NDP
    ["sen", [[[0,1], null]]], – DP
    ["st.", [[null, [0,1]]]], – NDP
    ["ste.", [[null, [0,1]]]], – NDP
    ["te", [[[0,1], null]]], – DP
    ["ten", [[[0,1], null]]], – DP
    ["ter", [[[0,1], null]]], – DP
    ["uit de", [[[0,2], null]]], – DP
    ["uit den", [[[0,2], null]]], – DP
    ["v.d.", [[null, [0,1]]]], – NDP
    ["van", [[null, [0,1]]]], – NDP
    ["van de", [[null, [0,2]]]], – NDP
    ["van den", [[null, [0,2]]]], – NDP
    ["van der", [[null, [0,2]]]], – NDP
    ["van het", [[null, [0,2]]]], – NDP
    ["vander", [[null, [0,1]]]], – NDP
    ["vd", [[null, [0,1]]]], – NDP
    ["ver", [[null, [0,1]]]], – NDP
    ["von", [[[0,1], null],[null,[0,1]]]], – NDP / DP 2
    ["von der", [[[0,2], null]]], – DP
    ["von dem", [[[0,2], null]]], – DP
    ["von und zu", [[[0,1], null]]], – DP
    ["von zu", [[[0,2], null]]], – DP
    ["v.", [[[0,1], null]]], – DP
    ["v", [[[0,1], null]]], – DP
    ["vom", [[[0,1], null]]], – DP
    ["vom und zum", [[[0,1], null]]], – DP
    ["z", [[[0,1], null]]], – DP
    ["ze", [[[0,1], null]]], – DP
    ["zum", [[[0,1], null]]], – DP
    ["zur", [[[0,1], null]]] – DP
    The meaning of my shorthands “DP”, “NDP”, “split into DP + NDP” should be obvious; “NDP / DP” (several variants?) = parse as NDP if found in the last-name field, parse as DP if found in the first-name field.
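    The guess above can be written down as a small decoder. This is only one reading of the notation: each spec is taken to be a pair [droppingRange, nonDroppingRange], where a range [start, end] picks out words of the particle and null means "none". `describeSpec()` and `describeEntry()` are hypothetical helpers. Specs are printed in the order they appear, so "al-" comes out as "DP / NDP" rather than in the field-based order used in the annotations.

```javascript
// One reading of the table notation (a guess, not documented behaviour).
function describeSpec([dropping, nonDropping]) {
  if (dropping && nonDropping) return "split into DP + NDP";
  if (dropping) return "DP";
  if (nonDropping) return "NDP";
  return "no particle";
}

function describeEntry([particle, specs]) {
  return `${particle}: ${specs.map(describeSpec).join(" / ")}`;
}

const sample = [
  ["al-", [[[0, 1], null], [null, [0, 1]]]],
  ["de la", [[[0, 1], [1, 2]]]],
  ["van der", [[null, [0, 2]]]],
  ["zur", [[[0, 1], null]]],
];
sample.forEach((entry) => console.log(describeEntry(entry)));
// al-: DP / NDP
// de la: split into DP + NDP
// van der: NDP
// zur: DP
```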

    A few apparent bugs, all seen with Zotero, Propachi: monkey-patch for Zotero CSL processor (standard version) 1.1.11, and a patched chicago-author-date.csl with “demote-non-dropping-particle="display-and-sort"”:

    (1) The algorithm seems to be case-insensitive, but at least “Van” with a capital “V” would usually indicate a Belgian or American last name where “Van” should not be parsed as a particle but as part of a multi-part last name. This assumes of course that all Dutch “van”s are entered in lower case.

    (2) “ter” is parsed as a dropping-particle but is clearly a non-dropping-particle (CMOS 16e 8.10). I would guess that “ten”, “uit de”, “uit den”, “in 't”, “in de”, “in der”, “in het”, “'s-”, “'t”, “op de” should be non-dropping, too.

    (3) “von und zu Author” / “Al” in Zotero’s two-part name field is rendered as:

    “von und zu Author 2015” [in-text] and

    “und zu Author, Al von. 2015. …” in the bibliography.

    This should be “Author, Al von und zu. 2015. …”

    Same with “vom und zum”.

    (4) “da” should be “NDP / DP”: CMOS 16e 8.8 has

    “Agostinho da Silva; Silva” but “Vasco da Gama; da Gama”

    EDIT: … or possibly “MPLN / DP”, i.e. “parse as multi-part last name (= leave it as is) if found in the last-name field, parse as DP if found in the first-name field”.
  • Per my example on the xbiblio-devel list ("Beethoven, Ludwig van" and "van Gogh, Vincent"), "van" should be DP/NDP.

    Regarding (2), “ten”, “uit de”, “uit den”, “in 't”, “in de”, “in der”, “in het”, “'s-”, “'t”, and “op de” are all common Dutch particles (probably exclusively, too), and as such, non-dropping.
  • @nickbart: That's it. The first-listed pair is the default, so single-entry particles are forced to that form, regardless of whether they are entered in the family or the given field. Classification of particles with two (or more) entries can be controlled by entering them to match a subordinate spec's expectations.
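    Read that way, spec selection might look like the sketch below. The idea that a dropping-only spec "expects" the end of the given field, and a spec with a non-dropping range expects the front of the family field, is my assumption about what "a subordinate spec's expectations" means; `expectedField()` and `pickSpec()` are hypothetical.

```javascript
// Hypothetical: which field a spec [droppingRange, nonDroppingRange] expects.
const expectedField = ([, nonDropping]) => (nonDropping ? "family" : "given");

// The first spec is the default; a later spec is used only if the
// particle was entered where that spec expects it.
function pickSpec(specs, field) {
  return specs.find((spec) => expectedField(spec) === field) || specs[0];
}

const al = [[[0, 1], null], [null, [0, 1]]];
const van = [[null, [0, 1]]];

console.log(JSON.stringify(pickSpec(al, "family"))); // [null,[0,1]]
console.log(JSON.stringify(pickSpec(al, "given")));  // [[0,1],null]
console.log(JSON.stringify(pickSpec(van, "given"))); // [null,[0,1]] (forced default)
```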

    The entries are indeed case-insensitive. Pull requests welcome - I can move this module to GitHub with direct write privileges if that will make workflows easier.
  • @fbennett: Thank you, but I’m not sure I understand this well enough yet to start patching your code, so a few more questions instead:

    Why are there different NDP / DP variants?

    What do the numbers mean – do these stand for the number of words a particle consists of? Should “von und zu” and “vom und zum” have a “3” then?

    What would be the code for “MPLN / DP”?

    Could you consider adding support for case-sensitive parsing?
  • edited July 26, 2015
    Why are there different NDP / DP variants?
    Where there are multiple entries, the classification of the particle depends on its location (front of the family field, end of the given field). I don't actually know if that is ever desirable, but that's what it does. Where a spec covers all possibilities (i.e. two entries for a one-part particle), the variants are logically equivalent. For example:
    [[0,1],null], [null,[0,1]]
    [null,[0,1]], [[0,1],null]
    It would make sense to normalize those to a single form. (It would also make sense to abstract away all those nested arrays, and replace them with named constants for readability…)
    What do the numbers mean – do these stand for the number of words a particle consists of? Should “von und zu” and “vom und zum” have a “3” then?
    Yes, that's it. Those should be 0,3.
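    Bug (3) follows directly from the range: with [0,1], only one word is lifted off as the particle. A minimal sketch, assuming (as just confirmed) that the numbers are word positions; `splitByRange()` and `invert()` are hypothetical helpers.

```javascript
// Hypothetical: apply a word range [start, end] to a family field that
// begins with a multi-word dropping particle.
function splitByRange(family, [start, end]) {
  const words = family.split(" ");
  return {
    particle: words.slice(start, end).join(" "),
    family: words.slice(end).join(" "),
  };
}

// Inverted bibliography form: the dropping particle trails the given name.
const invert = ({ particle, family }, given) => `${family}, ${given} ${particle}`;

console.log(invert(splitByRange("von und zu Author", [0, 1]), "Al")); // und zu Author, Al von
console.log(invert(splitByRange("von und zu Author", [0, 3]), "Al")); // Author, Al von und zu
```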
    What would be the code for “MPLN / DP”?
    I think you could express it with something like [null,null], [null,[0,1]]. I don't know how the existing code would react to that, but it could be adjusted.
    (This would need an extension to the categories. We currently have "dropping" and "non-dropping". To that we might want to add "part-of-name" and possibly [but not certainly] "never-dropping" [for Arabic name particles]).
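    Named constants could make the table readable while producing exactly the same arrays, and leave room for a "part-of-name" entry. In this sketch, `DP`, `NDP`, `SPLIT` and `PART_OF_NAME` are hypothetical names, and "da" is written with a DP second spec to match the "MPLN / DP" description; the exact encoding is still open.

```javascript
// Hypothetical named constants for spec pairs [droppingRange, nonDroppingRange].
const DP = (words) => [[0, words], null];
const NDP = (words) => [null, [0, words]];
const SPLIT = (dpWords, total) => [[0, dpWords], [dpWords, total]];
const PART_OF_NAME = [null, null]; // proposed "part-of-name" category

// A few entries rewritten with the constants.
const PARTICLE_TABLE = {
  "al-": [DP(1), NDP(1)],
  "da": [PART_OF_NAME, DP(1)], // "MPLN / DP": part of name in family field, DP in given field
  "de la": [SPLIT(1, 2)],
  "von und zu": [DP(3)],
};

for (const [particle, specs] of Object.entries(PARTICLE_TABLE)) {
  console.log(particle, JSON.stringify(specs));
}
```

    The first and third entries serialize to the same arrays as the "al-" and "de la" rows of the existing table.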
    Could you consider adding support for case-sensitive parsing?
    I'll wait for other feedback before moving on it, but it sounds like that would be a good idea.
    (We would need to step carefully on this, though. If case sensitivity were turned on with the current code, every possible combination of upper- and lower-case would need to be specified, which might make things a little hard to manage.)

    EDIT: Strikeout and text in parens added.
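    One possible way around the combinatorics of case sensitivity: keep the table lower-case and, when case sensitivity is on, accept only an all-lower-case occurrence as a particle. A capitalized "Van", as in nickbart's point (1), would then stay in the family name. `matchParticle()` and its `caseSensitive` option are hypothetical, not citeproc-js settings.

```javascript
// Hypothetical matching policy over a lower-case particle table.
function matchParticle(familyField, particles, { caseSensitive = false } = {}) {
  for (const p of particles) { // expects longest particles first
    const prefix = /['\u2019-]$/.test(p) ? p : p + " ";
    const head = familyField.slice(0, prefix.length);
    const hit = caseSensitive ? head === prefix : head.toLowerCase() === prefix;
    if (hit && familyField.length > prefix.length) return familyField.slice(0, p.length);
  }
  return null; // no particle: the whole field is the family name
}

const table = ["van der", "van"];
console.log(matchParticle("Van Buren", table));                          // Van
console.log(matchParticle("Van Buren", table, { caseSensitive: true })); // null
console.log(matchParticle("van Gogh", table, { caseSensitive: true }));  // van
```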
  • This may be a rather ignorant question, but are there any cases where an uppercased family name element is actually a particle? (e.g. we just concluded that "La" in "Jean de La Fontaine" is never demoted, so we don't need to treat it as a particle, right? I assume the same goes for the Americanized "Van")
  • edited July 26, 2015
    Depending upon the publisher or database there can be many representations of the particles in an author's name. This becomes an even greater problem with compound names. Different casing and hyphenation patterns further complicate the issue.

    For example:

    Dios, J.R.M.-d.

    Martínez-de Dios, J. Ramiro

    Martínez-de-Dios, J. R.

    Martínez-De-Dios, J. R.

    Ramiro Martínez-de Dios, José

    Martinez-de-Dios, J.R.

    Martinez-de Dios, J.R.

    Ramiro-Martínez-de Dios, José
    In print, not online (edit)

    de Dios, José R.M.
    In print, not online (edit)

    Anyone who is not familiar with his work and is citing almost any two of the above names might not realize that each of them represents the same person. Is it beyond the reach of Zotero to offer a hint that there may be a need to seek more information to disambiguate or merge these authors? I cannot imagine a way for a translator to parse these examples in a way that can indicate they are the same author.
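    As a hypothetical sketch of the kind of hint asked for here, a crude key that folds accents, case and hyphenation can be computed for each family name. `familyKey()` and `groupByFamilyKey()` are illustrative helpers, not Zotero features. On the variants above, the key gathers five of the nine forms under one group and leaves the reordered and particle-first forms apart. It also does nothing for transliterations such as "ü" to "ue", which rather supports the doubt expressed here.

```javascript
// Hypothetical helper: crude grouping key for family names.
function familyKey(family) {
  return family
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "") // strip combining accents: "í" -> "i"
    .toLowerCase()
    .replace(/[-\s]+/g, " ") // treat hyphens and spaces alike
    .trim();
}

// Group full "Family, Given" strings by the key of their family part.
function groupByFamilyKey(names) {
  const groups = new Map();
  for (const name of names) {
    const key = familyKey(name.split(",")[0]);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(name);
  }
  return groups;
}

const variants = [
  "Dios, J.R.M.-d.",
  "Martínez-de Dios, J. Ramiro",
  "Martínez-de-Dios, J. R.",
  "Martínez-De-Dios, J. R.",
  "Ramiro Martínez-de Dios, José",
  "Martinez-de-Dios, J.R.",
  "Martinez-de Dios, J.R.",
  "Ramiro-Martínez-de Dios, José",
  "de Dios, José R.M.",
];

for (const [key, members] of groupByFamilyKey(variants)) {
  console.log(`${key}: ${members.length}`);
}
```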

    There is also the issue of the proper way to cite articles by this author. Should his name be standardized or cited as-is? I typically have seen his work cited with his name standardized -- however, differently by different publishers. (edit: Elsevier cites him the same way even when the name form from a citation to another publisher's journal presents the name differently -- someone must realize that these name variants refer to the same person.)

    Ouch! The decisions involved in properly including these kinds of names in my database are painful. I fear that there is no "right" way.