Names reform: request for comments
We don't have a mechanism in CSL for declaring that an apparent particle should be treated as a fixed part of the last name.
And what does citeproc-js do when the space in "De Quincy" is replaced by a non-breaking space?
It's not just "De Quincy", by the way: CMoS 16e, 16.71, also recommends always using “de Gaulle” and sorting him under "D".
I'm not sure what would happen with a non-breaking space, but it's likely that would do the trick.
There are plenty of exceptions that would require an override. The key thing is to get agreement on the override syntax in the CSL group - otherwise we don't have a basis for consistent markup across implementations.
I'd like to return to this issue since, just like aurimas, I feel the current implementation is markedly problematic:
Hence I’d like to repeat aurimas’ and my suggestion:
My favourite is what I see as the cleanest solution: introducing additional fields for dropping-particle, non-dropping-particle and suffix, by adding a "five-field" state to the name field (in addition to the existing "single field" and "two-field" states).
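To make the "five-field" idea concrete, here is a minimal sketch in Python. The five field names mirror the name-part names used in CSL (family, given, dropping-particle, non-dropping-particle, suffix); the class `Name` and the helpers `display_order` and `inverted_order` are hypothetical and only illustrate how an explicit five-field entry could be rendered. They are not how Zotero or citeproc-js store or format names.

```python
from dataclasses import dataclass


@dataclass
class Name:
    """Hypothetical five-field name record (illustration only)."""
    family: str
    given: str = ""
    dropping_particle: str = ""
    non_dropping_particle: str = ""
    suffix: str = ""


def _join(*parts: str) -> str:
    # Join the non-empty parts with single spaces.
    return " ".join(p for p in parts if p)


def display_order(n: Name) -> str:
    # Given name first: "Jean de La Fontaine".
    return _join(n.given, n.dropping_particle,
                 n.non_dropping_particle, n.family, n.suffix)


def inverted_order(n: Name) -> str:
    # Family name first; the non-dropping particle stays with the
    # family name, the dropping particle moves behind the given name.
    head = _join(n.non_dropping_particle, n.family)
    tail = _join(n.given, n.dropping_particle)
    return ", ".join(p for p in (head, tail, n.suffix) if p)


# A particle typed into the family field is simply part of the family
# name, which is the kind of override asked for with "de Gaulle".
print(inverted_order(Name(family="de Gaulle", given="Charles")))
print(inverted_order(Name(family="La Fontaine", given="Jean",
                          dropping_particle="de")))
```

With explicit fields, nothing has to be guessed from the text of the name: whatever the user puts in the family field stays in the family name.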
An in-field markup syntax would also be possible, e.g., by using something like the pipe character ("|") to separate subfields inside lastname and firstname fields.
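As a rough illustration of the in-field variant, the sketch below splits a field on the pipe character. The subfield order is purely my assumption, since no layout has been agreed on: lastname as "non-dropping-particle | family | suffix" and firstname as "given | dropping-particle". The function `split_subfields` is hypothetical, and a field without any pipe is treated as an ordinary single-value field.

```python
# Assumed subfield layouts; nothing here is an agreed syntax.
LASTNAME = ("non-dropping-particle", "family", "suffix")
FIRSTNAME = ("given", "dropping-particle")


def split_subfields(value, layout, main):
    """Split a pipe-delimited field into the subfields named in `layout`.

    A value without "|" is assigned entirely to the `main` subfield.
    """
    if "|" not in value:
        return {k: (value.strip() if k == main else "") for k in layout}
    parts = [p.strip() for p in value.split("|")]
    if len(parts) > len(layout):
        raise ValueError(f"too many subfields in {value!r}")
    # Missing trailing subfields are treated as empty.
    parts += [""] * (len(layout) - len(parts))
    return dict(zip(layout, parts))


print(split_subfields("De Quincy", LASTNAME, "family"))
print(split_subfields("van | Gogh", LASTNAME, "family"))
print(split_subfields("Jean | de", FIRSTNAME, "given"))
```

The attraction of this variant is that existing entries without a pipe keep working unchanged; the drawback is that users have to remember the subfield order.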
Unfortunately, I cannot code any of these but I’d certainly be willing to contribute to testing.
All someone needs to do is convert user data into that form, and prepare code to interface with the processor on that basis. The coding on the processor side will be trivial.
So let me ask a few more questions on that:
(1) Could we document the available information on name particles in general, and the algorithm used by citeproc-js, a little more clearly, also with a view to how other citeprocs could implement this?
Most importantly, could we put together a list of particles we can unambiguously identify as either dropping or non-dropping? And if there are criteria other than simple membership in one of these lists, what are they?
Or, what essentially amounts to the same thing: how should the list in https://bitbucket.org/fbennett/citeproc-js/src/tip/src/util_name_particles.js?at=default be understood? In particular, which particles on this list are parsed as dropping or non-dropping, what exactly do the numbers mean, and does the algorithm apply any other criteria?
(I’ll note that while “van der” seems to be parsed correctly as a non-dropping particle by citeproc-js, “’s-” always seems to behave like part of the family name, and both “al” and “al-” [the latter not on the list] always like a dropping particle.)
(2) Could we also clarify what post-processing is required and what citeproc-js actually applies? For example, name parts are usually separated by spaces, but the space following a particle that ends with an apostrophe (“d’”) or a hyphen (“al-”) should be removed when another name part follows, with one apparent exception: “de’”, as in “Lorenzo de’ Medici”. Is anyone aware of other such exceptions?
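The spacing rule described in (2) could be sketched roughly as follows. This is only my reading of the rule as stated above, not the behaviour of citeproc-js; the function `join_particle` and the exception set `KEEP_SPACE` are hypothetical, and the set contains only the one exception mentioned so far.

```python
# Particles that end in an apostrophe but still keep the following space.
# Only "de'" is known from the discussion; both quote styles are listed.
KEEP_SPACE = {"de'", "de’"}


def join_particle(particle: str, following: str) -> str:
    """Join a particle to the name part that follows it."""
    if not particle:
        return following
    ends_tight = particle.endswith(("'", "’", "-"))
    if ends_tight and particle.lower() not in KEEP_SPACE:
        # "d’" + "Alembert" -> "d’Alembert"; "al-" + "Hakim" -> "al-Hakim"
        return particle + following
    # Default case, and the "de’" exception: keep the space.
    return particle + " " + following


print(join_particle("d’", "Alembert"))
print(join_particle("al-", "Hakim"))
print(join_particle("de’", "Medici"))
print(join_particle("van der", "Waals"))
```

If further exceptions turn up, they would only need to be added to the exception set.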
(3) Is there any simple way to find out about citeproc-js’s parsing decisions, other than inferring them from the output in a Word/LO file?
Thanks for flagging the anomalies. I'll look into those.