@nickbart, a second (and maybe better) criterion to distinguish non-dropping particles from non-particles is probably whether or not family names always include these name elements in alphabetical sorting. If they are sometimes ignored (e.g. with "de Koning" needing to be sorted under "K"), they're particles.
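Here is a rough sketch of that criterion in code. It assumes a small hypothetical particle set and space-separated names, neither of which is Zotero's actual data. With demoting on, "de Koning" files under "K". With it off, it files under "d". `split_leading_particle` and `sort_key` are illustrative names only.

```python
# Hypothetical particle set, for illustration only.
PARTICLES = {"de", "van", "van der", "von"}

def split_leading_particle(family: str) -> tuple[str, str]:
    """Return (particle, rest) if family starts with a listed particle + space."""
    # Longer particles first, so "van der" wins over "van".
    for particle in sorted(PARTICLES, key=len, reverse=True):
        prefix = particle + " "
        if family.startswith(prefix):
            return particle, family[len(prefix):]
    return "", family

def sort_key(family: str, demote: bool) -> str:
    """Case-insensitive sort key; with demote=True the particle is ignored."""
    particle, rest = split_leading_particle(family)
    return (rest if demote and particle else family).casefold()

names = ["de Koning", "Jansen", "Lubbers"]
print(sorted(names, key=lambda n: sort_key(n, demote=True)))
# ['Jansen', 'de Koning', 'Lubbers']
print(sorted(names, key=lambda n: sort_key(n, demote=False)))
# ['de Koning', 'Jansen', 'Lubbers']
```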
Like @nickbart, I disagree with "Given correct data entry, the dropping particle has no significance for CSL or Zotero, so we can ignore parsing that part altogether.". Dropping particles are treated differently from initials or non-dropping particles, so you need to treat them as their own class of name element.
As for "The non-dropping particle ... may be joined with family name by punctuation", I would just like to add that the punctuation options are probably rather limited in (Western) names. I think spaces, apostrophes/quote marks, and hyphens cover most cases.
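If the joiners really are limited to spaces, apostrophes, and hyphens, splitting a leading particle off a family name stays simple. This is a minimal sketch. It assumes a hypothetical particle list in which attaching particles carry their own joiner (e.g. "d’", "al-"). `split_particle` is an illustrative name, not a Zotero or citeproc-js function.

```python
# Hypothetical particle list; entries ending in an apostrophe or hyphen
# attach directly to the family name, all others are followed by a space.
PARTICLES = ["van der", "van", "de", "d’", "d'", "al-", "dos"]
ATTACHING_JOINERS = ("’", "'", "-")

def split_particle(family: str) -> tuple[str, str]:
    """Split one leading particle off a family name, honouring its joiner."""
    for particle in sorted(PARTICLES, key=len, reverse=True):
        if particle.endswith(ATTACHING_JOINERS):
            # Apostrophe/hyphen joiner: no space between particle and name.
            if family.startswith(particle) and len(family) > len(particle):
                return particle, family[len(particle):]
        elif family.startswith(particle + " "):
            # Space joiner.
            return particle, family[len(particle) + 1:]
    return "", family
```

Trying longer entries first keeps "van der" from being split as just "van".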
Then with regard to "Current suggestion is for some ... solution that cycles through possible permutations" and "User must know what he's looking for anyway, so I don't really see why that is easier than just correcting the case manually.". The main issue I have with the current setup is that users need to be explicitly aware of the existence of dropping and non-dropping particles, as well as the precise formatting requirements in Zotero to get correct output. This all without the Zotero UI giving any guidance or feedback when editing the name field.
Instead of a UI option that cycles through the different two-field name element storage options, like Frank's example:
van der Merwe, Wikus
Van der Merwe, Wikus
Van Der Merwe, Wikus
"van der Merwe", Wikus
der Merwe, Wikus van
Der Merwe, Wikus van
"der Merwe", Wikus van
Merwe, Wikus van der
van der Merwe, Wikus
I would like to propose two possible changes. First, I think the user should be confronted with all options simultaneously. I think cycling through this many options is just confusing. Even looking at the entire list right now it takes me a long time to figure out what's what. Second, I think the UI should focus on the desired output instead of the way the name needs to be stored in Zotero. E.g. the list could also be presented as a menu with the following options:
"(van der Merwe)" - "Merwe, W. van der"
"(Van der Merwe)" - "Van der Merwe, W."
"(Van Der Merwe)" - "Van Der Merwe, W."
"(van der Merwe)" - "van der Merwe, W."
"(der Merwe)" - "Merwe, W. van der"
"(Der Merwe)" - "Der Merwe, W. van"
"(der Merwe)" - "der Merwe, W. van"
"(Merwe)" - "Merwe, W. van der"
That seems much more intuitive to me. Zotero can then easily rearrange the name elements as required.
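To make the proposal concrete, here is a rough sketch of how Frank's two-field entries could be mapped to the two menu columns. It assumes simplified rules inferred from this thread, not the processor's real logic:

- Leading lower-case words of the family field are non-dropping particles.
- Trailing lower-case words of the given field are dropping particles.
- A double-quoted family name is taken literally.

The left column mimics `form="short"`. The right column is the inverted long form with initials and the non-dropping particle demoted. All function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Name:
    family: str
    given: str
    dropping: str = ""      # e.g. "van" in "der Merwe, Wikus van"
    non_dropping: str = ""  # e.g. "van der" in "van der Merwe, Wikus"

def parse_fields(family_field: str, given_field: str) -> Name:
    """Classify particles using simplified rules (an assumption)."""
    given_words = given_field.split()
    dropping = []
    # Trailing lower-case words of the given field are dropping particles.
    while len(given_words) > 1 and given_words[-1].islower():
        dropping.insert(0, given_words.pop())
    if family_field.startswith('"') and family_field.endswith('"'):
        # Quoted family names are literal: no particle parsing at all.
        return Name(family_field[1:-1], " ".join(given_words), " ".join(dropping))
    family_words = family_field.split()
    non_dropping = []
    # Leading lower-case words of the family field are non-dropping particles.
    while len(family_words) > 1 and family_words[0].islower():
        non_dropping.append(family_words.pop(0))
    return Name(" ".join(family_words), " ".join(given_words),
                " ".join(dropping), " ".join(non_dropping))

def short_form(n: Name) -> str:
    """Like form="short": non-dropping particle plus family name."""
    return " ".join(p for p in (n.non_dropping, n.family) if p)

def sort_form(n: Name) -> str:
    """Inverted long form with initials and the non-dropping particle demoted."""
    initials = " ".join(w[0] + "." for w in n.given.split())
    tail = " ".join(p for p in (initials, n.dropping, n.non_dropping) if p)
    return f"{n.family}, {tail}"

for fam, giv in [("der Merwe", "Wikus van"), ('"der Merwe"', "Wikus van")]:
    n = parse_fields(fam, giv)
    print(f"({short_form(n)}) - {sort_form(n)}")
# (der Merwe) - Merwe, W. van der
# (der Merwe) - der Merwe, W. van
```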
Finally, regarding "Correctly parsing/splitting names on import", I generally agree. In my own experience there are two annoying things. First, capitalization on import is often wrong. I deal with a fair number of Dutch names, and often particles come into Zotero uppercased. Second, sometimes non-dropping particles come in as dropping particles (that is, they reside in the given name field). It would be nice if Zotero had an easier way to move particles between the two name fields. Currently it's quite an ordeal: activate name field A, select particle, cut particle, activate name field B, select insertion point, paste particle.
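The cut-and-paste routine could be collapsed into single commands. This is a minimal sketch of two hypothetical helpers, not existing Zotero functions. It assumes the misplaced particles are the trailing lower-case words of the given-name field, and it uses a small illustrative particle subset for the case fix.

```python
# Hypothetical subset of Dutch particles, for illustration only.
KNOWN_PARTICLES = {"van", "der", "den", "de", "ten", "ter"}

def promote_dropping_particles(family: str, given: str) -> tuple[str, str]:
    """Move trailing lower-case words from the given-name field to the front
    of the family-name field, turning them into non-dropping particles."""
    words = given.split()
    moved = []
    # Keep at least one word in the given-name field.
    while len(words) > 1 and words[-1].islower():
        moved.insert(0, words.pop())
    return " ".join(moved + [family]), " ".join(words)

def lowercase_particles(family: str) -> str:
    """Lower-case leading family-name words that match known particles."""
    words = family.split()
    for i, word in enumerate(words[:-1]):  # never touch the last word
        if word.lower() not in KNOWN_PARTICLES:
            break
        words[i] = word.lower()
    return " ".join(words)
```

`lowercase_particles` lower-cases anything on the list without checking context. That suits the Dutch imports described here, but it would be wrong where a capitalized particle is intended (it would turn "Van der Merwe" into "van der Merwe").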
Like nickbart and Rintze, I disagree with "Given correct data entry, the dropping particle has no significance for CSL or Zotero, so we can ignore parsing that part altogether." To their arguments, I'd add that it would be nice if CSL could, *in the future*, format the dropping particle, at least to add parentheses. To take a (now) well-known example: "La Fontaine, Jean (de)" and even "La Fontaine (de), Jean" is sometimes the desired output.
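The sketch below only illustrates the two desired outputs. CSL has no such option today, and the `placement` values are invented for this example.

```python
def format_sorted(family: str, given: str, dropping: str, placement: str) -> str:
    """Render an inverted name with the dropping particle in parentheses.

    placement="after-given"  -> "La Fontaine, Jean (de)"
    placement="after-family" -> "La Fontaine (de), Jean"
    """
    if not dropping:
        return f"{family}, {given}"
    if placement == "after-given":
        return f"{family}, {given} ({dropping})"
    if placement == "after-family":
        return f"{family} ({dropping}), {given}"
    raise ValueError(f"unknown placement: {placement!r}")
```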
I didn't understand the purpose of the keyboard permutations of a name with particles at first, but I'm now convinced: it will make the mechanism of the particles parsing discoverable and clear to the user, especially with Rintze's proposals above which I second!
Four apparent processor bugs (observed with 1.1.19):
(1) When there’s a suffix, a preceding (dropping) particle is not parsed at all: `[Author] [Ann de, III]` is parsed as `"family": "Author", "given": "Ann de", "suffix": "III"`
(2) Parsing is not exhaustive: `[van Author] [Ann von]` is parsed as `"family": "Author", "given": "Ann von", "non-dropping-particle": "van"`
(3) EDIT: And `[de l’Author] [Ann]` is parsed as `"family": "l’Author", "given": "Ann", "non-dropping-particle": "de"` (though `de l’` is on the list).
(4) EDIT 2: And `[vom und zum Author] [Ann]` is parsed as `"family": "und zum Author", "given": "Ann", "non-dropping-particle": "vom"` (though `vom und zum` is on the list).
(1) is a bug. (2) "von van" is not a listed particle, so "van" as NDP is expected. (3) is a bug. (4) is a bug.
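Bugs (1), (3), and (4) all suggest the same fix: split off the suffix first, then match whole particles from the list, longest first. Here is a rough sketch under those assumptions. The particle list and function names are hypothetical, and this is not citeproc-js code.

```python
# Hypothetical particle list (the real one lives in the processor's data).
PARTICLES = ["de", "de l’", "van", "von", "vom", "vom und zum"]

def _match_leading(text: str) -> tuple[str, str]:
    """Longest listed particle at the start of text; particles ending in an
    apostrophe or hyphen attach directly and need no following space."""
    for p in sorted(PARTICLES, key=len, reverse=True):
        if p.endswith(("’", "'", "-")):
            if text.startswith(p) and len(text) > len(p):
                return p, text[len(p):]
        elif text.startswith(p + " "):
            return p, text[len(p) + 1:]
    return "", text

def parse_name(family_field: str, given_field: str) -> dict:
    name = {}
    # Split off a suffix first, so "Ann de, III" still ends in a particle.
    given, _, suffix = given_field.partition(",")
    if suffix.strip():
        name["suffix"] = suffix.strip()
    # Longest trailing particle in the given-name field -> dropping particle
    # (at least one word is always kept as the given name).
    words = given.split()
    for n in range(len(words) - 1, 0, -1):
        tail = " ".join(words[-n:])
        if tail in PARTICLES:
            name["dropping-particle"] = tail
            words = words[:-n]
            break
    name["given"] = " ".join(words)
    # Longest leading particle in the family-name field -> non-dropping.
    particle, family = _match_leading(family_field)
    if particle:
        name["non-dropping-particle"] = particle
    name["family"] = family
    return name
```

This sketch matches the given-field and family-field particles independently, so it would also split "von" off in case (2). That differs from the processor behaviour described above, where the combined sequence has to be listed.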
I'll take a look, although the current parser will have a limited life expectancy. In the parser for the UI code I'm building with the same particles data, (1) fails, and (2)-(3) pass.
(Edit: Actually, I think I can just adapt the UI parsing code to do the classification. The new code is much more transparent, and since there may be a role for the classifier in translators, refactoring it will be a good use of time.)
I have proof-of-concept code for name particle UI support running in a trial build of Zotero. To save the trouble of installing the client build for testing, I've made a short screencast that takes it through its paces.
Please click and view. I'm curious to know how it will be received.
Sort-order for the full name would be very clear with aligned columns, but with jagged layout it seems a little cluttered, at least for someone encountering the issue for the first time.
Would a decoration hint with normal ordering work to distinguish the parts? Something like highlights, or mild boldface, or italics?
@fbennett, in my example above I chose not to show the actual two-field content in the menu. I think users can quite easily observe how the name formatting has changed after selecting the desired display format. Adding that information to the menu is IMHO just confusing, since the user then has to figure out which parts represent the data entry options and which represent the corresponding rendering options.
Instead, I would show just the family name by itself (`form="short"`), and separately, the full name (`form="long"` and `demote-non-dropping-particle="never"`) with `name-as-sort-order` active. Those two examples would very clearly show which name elements are recognized as particles, which particles are recognized as non-dropping and dropping, and how particles are capitalized.
@Rintze: Removing the quotes, in other words. Got it, and that helps the clutter. Formatting apparently isn't possible in a simple XUL menu, so highlighting and whatnot (which would probably have made things worse anyway) is out. I'll take another shot and refresh the screencast.
@Gracile: I think we'll show the option disabled, but you're right that it shouldn't do anything if there is nothing to do.
Do you think the menu would be easier to read if the options are alphabetically sorted?
(I think you meant `demote-non-dropping-particle="display-and-sort"` above.)
I was on the fence (not demoting looks more natural to my Dutch eyes), but yeah, demoting the non-dropping particles provides more information since it shows the distinction between non-particles and non-dropping particles.
Maybe "Particler" is a little too casual?
Similar to the "Transform Text" option in the title menu, I would go for an action description, e.g. "Adjust Particles".
I noticed that "uit den" cannot be changed into dropping particles. I take it that this is because these are always non-dropping in your list.
Yes, that's right; the idea is to restrict the options to those most likely to be meaningful. Is that a correct spec for "uit den" and the other pure Dutch particles?
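One way to encode such a spec is a table of the roles each particle may take, with menu entries built only from the assignments the table allows. This is a minimal sketch. The table entries (including "uit den" as non-dropping only) and `role_options` are assumptions for illustration, and capitalization and non-particle readings are left out.

```python
from itertools import product

# Hypothetical spec: the roles each particle may take.
ALLOWED_ROLES = {
    "uit den": {"non-dropping"},
    "van": {"non-dropping", "dropping"},
    "de": {"non-dropping", "dropping"},
}
DEFAULT_ROLES = {"non-dropping", "dropping"}  # for unspec'd particles
ORDER = {"dropping": 0, "non-dropping": 1}

def role_options(particles: list[str]) -> list[tuple[str, ...]]:
    """All role assignments permitted by the spec, in particle order."""
    choices = [sorted(ALLOWED_ROLES.get(p, DEFAULT_ROLES)) for p in particles]
    # A dropping particle may not follow a non-dropping one.
    return [combo for combo in product(*choices)
            if all(ORDER[a] <= ORDER[b] for a, b in zip(combo, combo[1:]))]
```

The ordering filter discards assignments where a dropping particle would follow a non-dropping one, since dropping particles precede non-dropping particles in the name.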
We may be able to slim down the number of options presented with unspec'd particles - we should only present things likely to be possible, the user can edit manually for rare combinations. We should also put a ceiling on the number of options or the number of particles somehow, to prevent mischievous people from DoS'ing the UI.
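A ceiling could be enforced at two points. Up front, a name with too many candidate particles gets no menu. During generation, the list stops after a fixed number of options. This is a minimal sketch. The limits, the function names, and the ordering rule (dropping, then non-dropping, then non-particle) are assumptions, not Zotero settings.

```python
from itertools import islice
from typing import Iterator

MAX_PARTICLES = 4  # hypothetical limits
MAX_OPTIONS = 12

def assignments(n: int) -> Iterator[tuple[str, ...]]:
    """Lazily yield role tuples of the form dropping* non-dropping* non-particle*."""
    for d in range(n + 1):
        for nd in range(n - d + 1):
            yield (("dropping",) * d + ("non-dropping",) * nd
                   + ("non-particle",) * (n - d - nd))

def capped_options(n: int) -> list[tuple[str, ...]]:
    if n > MAX_PARTICLES:
        return []  # too many candidate particles: leave it to manual editing
    return list(islice(assignments(n), MAX_OPTIONS))
```

With that ordering rule, n candidate particles yield (n + 1)(n + 2)/2 assignments before capping, so the list grows quadratically rather than exponentially.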
I don't really understand those headers. E.g. "(Al-Pitkin) <> Al-Pitkin, Lemuel dos" is listed under "Fixed surname", but "dos" is a dropping particle here, right?
If the option is selected, the field content becomes:
Al-Pitkin, Lemuel dos
So the headings describe what is happening with the surname. I'm not sure that's best, but that's what it's doing. It's actually reporting processor semantics, isn't it? I suppose this one should be dropping-particle.
Headings (maybe with some further refinement) seem like progress, but what do you think?
Ah, okay. I get it now. Not sure it's clear enough, though.
How about forgoing headers, and only alphabetizing the list (is that really problematic?). That would change
(dos al-Pitkin) <> Pitkin, Lemuel dos al-
(dos Al-Pitkin) <> Al-Pitkin, Lemuel dos
(al-Pitkin) <> Pitkin, Lemuel dos al-
(Pitkin) <> Pitkin, Lemuel dos al-
(Dos al-Pitkin) <> Dos al-Pitkin, Lemuel
(Dos Al-Pitkin) <> Dos Al-Pitkin, Lemuel
(Al-Pitkin) <> Al-Pitkin, Lemuel dos
(dos al-Pitkin) <> dos al-Pitkin, Lemuel
(dos Al-Pitkin) <> dos Al-Pitkin, Lemuel
(al-Pitkin) <> al-Pitkin, Lemuel dos
to
(Al-Pitkin) <> Al-Pitkin, Lemuel dos
(al-Pitkin) <> al-Pitkin, Lemuel dos
(al-Pitkin) <> Pitkin, Lemuel dos al-
(dos Al-Pitkin) <> Al-Pitkin, Lemuel dos
(Dos al-Pitkin) <> Dos al-Pitkin, Lemuel
(Dos Al-Pitkin) <> Dos Al-Pitkin, Lemuel
(dos al-Pitkin) <> dos al-Pitkin, Lemuel
(dos Al-Pitkin) <> dos Al-Pitkin, Lemuel
(dos al-Pitkin) <> Pitkin, Lemuel dos al-
(Pitkin) <> Pitkin, Lemuel dos al-
The latter seems much more readable to me.
Alternatively, maybe you could offer dedicated particle capitalizing and particle type menus? That would reduce the options of the particle type menu to:
(al-Pitkin) <> al-Pitkin, Lemuel dos
(al-Pitkin) <> Pitkin, Lemuel dos al-
(dos al-Pitkin) <> dos al-Pitkin, Lemuel
(dos al-Pitkin) <> Pitkin, Lemuel dos al-
(Pitkin) <> Pitkin, Lemuel dos al-
(I see that the menu currently excludes the option that is identical to how the name is already formatted?)
The separate particle capitalizing menu would then also have very few options.
I've fixed up the header logic, here's a refreshed screencast.
An undecorated list would be more readable in alphabetical order, but I think there may be value in the headers. They ease the user into the terminology that we use for the different forms, and help to tie the CSL documentation to what the user sees in the UI. When they are removed, the user is on their own to figure out what all those options mean.
(Editing for completeness)
The current form is not excluded from the list - it's just shown with the non-dropping particle demoted.
The list only gets big when the name contains unspecified particles (multiple terms in lower-case in a particle position). It seems like that would be uncommon (apart from typos). I'm not sure the added complexity in the UI would be worth it.
I was on the fence (not demoting looks more natural to my Dutch eyes), but yeah, demoting the non-dropping particles provides more information since it shows the distinction between non-particles and non-dropping particles.
Same here.
Edit: Just to clarify, the "columns" represent in-text citation ⬄ bibliography, right?
Yes, it's as Rintze described above. The left side is set as if in a citation with `form="short"`, and the right side as if in a bibliography with `form="long"`, `demote-non-dropping-particle="display-and-sort"`, and `name-as-sort-order` active.
I haven't seen Dan's take on this, but in my opinion the list is too complicated to figure out, and it is also too hard to discover. IMO the latter should be addressed by displaying dropping and non-dropping particles in the Zotero pane with some special decoration, and we can address that later (baby steps).
As for the list, we need to simplify it more. I think it's OK to miss some obscure cases and have users ask us how to enter them, rather than confuse everyone with all the options. Since we're also concerned about improper capitalization, we can assume that any words in the name that match known particles should be lower-cased and proceed from there. Going with parsing a name entered as "Al-Pitkin, Lemuel Dos", I think we should offer the following transform options:
Pitkin, Lemuel (dos al-) // Treat all particles as dropping
Pitkin, Lemuel (dos) al- // Shift each of the particles (left-to-right) into non-dropping mode
Pitkin, Lemuel dos al- //...
Al-Pitkin, Lemuel (dos) // transform each particle (left-to-right) into non-particle and repeat above
Al-Pitkin, Lemuel dos //...
Dos Al-Pitkin, Lemuel //...
Now, the obscure thing that remains is that (...) is a dropping particle. Maybe we can figure out a better way to display this.
Based on the previous discussions, capitalized particles are treated as non-particles and lower-case as non-dropping, so I don't think there's much sense in displaying the transforms that break this rule.
Thoughts?
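The proposed sequence can be generated mechanically. First, lower-case the known particles. Then, for each number of trailing particles folded back into the family name as non-particles, move the dropping/non-dropping boundary one step at a time. Here is a rough sketch under those assumptions, with a hypothetical particle list and function names. It reproduces the six options listed above for "Al-Pitkin, Lemuel Dos".

```python
KNOWN = ["dos", "al-"]  # hypothetical particle list
ATTACHING = ("-", "’", "'")

def attach(parts: list[str]) -> str:
    """Join particles and family name; no space after a hyphen or apostrophe."""
    out = ""
    for part in parts:
        if out and not out.endswith(ATTACHING):
            out += " "
        out += part
    return out

def extract(family: str, given: str) -> tuple[list[str], str, str]:
    """Pull known particles (lower-cased) off both fields, in name order."""
    particles = []
    given_words = given.split()
    while len(given_words) > 1 and given_words[-1].lower() in KNOWN:
        particles.insert(0, given_words.pop().lower())
    found = True
    while found:
        found = False
        for p in KNOWN:
            prefix = p if p.endswith(ATTACHING) else p + " "
            if family.lower().startswith(prefix) and len(family) > len(prefix):
                particles.append(p)
                family = family[len(prefix):]
                found = True
                break
    return particles, family, " ".join(given_words)

def transforms(family: str, given: str) -> list[str]:
    particles, core, first = extract(family, given)
    n = len(particles)
    options = []
    for m in range(n + 1):  # m trailing particles become part of the family name
        rest = particles[:n - m]
        fam = attach([p[:1].upper() + p[1:] for p in particles[n - m:]] + [core])
        for j in range(len(rest) + 1):  # j trailing particles become non-dropping
            dropping, non_dropping = rest[:len(rest) - j], rest[len(rest) - j:]
            text = f"{fam}, {first}"
            if dropping:
                text += f" ({' '.join(dropping)})"
            if non_dropping:
                text += " " + " ".join(non_dropping)
            options.append(text)
    return options
```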
Edit: maybe add an additional comma between given name/dropping particle and the non-dropping particle (and underline/bold/something it?). We can then drop the parentheses.
Pitkin, Lemuel dos al- // Treat all particles as dropping
Pitkin, Lemuel dos, _al-_ // Shift each of the particles (left-to-right) into non-dropping mode
Pitkin, Lemuel, _dos al-_ //...
Al-Pitkin, Lemuel dos // transform each particle (left-to-right) into non-particle and repeat above
Al-Pitkin, Lemuel, _dos_ //...
Dos Al-Pitkin, Lemuel //...
Yeah, I agree that there are too many options currently, and that the options are difficult to understand.
Based on the previous discussions, capitalized particles are treated as non-particles and lower-case as non-dropping, so I don't think there's much sense in displaying the transforms that break this rule.
Would love to have this in Zotero.