Parsing problem on Italian names
Like @nickbart, I disagree with "Given correct data entry, the dropping particle has no significance for CSL or Zotero, so we can ignore parsing that part altogether." Dropping particles are treated differently from initials or non-dropping particles, so you need to treat them as their own class of name element.
As for "The non-dropping particle ... may be joined with family name by punctuation", I would just like to add that the punctuation options are probably rather limited in (Western) names. I think spaces, apostrophes/quote marks, and hyphens cover most cases.
Then with regard to "Current suggestion is for some ... solution that cycles through possible permutations" and "User must know what he's looking for anyway, so I don't really see why that is easier than just correcting the case manually.": the main issue I have with the current setup is that users need to be explicitly aware of the existence of dropping and non-dropping particles, as well as the precise formatting requirements in Zotero, to get correct output. All of this happens without the Zotero UI giving any guidance or feedback when editing the name field.
Instead of a UI option that cycles through the different two-field name element storage options, like Frank's example:
van der Merwe, Wikus
Van der Merwe, Wikus
Van Der Merwe, Wikus
"van der Merwe", Wikus
der Merwe, Wikus van
Der Merwe, Wikus van
"der Merwe", Wikus van
Merwe, Wikus van der
van der Merwe, Wikus
I would like to propose two possible changes. First, I think the user should be confronted with all options simultaneously. I think cycling through this many options is just confusing. Even looking at the entire list right now it takes me a long time to figure out what's what. Second, I think the UI should focus on the desired output instead of the way the name needs to be stored in Zotero. E.g. the list could also be presented as a menu with the following options:
"(van der Merwe)" - "Merwe, W. van der"
"(Van der Merwe)" - "Van der Merwe, W."
"(Van Der Merwe)" - "Van Der Merwe, W."
"(van der Merwe)" - "van der Merwe, W."
"(der Merwe)" - "Merwe, W. van der"
"(Der Merwe)" - "Der Merwe, W. van"
"(der Merwe)" - "der Merwe, W. van"
"(Merwe)" - "Merwe, W. van der"
That seems much more intuitive to me. Zotero can then easily rearrange the name elements as required.
Finally, regarding "Correctly parsing/splitting names on import", I generally agree. In my own experience there are two annoying things. First, capitalization on import is often wrong. I deal with a fair number of Dutch names, and often particles come into Zotero uppercased. Second, sometimes non-dropping particles come in as dropping particles (that is, they reside in the given name field). It would be nice if Zotero had an easier way to move particles between the two name fields. Currently it's quite an ordeal: activate name field A, select particle, cut particle, activate name field B, select insertion point, paste particle.
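To make the "move a particle between the two fields" idea concrete, here is a minimal sketch of what a one-step action could do. The function names are hypothetical and nothing here is existing Zotero code; it simply shifts one word at a time across the field boundary.

```python
def move_particle_to_family(family: str, given: str) -> tuple[str, str]:
    """Move the last word of the given-name field to the front of the family field."""
    head, _, last = given.rpartition(" ")
    if not head:  # only the given name itself is left; change nothing
        return family, given
    return f"{last} {family}", head


def move_particle_to_given(family: str, given: str) -> tuple[str, str]:
    """Move the first word of the family field to the end of the given-name field."""
    first, _, rest = family.partition(" ")
    if not rest:  # only the bare family name is left; change nothing
        return family, given
    return rest, f"{given} {first}".strip()


print(move_particle_to_family("Merwe", "Wikus van der"))
print(move_particle_to_given("van der Merwe", "Wikus"))
```

Both calls return the pair ("der Merwe", "Wikus van"), i.e. one of the intermediate forms in Frank's list. A real implementation would also have to handle particles joined by an apostrophe or hyphen, which this sketch ignores.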
To their arguments, I'd add that it would be nice if CSL could, *in the future*, format the dropping particle, at least to add parentheses. To take a (now) well-known example: "La Fontaine, Jean (de)" and even "La Fontaine (de), Jean" are sometimes the desired output.
I didn't understand the purpose of the keyboard permutations of a name with particles at first, but I'm now convinced: it will make the particle-parsing mechanism discoverable and clear to the user, especially with Rintze's proposals above, which I second!
(1) [Author] [Ann de, III] is parsed as
"family": "Author",
"given": "Ann de",
"suffix": "III"
(2) [van Author] [Ann von] is parsed as
"family": "Author",
"given": "Ann von",
"non-dropping-particle": "van"
(3) [de l’Author] [Ann] is parsed as
"family": "l’Author",
"given": "Ann",
"non-dropping-particle": "de"
(though "de l’" is on the list),
(4) [vom und zum Author] [Ann] is parsed as
"family": "und zum Author",
"given": "Ann",
"non-dropping-particle": "vom"
(though "vom und zum" is on the list).

(1) is a bug.
(2) "von van" is not a listed particle, so "van" as NDP is expected.
(3) is a bug.
(4) is a bug.
I'll take a look, although the current parser will have a limited life expectancy. In the parser for the UI code I'm building with the same particles data, (1) fails, and (2)-(3) pass.
(Edit: Actually, I think I can just adapt the UI parsing code to do the classification. The new code is much more transparent, and since there may be a role for the classifier in translators, refactoring it will be a good use of time.)
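One way cases (3) and (4) could be handled is to try the longest listed particle first, so that "de l’" wins over "de" and "vom und zum" wins over "vom". The following is only a sketch under assumptions: `PARTICLES` is a tiny made-up list standing in for the real particles data, `split_leading_particle` is a hypothetical helper, and none of this is the actual parser or the new UI code.

```python
# Hypothetical stand-in for the real particles data, which is much longer.
PARTICLES = ["de", "de l'", "van", "van der", "vom", "vom und zum", "von"]


def split_leading_particle(family: str) -> tuple[str, str]:
    """Return (particle, remainder), preferring the longest listed particle."""
    # Compare with straight apostrophes so that l' and l’ both match.
    normalized = family.replace("\u2019", "'")
    for particle in sorted(PARTICLES, key=len, reverse=True):
        if not normalized.startswith(particle):
            continue
        found, rest = family[: len(particle)], family[len(particle):]
        if particle[-1] in "'-":
            # Particles ending in an apostrophe or hyphen attach directly.
            if rest:
                return found, rest
        elif rest.startswith(" ") and rest.strip():
            # All other particles must be followed by a space.
            return found, rest.strip()
    return "", family


print(split_leading_particle("de l\u2019Author"))
print(split_leading_particle("vom und zum Author"))
```

With this ordering the two calls give ("de l’", "Author") and ("vom und zum", "Author"). The match is case-sensitive on purpose, so a capitalized "De" or "Van" is left inside the family name.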
Please click and view. I'm curious to know how it will be received.
Would a decoration hint with normal ordering work to distinguish the parts? Something like highlights, or mild boldface, or italics?
Instead, I would show just the family name by itself (`form="short"`), and separately, the full name (`form="long"` and `demote-non-dropping-particle="never"`) with `name-as-sort-order` active. Those two examples would very clearly show which name elements are recognized as particles, which particles are recognized as non-dropping and dropping, and how particles are capitalized.
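As a rough illustration of those two previews, here is a simplified model of the short form and the sort-order long form with the non-dropping particle never demoted. It assumes a name stored as a dict with the CSL field names (`family`, `given`, `dropping-particle`, `non-dropping-particle`); the function names are made up, and a real CSL processor does far more (initials, suffixes, locale rules).

```python
def with_particle(particle: str, family: str) -> str:
    """Prefix the family name with a particle, without a space after ' or -."""
    if not particle:
        return family
    sep = "" if particle[-1] in "-'\u2019" else " "
    return particle + sep + family


def short_form(name: dict) -> str:
    """Family name plus non-dropping particle (roughly form="short")."""
    return with_particle(name.get("non-dropping-particle", ""), name["family"])


def sort_order_form(name: dict) -> str:
    """Family part first, then given name and dropping particle (particle not demoted)."""
    tail = " ".join(
        part for part in (name.get("given", ""), name.get("dropping-particle", "")) if part
    )
    head = short_form(name)
    return f"{head}, {tail}" if tail else head


dropping = {"family": "Merwe", "given": "Wikus", "dropping-particle": "van der"}
non_dropping = {"family": "Merwe", "given": "Wikus", "non-dropping-particle": "van der"}
print(short_form(dropping), "|", sort_order_form(dropping))
print(short_form(non_dropping), "|", sort_order_form(non_dropping))
```

The first name previews as "Merwe" and "Merwe, Wikus van der", the second as "van der Merwe" and "van der Merwe, Wikus", so the pair of strings is enough to tell the two particle types apart.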
@Gracile: I think we'll show the option disabled, but you're right that it shouldn't do anything if there is nothing to do.
Maybe "Particler" is a little too casual?
Would love to have this in Zotero.
Do you think the menu would be easier to read if the options are alphabetically sorted? I was on the fence (not demoting looks more natural to my Dutch eyes), but yeah, demoting the non-dropping particles provides more information since it shows the distinction between non-particles and non-dropping particles. Similar to the "Transform Text" option in the title menu, I would go for an action description, e.g. "Adjust Particles".
We may be able to slim down the number of options presented with unspecified particles: we should only present combinations that are likely to occur, and the user can edit manually for rare ones. We should also put a ceiling on the number of options or the number of particles somehow, to prevent mischievous people from DoS'ing the UI.
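A ceiling like that could be as simple as refusing to build a menu past a certain number of particle words and truncating the lazily generated list. The limits and the function name below are invented for illustration, and the sketch only varies capitalization, not particle type.

```python
from itertools import islice, product

MAX_PARTICLES = 3  # hypothetical ceiling on particle words we are willing to permute
MAX_OPTIONS = 6    # hypothetical ceiling on menu entries


def capitalization_options(particles: list[str]) -> list[str]:
    """Return at most MAX_OPTIONS lower-case/capitalized combinations."""
    if len(particles) > MAX_PARTICLES:
        return []  # too many words: offer no menu rather than an enormous one
    # dict.fromkeys drops the duplicate when capitalizing changes nothing.
    choices = [tuple(dict.fromkeys((p.lower(), p.capitalize()))) for p in particles]
    combos = (" ".join(combo) for combo in product(*choices))
    return list(islice(combos, MAX_OPTIONS))


print(capitalization_options(["van", "der"]))
```

Because `product` is consumed through `islice`, the work done is bounded by `MAX_OPTIONS` even though the number of combinations doubles with each extra word.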
Al-Pitkin, Lemuel dos
So the headings describe what is happening with the surname. Not sure if that's best, but that's what it's doing. It's actually reporting processor semantics, isn't it? I suppose this one should be dropping-particle.
Headings (maybe with some further refinement) seem like progress, but what do you think?
How about forgoing headers, and only alphabetizing the list (is that really problematic?). That would change
(dos al-Pitkin) <> Pitkin, Lemuel dos al-
(dos Al-Pitkin) <> Al-Pitkin, Lemuel dos
(al-Pitkin) <> Pitkin, Lemuel dos al-
(Pitkin) <> Pitkin, Lemuel dos al-
(Dos al-Pitkin) <> Dos al-Pitkin, Lemuel
(Dos Al-Pitkin) <> Dos Al-Pitkin, Lemuel
(Al-Pitkin) <> Al-Pitkin, Lemuel dos
(dos al-Pitkin) <> dos al-Pitkin, Lemuel
(dos Al-Pitkin) <> dos Al-Pitkin, Lemuel
(al-Pitkin) <> al-Pitkin, Lemuel dos
to
(Al-Pitkin) <> Al-Pitkin, Lemuel dos
(al-Pitkin) <> al-Pitkin, Lemuel dos
(al-Pitkin) <> Pitkin, Lemuel dos al-
(dos Al-Pitkin) <> Al-Pitkin, Lemuel dos
(Dos al-Pitkin) <> Dos al-Pitkin, Lemuel
(Dos Al-Pitkin) <> Dos Al-Pitkin, Lemuel
(dos al-Pitkin) <> dos al-Pitkin, Lemuel
(dos Al-Pitkin) <> dos Al-Pitkin, Lemuel
(dos al-Pitkin) <> Pitkin, Lemuel dos al-
(Pitkin) <> Pitkin, Lemuel dos al-
The latter seems much more readable to me.
Alternatively, maybe you could offer dedicated particle capitalizing and particle type menus? That would reduce the options of the particle type menu to:
(al-Pitkin) <> al-Pitkin, Lemuel dos
(al-Pitkin) <> Pitkin, Lemuel dos al-
(dos al-Pitkin) <> dos al-Pitkin, Lemuel
(dos al-Pitkin) <> Pitkin, Lemuel dos al-
(Pitkin) <> Pitkin, Lemuel dos al-
(I see that the menu currently excludes the option that is identical to how the name is already formatted?)
The separate particle capitalizing menu would then also have very few options.
An undecorated list would be more readable in alphabetical order, but I think there may be value in the headers. They ease the user into the terminology that we use for the different forms, and help to tie the CSL documentation to what the user sees in the UI. When they are removed, the user is on their own to figure out what all those options mean.
(Editing for completeness)
The current form is not excluded from the list - it's just shown with the non-dropping particle demoted.
The list only gets big when the name contains unspecified particles (multiple terms in lower-case in a particle position). It seems like that would be uncommon (apart from typos). I'm not sure the added complexity in the UI would be worth it.
Edit: Just to clarify, the "columns" represent in-text citation ⬄ bibliography, right?
As for the list, we need to simplify it more. I think it's OK to miss some obscure cases and have users ask us how to enter those, rather than confuse everyone with all the options. Since we're also concerned about improper capitalization, we can assume that any words in the name that match known particles should be lower-cased and proceed from there. Taking a name entered as "Al-Pitkin, Lemuel Dos", I think we should offer the following transform options:
Pitkin, Lemuel (dos al-) // Treat all particles as dropping
Pitkin, Lemuel (dos) al- // Shift each of the particles (left-to-right) into non-dropping mode
Pitkin, Lemuel dos al- //...
Al-Pitkin, Lemuel (dos) // transform each particle (left-to-right) into non-particle and repeat above
Al-Pitkin, Lemuel dos //...
Dos Al-Pitkin, Lemuel //...
Now, the obscure thing that remains is that the parentheses (...) mark a dropping particle. Maybe we can figure out a better way to display this.
Based on the previous discussions, capitalized particles are treated as non-particles and lower-case as non-dropping, so I don't think there's much sense in displaying the transforms that break this rule.
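The six options for "Al-Pitkin, Lemuel Dos" can be enumerated mechanically. This is a sketch under assumptions: the particles have already been recognized and lower-cased, the function names are hypothetical, and parentheses mark dropping particles as in the list above.

```python
def join_words(parts: list[str]) -> str:
    """Join name words, attaching directly after a hyphen or apostrophe."""
    out = ""
    for part in parts:
        out += part if not out or out[-1] in "-'\u2019" else " " + part
    return out


def transform_options(particles: list[str], family: str, given: str) -> list[str]:
    """Enumerate the display strings, from all-dropping to all-non-particle."""
    n = len(particles)
    options = []
    for fused in range(n + 1):  # rightmost particles capitalized into the family name
        kept = particles[: n - fused]
        fam = join_words([p.capitalize() for p in particles[n - fused:]] + [family])
        for nd in range(len(kept) + 1):  # rightmost kept particles made non-dropping
            dropping = kept[: len(kept) - nd]
            non_dropping = kept[len(kept) - nd:]
            tail = [given]
            if dropping:
                tail.append("(" + join_words(dropping) + ")")
            if non_dropping:
                tail.append(join_words(non_dropping))
            options.append(f"{fam}, {' '.join(tail)}")
    return options


for option in transform_options(["dos", "al-"], "Pitkin", "Lemuel"):
    print(option)
```

For n particles this yields (n + 1)(n + 2) / 2 entries, so six for two particles, which keeps the menu short compared with listing every capitalization and type combination.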
Thoughts?
Edit: maybe add an additional comma between the given name/dropping particle and the non-dropping particle (and underline or bold the latter?). We can then drop the parentheses.
Pitkin, Lemuel dos al- // Treat all particles as dropping
Pitkin, Lemuel dos, _al-_ // Shift each of the particles (left-to-right) into non-dropping mode
Pitkin, Lemuel, _dos al-_ //...
Al-Pitkin, Lemuel dos // transform each particle (left-to-right) into non-particle and repeat above
Al-Pitkin, Lemuel, _dos_ //...
Dos Al-Pitkin, Lemuel //...