I've been mulling over what the tests should look like, and I'd like to confirm something. This is just for the prefix and suffix fields that the user enters on a citation, right? As far as I can see, there's no need for anything that would affect CSL style code, since the characters can be coded into the affix and delimiter attributes of the relevant elements.
For field content, I assume that it would make more sense to code the correct Unicode values into the field (perhaps with a "Localize" context menu item, as a companion to the other transforms).
For user-supplied affix fields, it would be preferable to rely on the client (eventually tinyMCE or something like that) to handle this kind of thing, but if there is a need, I suppose the processor could handle it as well.
OK for affixes, but I was talking about field content.
I assume that it would make more sense to code the correct Unicode values into the field
Honestly, that's not a good solution: there is no way to be absolutely sure that the spaces are non-breaking.
If you don't want to add it to the processor (you did it for apostrophes), I'd prefer an option à la LibreOffice, i.e. a checkbox that would automatically replace "space?" with "nbsp?".
Hmm... no, I think that's a CSL issue: styles are localized, and this is part of the localization. The downside of an option in the client (a checkbox à la LibreOffice) is that the user would have to change all the fields back when writing in English. That sounds crazy.
@Gracile: Sorry for the delay in responding. To kick-start this again, let's begin with some tests, so I don't get confused about the requirements. We have two that seem relevant:
FrenchApostrophe
FrenchOrthography
The FrenchApostrophe test places non-breaking spaces between the guillemets and the text they enclose, which seems correct. In the FrenchOrthography test, they are ordinary spaces. If these are converted to non-breaking spaces (for guillemets and for the other punctuation marks flagged in the other thread), will that address the fault?
Hmm. An update on this item. I've been looking at the code and tests, and we're actually handling everything mentioned in that thread (opening and closing guillemets, terminal punctuation preceded by a space) by replacing the plain space with a thin no-break space (Unicode U+202F).
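For anyone following along, here is a minimal Python sketch of the rule described above, assuming the rule is exactly "a plain space before ; : ! ? » or after « becomes U+202F". The helper name `french_spacing` and the exact character set are assumptions for illustration; this is not the processor's actual code.

```python
import re

# U+202F NARROW NO-BREAK SPACE (the "thin no-break space")
NNBSP = "\u202f"

# Assumed rule: only spaces that are already present get replaced.
_BEFORE = re.compile(r" ([;:!?»])")  # space before high punctuation / closing guillemet
_AFTER = re.compile(r"(«) ")         # space after an opening guillemet


def french_spacing(text: str) -> str:
    """Swap plain spaces next to French punctuation for narrow no-break spaces."""
    text = _BEFORE.sub(NNBSP + r"\1", text)
    text = _AFTER.sub(r"\1" + NNBSP, text)
    return text
```

Note that the space *after* a semicolon stays an ordinary space, so lines can still break between cites.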
I was fooled at first because the character is visually indistinguishable from a plain space in my editor (Emacs). But it's definitely returning thin no-break spaces in the FrenchOrthography test. If that's not the correct behavior, or if it doesn't behave as a no-break space when it reaches one of the rendering engines (Word, LibreOffice, various browsers), let me know; as far as I can tell, though, the processor is already trying to do the right thing here.
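Since U+202F looks just like a plain space in most editors, one way to check what actually came out of a test is to list the Unicode names of the space characters. A small sketch using Python's standard `unicodedata` module; the helper name `describe_spaces` is hypothetical.

```python
import unicodedata


def describe_spaces(text: str) -> list:
    """Return (index, Unicode name) for every space-separator (category Zs) character."""
    return [
        (i, unicodedata.name(ch))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Zs"
    ]
```

For example, `describe_spaces("Bonjour\u202f!")` gives `[(7, "NARROW NO-BREAK SPACE")]`, while an ordinary space would show up as `"SPACE"`.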
Frank: a user has reported a typographic issue with a citation containing multiple cites (Smith, 2012; Doe, 1995). In French typography, as I explained above, the ";" should be preceded by a (thin) no-break space. Would it be possible to add this change when the locale is fr-FR? As Rintze said, I think we might add these exceptions somewhere in the CSL specs (but how?).
(And by the way, am I right that the behaviour you describe above (automatic replacement of a space with a thin no-break space) is applied regardless of the Zotero language field or the locale defined in the style?)
While that's technically true, narrow no-break spaces (U+202F) are a bit of a problem because, IIRC, they're not widely supported in fonts: they're only used in Mongolian and French ;-) (not making that up...)
Yes, they aren't supported by all fonts. Anyway, I didn't realize that this is controlled by the delimiter:
<layout suffix="." delimiter="; ">
So changing it to:
<layout suffix="." delimiter="&#8239;; ">
should work.
It might be useful to localize that, but it would probably be tricky.
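To illustrate why the delimiter change works: `&#8239;` is the decimal character reference for U+202F, and any XML parser resolves it to that character before the style ever joins cites. The sketch below uses Python's `xml.etree.ElementTree` and a deliberately simplified, hypothetical `render` helper that only joins already-formatted cites with the layout's delimiter and suffix; a real CSL processor does much more (affixes, the inter-cite join logic, etc.).

```python
import xml.etree.ElementTree as ET

# The two <layout> variants discussed above (children omitted for brevity).
ORIGINAL = '<layout suffix="." delimiter="; "/>'
PATCHED = '<layout suffix="." delimiter="&#8239;; "/>'


def render(layout_xml: str, cites: list) -> str:
    """Simplified: join pre-formatted cites using the layout's delimiter and suffix."""
    layout = ET.fromstring(layout_xml)
    # The parser has already turned &#8239; into the character U+202F here.
    delimiter = layout.get("delimiter", "")
    suffix = layout.get("suffix", "")
    return delimiter.join(cites) + suffix
```

With `PATCHED`, `render(PATCHED, ["Smith, 2012", "Doe, 1995"])` yields `"Smith, 2012\u202f; Doe, 1995."`. The hex form `&#x202F;` would be equivalent in the style file.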
Setting the required space on the delimiter is the way to go, at least for now. The inter-cite join is already pretty slippery, and I'd rather not try to code further smarts into it unless that proves to be necessary.