citeproc-js: Referring to single vs. multiple paragraphs

dstillman · June 1, 2019

This discussion was created from comments split from: Changing how citeproc (?) decides between 'single' and 'multiple'.

willem.swinnen · May 1, 2019

I believe my question connects to this topic, but do not hesitate to tell me if I should open a new discussion for it.

If I refer to one paragraph or a paragraph range, the outcome is, for instance, "para. 5" or "paras. 5-7", which is perfect. But if I want to refer to multiple paragraphs without it being a range, I (believe I) can only use the ampersand to make it clear to the processor that the context is plural, for instance, "paras. 5 & 8". Unfortunately, I prefer to use the word "and" instead of the ampersand. But if I do, the processor thinks it's singular, for example, "para. 5 and 8".

How can I achieve a plural result using the word "and"? Is there, for example, a way to tell the processor that the word "and" in the input ("5 and 8") must generate the plural term ("paras.")? Or, another idea, is there a way for the processor to turn the ampersand into an "and" (input: "5 & 8", output: "5 and 8")?

Many thanks for your assistance.

Willem

fbennett · May 1, 2019

Thanks for this note. We'll wait for feedback from other quarters, but I think this can be done in the processor without too much trouble - possibly in a language-sensitive way.

willem.swinnen · May 1, 2019

Thank you, Frank, for the quick reply. I look forward to what others think about this issue.

adamsmith · May 1, 2019

@fbennett -- what's the current behavior? What sorts of combinations are counted as plural? I assume ",", ";", and "&" -- anything else?

What are you thinking for a solution? I guess we can add localized "and" terms, but then how about "5 to 7", "5 through 7", and others that I am probably not thinking of? On the other hand, just taking everything that includes 2+ numbers as plural seems intuitively problematic, but maybe better?

fbennett · May 1, 2019

Yes, just joins of ",", "&", and "-" currently. Was just thinking of recognizing "and" plus the localized term. Verbal ranges seem too far a stretch - have there been requests for that?
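A rough sketch of the detection described in this post, for illustration only. It is written in Python rather than taken from citeproc-js, and the names `is_plural_locator` and `label_for` are hypothetical. The whitespace handling is an assumption: "&" and "-" may have optional spaces around them, while "," must be followed by a space.

```python
import re

# Joiners that mark a locator as plural, per the post above: ",", "&", "-".
# Assumption: "&" and "-" may be surrounded by optional spaces, while ","
# must be followed by a space, so a bare "5,8" counts as a single locator.
_PLURAL_JOIN = re.compile(r"\d(?:\s*[&-]\s*|,\s+)\d")


def is_plural_locator(locator: str) -> bool:
    """Return True if the locator text looks like it names several items."""
    return _PLURAL_JOIN.search(locator) is not None


def label_for(locator: str, single: str = "para.", multiple: str = "paras.") -> str:
    """Pick the singular or plural label (hypothetical helper)."""
    return multiple if is_plural_locator(locator) else single
```

Under these assumptions, `label_for("5 & 8")` gives "paras." while `label_for("5 and 8")` gives "para.", which is the behavior reported at the top of the thread.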

adamsmith · May 1, 2019

I would definitely add ";" to the list. I'm OK with just using "and". We've not had any requests for verbal ranges, no, so maybe it's fine not to handle them, or to refer users to the suffix?

willem.swinnen · May 2, 2019

That's great news, Frank, that you'll add "and" (and the localized term). I assume this will be for a future update of the citeproc. Any idea on when we might expect it?

In the meantime I will use "---" as a connector. It's a combination that is not used in my document so, after I'm finished (and unlinked), I can easily replace all "---" with " and ". But I nevertheless look forward to the addition of "and", because I don't like to unlink (for some reason I always need to change some metadata after unlinking, which is a bugger).

@Adam, I've just tried "," and ";". Both trigger the plural on the condition that a space follows (", " and "; "). Without the space the processor regards it as one paragraph, which I believe is correct behavior.

Thank you again for this new feature.

Willem

willem.swinnen · May 31, 2019

Hello Frank

I'm about to finish and turn in the PhD. If, by any chance, you were just about to release this update (the word "and" as a connector), I'll wait a little longer before unlinking. If, however, this update is not planned in the coming week, please do not mind me. I'll be fine. Do not get me wrong, no impatience here. I'm just asking to make sure I'm not missing out on it without knowing, you see.

Willem

fbennett · May 31, 2019

Thanks for the nudge. This is an easy one to test and fix; I was just pulled away by other tasks. Will post when a Propachi update is out later today.

fbennett · June 1, 2019

Reading the thread more carefully, I see there are two ways to go about this:

* Simple, but a hack: parse only "&", but replace it in output with the symbol form of "and" in the current locale if available, otherwise "&"; or
* Flexible, but less portable: parse " and " and ", and " as in en, as well as the localized forms (so ", et " and " et "), and pass the joiner through unmodified.

Both approaches have their issues. The former abuses the "symbol" form of the term: if the style needs an actual localized ampersand elsewhere (seems unlikely, but all things can happen), the term may have the incorrect value. On the other hand, this approach would enforce style-driven consistency across the entire document, without editing individual citations.

The latter approach does not involve tampering with the locale settings of the style, and gives the user greater control over content. On the other hand, there is potential for typos (using Oxford comma or not, mixing of "and" and "et"), and if a document uses localized "and", then switches the bib format to another locale, pluralization of those entries would break, and those citations would all need to be edited.

I think those are the only two options, without additions to the CSL language. I'm happy to go either way (I have code on hand to handle both). Let me know your own preference, and we should wait for input from @adamsmith and @bwiernik before making the jump.
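To make the trade-off concrete, here is a minimal Python sketch of the two options. It is not the processor's code; the term tables and function names are hypothetical, and the "symbol" value for fr-FR is invented for the example.

```python
import re

# Hypothetical term tables; real values would come from the CSL locale or style.
AND_SYMBOL = {"fr-FR": "et"}  # "symbol" form of "and", where defined
AND_WORD = {"en-US": "and", "fr-FR": "et", "de-DE": "und"}  # long form of "and"


def render_symbol_approach(locator: str, locale: str) -> str:
    """Option 1: parse only "&" and swap in the locale's symbol term."""
    joiner = AND_SYMBOL.get(locale, "&")
    return re.sub(r"\s*&\s*", lambda _m: f" {joiner} ", locator)


def is_plural_word_approach(locator: str, locale: str) -> bool:
    """Option 2: recognise "and" / ", and" in en plus the current locale."""
    words = {"and", AND_WORD.get(locale, "and")}
    alternation = "|".join(re.escape(w) for w in sorted(words))
    pattern = rf"\d,?\s+(?:{alternation})\s+\d"
    return re.search(pattern, locator) is not None
```

With this sketch, `render_symbol_approach("5 & 8", "fr-FR")` follows the locale and yields "5 et 8", whereas `is_plural_word_approach("5 et 8", "de-DE")` returns False: the typed "et" is no longer recognised once the document switches locale, which is the portability problem noted above.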

fbennett · June 1, 2019

Perhaps relevant to the choice posed above, the "symbol" form of "and" is currently defined in zero (0) standard CSL locales, and in zero (0) repository styles. So at present, at least, the risk of conflicting expectations over the value of this term in the first approach is nil.

bwiernik · June 1, 2019

I think there has been some discussion somewhere about localizing & for some locales, like Greek. So, the first option about parsing “&” and localizing seems good.

I think “and” and “, and” and localized versions should also be parsed and passed through literally. The other options here would be to (1) localize any form of “and” using the style locale, or (2) change to symbol based on the style.

I can see a case for (1), but also it might be surprising for users to see their words changing. I’m leaning toward just passing literally and not localizing.

For (2), I don’t see a way to do this consistently for all styles without an “and” element being provided as part of the citation-locator node. I think we can leave it to users to choose long or symbol forms of and.

fbennett · June 1, 2019

So if I understand correctly, the initial suggestion is to:
* Parse "&" as at present, but localize it if possible, falling back to "&"; and
* Recognize "and" in its en and non-en-locale forms, with or without a comma, but print the joining term and punctuation literally in that case.
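As a rough end-to-end illustration of these two rules (a Python sketch with hypothetical names and term data, not the code shipped in the processor), the label and the printed locator might be produced like this. Other joiners such as "," and "-" are left out for brevity, and the labels "para." / "paras." are hard-coded.

```python
import re

# Hypothetical term data; real values would come from the style's locale.
AND_SYMBOL = {}                        # "symbol" form of "and", keyed by locale
AND_WORD = {"en": "and", "fr": "et"}   # long form of "and", keyed by locale


def render_locator(locator: str, locale: str = "en") -> str:
    """Return label plus locator text under the two rules summarised above."""
    words = {AND_WORD["en"], AND_WORD.get(locale, AND_WORD["en"])}
    word_join = re.compile(
        r"\d,?\s+(?:" + "|".join(map(re.escape, sorted(words))) + r")\s+\d"
    )
    if "&" in locator:
        # Rule 1: "&" marks a plural; localise it if a symbol term exists.
        joiner = AND_SYMBOL.get(locale, "&")
        text = re.sub(r"\s*&\s*", lambda _m: f" {joiner} ", locator)
        return f"paras. {text}"
    if word_join.search(locator):
        # Rule 2: a written-out "and" marks a plural; print it as typed.
        return f"paras. {locator}"
    return f"para. {locator}"
```

In this sketch `render_locator("5 and 8")` returns "paras. 5 and 8", and `render_locator("5 & 8")` keeps the ampersand because the hypothetical `AND_SYMBOL` table is empty.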

adamsmith · June 1, 2019

Last summary by fbennett sounds right to me, yes.

fbennett · June 2, 2019

Now available for testing via the Propachi plugin.

willem.swinnen · June 5, 2019

I've tested and it comes out great. Thank you for that, Frank. Another thing I've noticed however is that "para." has a point, where "paras" has not. Is this how it should be or must it rather be "paras." also with a point? In the latter case, adding the point in the CSL style is not doing the trick. Thanks for your point of view on this.

fbennett · June 5, 2019

Which style are you using?

willem.swinnen · June 5, 2019

Hi Frank. I'm using the Chicago style with Ibid, in which I have modified certain elements.

adamsmith · June 5, 2019

That's en-GB specific. UK English doesn't use a period after plural abbreviations because "s" is, in fact, the last letter of the plural.

fbennett · June 5, 2019

Are you using the en-GB locale? That has "para." as the singular and "paras" as the plural set as default. I don't know if that's right, others may be able to advise.

willem.swinnen · June 5, 2019

Yes indeed, Frank. I use the en-GB locale by default. Adam confirms that everything is the way it should be. That answers the question for me. I am not a native speaker, you see, and of course I do not necessarily need a point after "paras". Thanks for the confirmation, Adam. And for the quick assist, both of you.