Automatically choose whether hyphen, en dash, or em dash should be used

  • You have the pages in a short form (135-38).
    For whatever reason that doesn't work correctly with styles that don't specify page ranges.
    Aha, sorry, I missed that.

    We're running into some back-and-forth adjustments to cope with irregular input, and it may take a few iterations to get the compromises right. This is being forced back to hyphen because the numbers don't form a range. For some sources, this would be a leaf number (odd, I know, but the meaning would be something like "page 38 of segment 135"). I gave priority to that case, and built in an override of -- for an explicit en-dash, and \- for an explicit hyphen, to cope with situations where it fails.

    Since users in the field have already run into problems with that logic, it should be adjusted. There are a couple of possible responses. First, we could just convert all hyphens to en-dash (retaining the possibility of override, so leaf numbers would need to be marked up explicitly, which is simpler to explain and not really a problem). Second, we could retain the current behavior, but treat en-dash as an explicit delimiter, never falling back to hyphen.

    Of the two choices, the first seems better. It's simpler to code, simpler to explain, and more flexible (if, say, we run into a style that requires hyphens for ranges, special treatment of en-dash in the input won't get in the way or cause confusion).

    If that all sounds right, I can make the adjustment in a fresh release.
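For readers following along, the first option described above (convert every plain hyphen to an en dash, while keeping `--` as an explicit en dash and `\-` as an explicit hyphen) might look roughly like this. It is only a sketch in Python, not the processor's actual code; the function name `normalize_pages` and the placeholder trick are assumptions made for illustration.

```python
EN_DASH = "\u2013"

def normalize_pages(pages: str) -> str:
    """Convert plain hyphens to en dashes, honoring the two overrides.

    Hypothetical helper: the name and approach are illustrative only.
    """
    # Protect explicit hyphens (written as \-) so they survive conversion.
    placeholder = "\x00"
    out = pages.replace("\\-", placeholder)
    # Explicit en-dash markup (--) and plain hyphens both become en dashes.
    out = out.replace("--", EN_DASH).replace("-", EN_DASH)
    # Restore the protected hyphens, e.g. for leaf numbers.
    return out.replace(placeholder, "-")
```

With this approach a leaf number has to be entered as `135\-38` to keep its hyphen, which matches the trade-off described in the post.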
  • Certainly sounds good to me. You'd know how people using "leafs" are affected by that.
  • [sorry if I'm hijacking this thread]
    I've just discovered the replacement in z3.0 of hyphens by en-dashes for page ranges. This is certainly correct in English but not in French (fr-FR at least).

    In French typography, we use (non-breaking) hyphens for ranges/intervals, e.g.:
    year: 1945-1983
    pages: p. 65-82.
    It's U+2011 (non-breaking hyphen) [or U+2010 (hyphen)].

    En-dashes (–) are only used "when the bounds of the interval are compound terms" (see the example with Napoléon Bonaparte below).

    To mark an interval, French typography requires a non-breaking hyphen: 1939-1945, Paris-Brest, etc. However, when the bounds of the interval are compound terms, the principle of readability tends to impose the use of the en dash, with suitable spacing:
    Napoléon Bonaparte (Ajaccio, 15 août 1769 – île de Sainte-Hélène, 5 mai 1821).
    The hyphen - (with no space before or after) to indicate a hyphen (e.g. c’est-à-dire; page indication: p. 39-42);
    Hyphen and en dash: The French typographic tradition uses the hyphen (whose normal function is to form compound words) to express the idea of "from ... to" in ranges of pages, notes, years, etc. The Anglo-American tradition, for its part, uses the en dash (Alt+0150 on Windows). This latter tradition, more precise and also more elegant, would have every advantage in taking hold, but it never seems to have been a matter of concern in the French-speaking world.
  • that's too bad. We tried to find out about this when Frank implemented this feature and it looked universal, but we should have made more of an effort to track down French usage given your penchant for vive la difference ;-).
    I would guess the solution would be to disable this for all French locales, right?
  • Strange that I haven't seen the discussion when this was debated.

    Anyway, that should be disabled for fr-FR at least (and perhaps for fr-CA too: (Canadian website))
  • Oh, fun. Just when I thought it was safe to go back to work on my book. :)

    Would it work to force the range delimiter into the locale behind the scenes? That would produce the expected behavior for the fr domains in the short term, and it will migrate smoothly if this is added to CSL locales in the future.
  • If I understand correctly, you'd want to hardcode this for locale="fr" in citeproc and then in the medium term we could add the page-range-delimiter to the locale? I think that's fine.
  • Yes, unfortunately, hyphens also for fr-CA. The source cited by Gracile is a solid one and it is confirmed by the style guide at the Université de Montréal:

    And one of Gracile's other sources:
    is the Université de Laval in Quebec City.

    Finally, it is also followed by the entire Université du Québec system:

    That's three for three.
  • I would then just assume all fr locales - my guess would be that the rest of the francophonie sticks closely to French rules.
  • for vive la difference ;-)
    Vive la différence, indeed. ;-) [but we're not those who drive on the left side, amongst other peculiarities!]
    Oh, fun. Just when I thought it was safe to go back to work on my book. :)
    Oh, sorry. Hope that's not hard to modify. No emergency.
    If I understand correctly, you'd want to hardcode this for locale="fr" in citeproc and then in the medium term we could add the page-range-delimiter to the locale? I think that's fine.
    "page-range-delimiter" is a good solution in the long run (the CSL spec has to be updated). "range-delimiter" is used for dates AFAIR.
  • I've set the processor up to force a "page-range-delimiter" value into all locales. For locales beginning with "fr", the fixed value is set to hyphen, and for other locales it is en-dash. The new behavior will appear in the next release.

    This is a provisional fix; presumably this will eventually be configurable through a normal locale term, like quotes.
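As a rough illustration of the rule described in this post (hyphen for any locale beginning with "fr", en dash for everything else), here is a minimal Python sketch. It is not the processor's code; the function names `page_range_delimiter` and `format_page_range` are hypothetical.

```python
EN_DASH = "\u2013"
HYPHEN = "-"

def page_range_delimiter(locale: str) -> str:
    """Pick the forced page-range delimiter for a locale code.

    Hypothetical helper mirroring the provisional rule: locales whose
    code begins with "fr" get a hyphen, all others get an en dash.
    """
    return HYPHEN if locale.lower().startswith("fr") else EN_DASH

def format_page_range(first: str, last: str, locale: str) -> str:
    """Join two page numbers with the delimiter chosen for the locale."""
    return f"{first}{page_range_delimiter(locale)}{last}"
```

For example, `format_page_range("65", "82", "fr-CA")` joins the numbers with a hyphen, while the same call with `"en-US"` joins them with an en dash.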
  • wonderful - thanks Frank.
  • Thank you Frank!
  • Unfortunately, I have to say that I just stumbled upon a US high-impact journal
    that requires the short - for the page number range in its style. What should be done?

  • Add this just below the info section of your style (if there is no locale section there yet -- otherwise just add the term to the existing in-style locale):
    <locale>
      <terms><term name="page-range-delimiter">-</term></terms>
    </locale>
    This is not yet part of the official CSL specification, so it won't validate, but it will work.
  • We are planning to make the page range delimiter configurable on a per-locale and per-style basis. You'll have to wait for CSL 1.0.1 for that to happen, though.
  • Thanks for your reply! As I'm currently working on the same issue I tried your fix above, but it does not seem to work (at least for me).
    I pasted it right behind the info tag and before the macro tags. Is there anything else I should consider?

  • Post the style in question (with the added page-range-delimiter definition) somewhere public and give a link here so we can take a look.
  • Here:
  • the problem is that this is still a CSL 0.8.1 style - you'd have to convert it to CSL 1.0 first (and before adding the locale thing) before this works:
  • okay, thanks for that!

    but: It still doesn't work for me. I uploaded the updated code:

    Do you have another idea?
  • have you tried installing this (after setting a proper ID)? It might be that this isn't displayed in the test panel even though it works in Word etc..
  • yep, I just set the ID and reinstalled the style..Word still shows the wrong dash.
  • Ah, my bad, this wasn't completely implemented. I'll put up a processor revision very shortly (ver 1.0.319). When that works its way into the Zotero development source (probably within a day or two), this will start working in the 3.0 branch XPI. You can either wait for the next Zotero release, or watch for the appearance of the new processor version here and then install the dev XPI, which I believe is built within 24 hours of a branch checkin. If you do install the dev XPI, you should install the next full Zotero release (3.0.4) manually when it comes out, to avoid getting further development updates.
  • (currently dev xpis are not built automatically - Dan puts them up manually in semi-regular intervals, but usually (quite a bit) longer than 24hs)
  • Learn something every day. :) What adamsmith says.
  • Dan puts them up manually in semi-regular intervals, but usually (quite a bit) longer than 24hs
    I actually try to put out a new build whenever I see a 3.0 commit, which is often within a couple minutes if I'm around, but occasionally I'll miss one. Still planning to get us back on automatic dev builds at some point.
  • edited March 7, 2021
    Would it be feasible and helpful to convert also short dashes in dates that are part of titles to en dashes, whether across Zotero or in a specific style? By dates, I mean ranges of days, months, years, and centuries, some of which could be expressed in Roman numerals or with the ordinal number followed by a dot. I guess there is a risk of interfering with titles which have dashes between numbers that should stay as short dashes but I struggle to think of any common examples (perhaps sci-fi novels?). Obviously not in French titles though.