Automatically choose whether hyphen, en dash, or em dash should be used

Since different citation styles deal differently with en and em dashes, I think that Zotero should at some point be handling the correct use of dashes.

Ideally users would input only hyphens and Zotero would choose whether to use hyphens, en dashes, or em dashes for the current citation style.

Right now, users need to know which citation style they are going to use when entering data, which somehow seems to defeat the purpose of working with a bibliographic citation system that can switch between citation styles.

This idea came up in the following thread:
http://forums.zotero.org/discussion/5875/how-to-insert-en-dashes-for-dates-and-page-numbers/
  • Yes, I definitely second this request. For example, biblatex has really smart handling of dashes, and Zotero could benefit from having something similar.

    I wonder how to address this on a CSL level. Maybe a generic "replace" function would solve this as well as many other issues. What about something like this:

    <text variable="locator" replace="-" replace-with="–"/>

    Maybe even with regex support (but I'm not sure about this):

    <text variable="locator" replace="-+" replace-with="–"/>

    (This would handle e.g. BibTeX imported data with "--" as page separator)

    This simple syntax doesn't handle multiple replacements very well. But maybe one could solve this by using nested groups with one replacement on each group whenever this is needed.

    A generic solution like this could solve many similar issues that we might not even be aware of today. And I think a more specialized solution for ranges could easily become much more complex.
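A minimal sketch of what the proposed `replace` / `replace-with` pair would do if `replace` were treated as a regular expression. This is plain Python rather than anything that exists in CSL; the function name `apply_replace` is hypothetical and only the two attribute values come from the proposal above.

```python
import re

EN_DASH = "\u2013"

def apply_replace(value: str, pattern: str, replacement: str) -> str:
    """Replace every regex match of `pattern` in an already rendered variable."""
    return re.sub(pattern, replacement, value)

# With the pattern "-+", a BibTeX-style "--" and a single hyphen
# both collapse to one en dash.
bibtex_style = apply_replace("123--145", "-+", EN_DASH)
plain_hyphen = apply_replace("123-145", "-+", EN_DASH)
```

Note that the same pattern would rewrite every hyphen in whatever variable it is applied to, so it only makes sense on fields that hold nothing but page or locator numbers.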
  • Thanks for supporting this request, felwert. How do we go from here? Is there an official way to suggest a feature?
  • Before I'd entertain adding this to CSL, I'd want people to do some research to prove it's necessary. I'm not convinced ATM that this is a style-specific feature; seems to me that Zotero and other implementations should globally handle this. Can you perhaps look into this, then, and post some links to what you find?
  • @bdarcus
    Maybe you’re right and this should be handled at the application level. I think what would then be necessary is some kind of »range« data type. This could be entered with two text boxes, like
    pages: [__] to [__]
    or using one text box (as it is now), and automatically split it using a defined set of separators, like -, --, ndash and mdash.

    Then it should be locale dependent how these ranges are rendered in the output. In German, it is common to use ndash, maybe American guidelines prefer the mdash. If this is defined as a »term« in CSL, then citation styles could override the default locale behavior if a style requests a certain separator.
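The single-text-box variant of this idea could look roughly like the sketch below: split the field on the listed separators, keep the two ends of the range as data, and pick the delimiter only at render time. All names here (`PageRange`, `parse_pages`, `render_pages`, `RANGE_DELIMITER`) are hypothetical, and the locale table is an illustrative placeholder, not taken from any CSL locale file.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Separators accepted on input: em dash, en dash, or a run of hyphens.
SEPARATOR = re.compile(r"\s*(?:\u2014|\u2013|-+)\s*")

# Hypothetical per-locale delimiters; a style could override these.
RANGE_DELIMITER = {"de-DE": "\u2013", "en-US": "\u2013"}

@dataclass
class PageRange:
    first: str
    last: Optional[str] = None  # None means a single page, not a range

    def render(self, locale: str = "en-US") -> str:
        if self.last is None:
            return self.first
        # Unknown locales fall back to a plain hyphen.
        return f"{self.first}{RANGE_DELIMITER.get(locale, '-')}{self.last}"

def parse_pages(field: str) -> List[PageRange]:
    """Split a free-text pages field into single pages and ranges."""
    ranges = []
    for part in field.split(","):
        part = part.strip()
        if part:
            # At most one split per part, so each part yields one or two ends.
            ranges.append(PageRange(*SEPARATOR.split(part, maxsplit=1)))
    return ranges

def render_pages(field: str, locale: str = "en-US") -> str:
    return ", ".join(r.render(locale) for r in parse_pages(field))
```

Splitting on commas first means a field such as `1, 2, 14--20` is stored as two single pages plus one range, and only the range receives the locale's delimiter.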
  • @bdarcus

    I am not sure I understand what you mean by CSL and ATM.

    I can only speak for the Chicago Manual of Style which gives specific instructions on when and how to use the various dashes and hyphens.

    http://www.chicagomanualofstyle.org/ch06/ch06_sec080.html

    I assume that other styles have similar requirements.

    Does this answer your question?
  • edited April 9, 2009
    ATM and CSL.

    The CMS is almost 1000 pages; doesn't mean it's our responsibility to cover all of it.

    Aside: your link requires a subscription to access.
    "I assume that other styles have similar requirements."
    Assumptions aren't enough to justify significant changes to CSL at this point. I'd suggest if you really think this has significant variation depending on style, then do some research and present your findings here (e.g. a few links to prominent styles that illustrate your argument).

    Otherwise, I'd suggest it ought to be the responsibility of a) the user, b) the translators, and/or c) a global configuration option in Zotero.
  • edited April 9, 2009
    @felwert:
    "I think what would then be necessary is some kind of »range« data type."
    What happens if you have page numbers such as "1, 2, 14-20"? That's not uncommon with periodicals.
    "Then it should be locale dependent how these ranges are rendered in the output. In German, it is common to use ndash, maybe American guidelines prefer the mdash. If this is defined as a »term« in CSL, then citation styles could override the default locale behavior if a style requests a certain separator."
    In the U.S. we use en-dashes as well for ranges, and I cannot imagine a case where a style would consider that wrong.

    As I said, I need concrete evidence to support what you and biblio are expressing at this point as vague impressions ;-)
  • edited April 9, 2009
    "Then it should be locale dependent how these ranges are rendered in the output. In German, it is common to use ndash, maybe American guidelines prefer the mdash. If this is defined as a »term« in CSL, then citation styles could override the default locale behavior if a style requests a certain separator."
    I don't think this is locale-specific & I've never seen an em dash used for numeric ranges (only en dashes and hyphens (but the latter might be due to laziness)).
    "Does this answer your question?"
    Not really. A single style in a single locale is one thing. But it doesn't answer whether or how CSL needs to ask that en dashes be used. In LaTeX, an en dash is nearly always used in ranges. Perhaps the dashes don't need to be in CSL & Zotero could always use an en dash. If you have styles that explicitly call for a separator other than an en dash, that would be helpful. If you have styles from other locales, then that could be more helpful still.
  • Ok, one use case is very simple, not really locale dependent, and yet quite common: many web sites simply use - as a range separator, not an en dash. But in the output, I’d normally like to have an en dash. (Plus, many users don’t know how to type an en dash, or never even think about it.)

    But you’re right, this wouldn’t be reason enough for changes to CSL. But maybe a setting for Zotero like "always replace - by – in page ranges" would make sense.
  • Will dashes be handled in the new CSL?

    All the style guides I have seen require en-dashes in page ranges. The only one I saw that used hyphens was for an online-only publication (and I'm sure they'll get over it).

    (Em-dashes are recommended for open ranges e.g. 1964— and therefore might be needed in titles, and are essential in the "subsequent-author-substitute" field.)

    The search-and-replace workaround is a bit fiddly, doesn't update, and destroys any DOIs.
  • edited September 9, 2009
    If there is a consistent pattern, this would be easy to cover. Would putting an en-dash between two numbers, an em-dash after a number followed by a non-number, an em-dash for two or more hyphens, and hyphens everywhere else do the trick?

    (edited)
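The four rules in the post above can be written down almost literally. This is only a sketch of that proposal, not how any CSL processor behaves; `convert_dashes` is a hypothetical name, and "followed by a non-number" is read here as "not followed by a digit", which includes the end of the string.

```python
import re

EN_DASH = "\u2013"
EM_DASH = "\u2014"

def convert_dashes(text: str) -> str:
    # Two or more hyphens in a row become one em dash.
    text = re.sub(r"-{2,}", EM_DASH, text)
    # A single hyphen between two digits becomes an en dash.
    text = re.sub(r"(?<=\d)-(?=\d)", EN_DASH, text)
    # A hyphen after a digit that is not followed by a digit becomes an em dash.
    text = re.sub(r"(?<=\d)-(?!\d)", EM_DASH, text)
    # Every other hyphen is left as entered.
    return text
```

Taken literally, the rules turn a BibTeX-style `14--20` into an em dash range and also rewrite strings such as `3-D`, so some exception handling would still be needed on top.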
  • Well, there are lots of exceptions, though the above scheme would cover most events.

    In theory date ranges should have En-dashes even if they are words (e.g. March–May).
    CMOS also says expressions which are not compound and do not modify one another should have En-dashes (e.g. Bose–Einstein condensate) (compared with add-on or award-winning).

    If the CSL is going to support HTML tags for rich text markup, will it also support HTML characters like &#8211; for an En-dash and &#8212; for an Em-dash? That would be one way around it for people who need the correct dashes
    (and typographers can be very picky about their dashes - there are up to 12 dashes, hyphens and minuses specified in pro typefaces!)

    I think numbers should always have En-dashes between them. URLs are the only obvious exception (e.g. many DOIs), so these would have to be left alone.

    If this is all too messy, the most important one for now is definitely in page-ranges. (and maybe em-dashes in "subsequent-author-substitute").
    I think that would keep most people happy.
  • Looks like we can set things up so that most people are happy most of the time, and the handling mechanisms are kept out of the CSL style file and, at least for the present, the locales.

    Providing special handling for dates and page numbers is easy because those are separate node types in CSL. The thing with en-dashes in loose word joins would have to be handled with a markup hint. The obvious thing would be to recognize and convert multiple-hyphen strings (two hyphens for an en dash, three for an em dash). The in-field HTML markup used in the new processor is just a set of arbitrary strings that happen to overlap a lot with HTML markup, to make it easy to interpret and work with. We're not bound to using HTML escapes (except possibly in HTML output, which is a whole different layer).

    So ... looking pretty good. Should be able to solve this one pretty well.
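A rough sketch of the multiple-hyphen hint (two hyphens for an en dash, three for an em dash), combined with the earlier request in this thread to leave URLs and DOIs alone. The function name `expand_hyphen_runs` and the `PROTECTED` pattern are assumptions made for illustration; the pattern is deliberately crude and is not how the processor detects identifiers.

```python
import re

EN_DASH = "\u2013"
EM_DASH = "\u2014"

# Crude pattern for tokens to leave untouched: http(s) URLs and bare DOIs.
PROTECTED = re.compile(r"(https?://\S+|\b10\.\d{4,9}/\S+)")

def expand_hyphen_runs(text: str) -> str:
    # With one capturing group, re.split alternates between ordinary text
    # (even indexes) and protected tokens (odd indexes).
    parts = PROTECTED.split(text)
    for i in range(0, len(parts), 2):
        # Replace triple hyphens first so they are not consumed as "--" + "-".
        chunk = parts[i].replace("---", EM_DASH)
        parts[i] = chunk.replace("--", EN_DASH)
    return "".join(parts)
```

Single hyphens are never touched, so words like add-on and hyphenated DOIs survive, and the user decides where an en or em dash belongs by typing two or three hyphens.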
  • edited September 9, 2009
    As I mentioned earlier in this thread, at a certain point, some of these details are up to the user to deal with in their data; not in CSL processors.

    CSL per se isn't likely to be supporting blanket substitution of dashes and such (in my python implementation, though, I'll be using en-dashes for page ranges and such).

    Also, no need for HTML entities and such; we've got unicode.
  • Well, if you are free to choose, then yes, multi-dash conversion (2=En 3=Em) would definitely be the nicest solution; it is common shorthand in other software too.
    Glad to hear this will be fixed. It's a pain changing them ATM.
  • "Providing special handling for dates and page numbers is easy because those are separate node types in CSL."
    Hi guys, is there any news on this? I have a mix of en dashes and hyphens in my page number fields, depending on where I've pulled the citation data from; it'd be great for these to be consistently displayed as en dashes when I generate a reference list.

    It sounds like handling of the dash types in other contexts is a sticky issue, but for dates and pages uniform use of en dash seems sensible, no?
  • I'd agree with en-dashes in page number ranges, too, as a default. At the moment, I do what I did with Endnote - clean it up at the end using Word's find and replace feature. Which is fine, but my life would be a smidgen easier this way!
  • To bdarcus: I can give you concrete proof from my own work. I am writing a CBQ citation style, and I will quote from the CBQ Instructions for Contributors:

    "(Note that hyphens rather than N- dashes are used between page numbers [contrary to the Chicago Manual15], as between biblical verses.)"
    -CBQ Instructions for Contributors, section 24.

    Zotero seems to default to using en dashes in citations, and even if I do a global search and replace in MS Word, every time I update the citations in the document, it would revert to en dashes. I have another forum post on this; what is the way to handle the problem?
  • Has anyone had a problem the other way, that en-dashes (unicode) are transformed by Zotero into hyphens? They look right in the citation dialog but then the formatted citation has a hyphen.
  • This should be fixed in Zotero 3.0.2 - i.e. all page ranges should now have en-dashes. I tried this and it works for me with ASA, APA, and Chicago.
  • That is interesting. Did anything need to be added to those style definitions for it to work?
  • no - that was fixed in the CSL processor between 3.0.1 and 3.0.2 - if you have an earlier version of the processor you can force it by adding a page-range-format to cs:style
  • Hmmm, I am running 3.0.2 and it transforms my en-dashes into hyphens.
  • that's not good. Is that a new problem for you?
    Could you export an entry for which that happens to Zotero RDF, open with a text editor and copy and paste to a public gist at gist.github.com - provide a link here and maybe we can make sense of that.
  • edited February 15, 2012
    It is a new problem, and refreshing using 3.0 fixes the problem, I think.

    Edit: The public clone command is: git clone git://gist.github.com/1839480.git gist-1839480
  • the regular gist URL is fine - that's not the full RDF, though - I need the entire export file.
  • actually - I can see what the problem is: You have the pages in a short form (135-38).
    For whatever reason that doesn't work correctly with styles that don't specify page ranges. I suspect Frank was too restrictive in what he allowed as page ranges in the processor. With an explicit page-range-format that should still work, though.
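To illustrate the short form in question, here is a sketch of how an abbreviated range such as 135-38 could be expanded and given an en dash. It is not the processor's actual code; `expand_page_range` is a hypothetical helper, and it only handles a single, purely numeric range.

```python
import re

EN_DASH = "\u2013"

# One numeric range: digits, a run of hyphens or en dashes, digits.
RANGE = re.compile(r"^(\d+)\s*[-\u2013]+\s*(\d+)$")

def expand_page_range(pages: str) -> str:
    match = RANGE.match(pages.strip())
    if match is None:
        return pages  # not a simple numeric range; leave it as entered
    first, last = match.groups()
    if len(last) < len(first):
        # Borrow the missing leading digits from the first page: 135-38 -> 138.
        last = first[: len(first) - len(last)] + last
    return f"{first}{EN_DASH}{last}"
```

Anything that does not match the pattern, such as roman numerals or comma-separated lists, is returned unchanged rather than guessed at.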
  • @dowens: The data you posted won't import into Zotero. Export as "Zotero RDF".
  • Sorry, my brain wasn't working right after a long day of debugging a file and such. Try it again.