Automatically choose whether hyphen, en dash, or em dash should be used
Since different citation styles deal differently with en and em dashes, I think that Zotero should at some point be handling the correct use of dashes.
Ideally users would input only hyphens, and Zotero would choose whether to use hyphens, en dashes, or em dashes for the current citation style.
Right now, users need to know which citation style they are going to use when entering data, which somehow seems to defeat the purpose of working with a bibliographic citation system that can switch between citation styles.
This idea came up in the following thread:
http://forums.zotero.org/discussion/5875/how-to-insert-en-dashes-for-dates-and-page-numbers/
I wonder how to address this on a CSL level. Maybe a generic "replace" function would solve this as well as many other issues. What about something like this:
<text variable="locator" replace="-" replace-with="–"/>
Maybe even with regex support (but I'm not sure about this):
<text variable="locator" replace="-+" replace-with="–"/>
(This would handle e.g. BibTeX imported data with "--" as page separator)
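To make the effect of the regex variant concrete, here is a minimal Python sketch of what such a replacement would do to a page field. It is not CSL syntax and not how any processor actually works; the function name `normalize_page_range` is made up for illustration.

```python
import re

EN_DASH = "\u2013"

def normalize_page_range(pages: str) -> str:
    """Collapse every run of one or more ASCII hyphens into one en dash.

    Mirrors the proposed replace="-+" replace-with="–" idea; the function
    name is hypothetical and only serves this illustration.
    """
    return re.sub(r"-+", EN_DASH, pages)
```

Both "123-145" and the BibTeX-style "123--145" come out with a single en dash, while a value without hyphens (such as "vii") is returned unchanged.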
This simple syntax doesn't handle multiple replacements very well. But maybe one could solve this by using nested groups with one replacement on each group whenever this is needed.
A generic solution like this could solve many similar issues that we might not even be aware of today. And I think a more specialized solution for ranges would easily become much more complex.
Maybe you’re right and this should be handled at the application level. I think what would then be necessary is some kind of »range« data type. It could be entered with two text boxes, or with one text box (as it is now) whose content is automatically split using a defined set of separators, like -, --, ndash and mdash.
How these ranges are rendered in the output should then be locale-dependent. In German it is common to use ndash; maybe American guidelines prefer the mdash. If this is defined as a »term« in CSL, then citation styles could override the default locale behavior if a style requests a certain separator.
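A rough Python sketch of the »range« idea, under stated assumptions: the class name `Range`, the lookup table `LOCALE_RANGE_DELIMITER`, and the fallback delimiter are all invented for illustration, and the table entry is a placeholder rather than a researched locale convention.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Separators accepted on input: en dash, em dash, or a run of hyphens.
_SEPARATOR = re.compile(r"\s*(?:\u2013|\u2014|-+)\s*")

# Hypothetical locale defaults (placeholder values, not researched).
LOCALE_RANGE_DELIMITER = {"de-DE": "\u2013"}
DEFAULT_DELIMITER = "\u2013"  # assumption: en dash when the locale is unknown

@dataclass(frozen=True)
class Range:
    start: str
    end: Optional[str] = None

    @classmethod
    def parse(cls, text: str) -> "Range":
        """Split a single text box value into start and end, if possible."""
        text = text.strip()
        parts = _SEPARATOR.split(text, maxsplit=1)
        if len(parts) == 2 and parts[0] and parts[1]:
            return cls(parts[0], parts[1])
        return cls(text)  # single value or open range: keep as entered

    def render(self, locale: str, style_delimiter: Optional[str] = None) -> str:
        """Render with the style's delimiter if given, else the locale's."""
        if self.end is None:
            return self.start
        if style_delimiter is not None:
            delimiter = style_delimiter
        else:
            delimiter = LOCALE_RANGE_DELIMITER.get(locale, DEFAULT_DELIMITER)
        return f"{self.start}{delimiter}{self.end}"
```

With this, `Range.parse("123--145").render("de-DE")` uses the locale default, while passing `style_delimiter="-"` models a style that overrides the locale term.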
I am not sure I understand what you mean by CSL and ATM.
I can only speak for the Chicago Manual of Style which gives specific instructions on when and how to use the various dashes and hyphens.
http://www.chicagomanualofstyle.org/ch06/ch06_sec080.html
I assume that other styles have similar requirements.
Does this answer your question?
The CMS is almost 1000 pages; doesn't mean it's our responsibility to cover all of it.
Aside: your link requires a subscription to access. Assumptions aren't enough to justify significant changes to CSL at this point. I'd suggest if you really think this has significant variation depending on style, then do some research and present your findings here (e.g. a few links to prominent styles that illustrate your argument).
Otherwise, I'd suggest it ought to be the responsibility of a) the user, b) the translators, and/or c) a global configuration option in Zotero.
As I said, I need concrete evidence to support what you and biblio are expressing at this point as vague impressions ;-)
But you’re right, this wouldn’t be reason enough for changes to CSL. But maybe a setting for Zotero like "always replace - by – in page ranges" would make sense.
All the style guides I have seen require En-dashes in page ranges. The only one I saw that used hyphens was for an online-only publication (and I'm sure they'll get over it).
(Em-dashes are recommended for open ranges, e.g. 1964—, and therefore might be needed in titles, and are essential in the "subsequent-author-substitute" field.)
The search-and-replace workaround is a bit fiddly, doesn't update, and destroys any DOIs.
In theory date ranges should have En-dashes even if they are words (e.g. March–May).
CMOS also says expressions which are not compound and do not modify one another should have En-dashes (e.g. Bose–Einstein condensate) (compared with add-on or award-winning).
If CSL is going to support HTML tags for rich text markup, will it also support HTML character references like
&#8211; for an En-dash and &#8212; for an Em-dash?
That would be one way around it for people who need the correct dashes (and typographers can be very picky about their dashes: there are up to 12 dashes, hyphens and minuses specified in pro typefaces!)
I think numbers should always have En-dashes between them — URLs are the only obvious exception (e.g. many DOIs) so these would have to be left alone.
If this is all too messy, the most important case for now is definitely page ranges (and maybe em-dashes in "subsequent-author-substitute").
I think that would keep most people happy.
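The "en dash between numbers, but leave URLs and DOIs alone" rule could be sketched roughly like this in Python. The function name `en_dash_numbers` and the pattern for spotting URL-like or DOI-like tokens are assumptions of this sketch, and the detection is deliberately crude.

```python
import re

# Heuristic (an assumption of this sketch): a whitespace-delimited token
# that starts like a URL or a DOI is left untouched.
_PROTECTED = re.compile(r"^(?:https?://|doi:|10\.\d{4,9}/)", re.IGNORECASE)

# A hyphen with a digit directly on each side.
_NUM_HYPHEN = re.compile(r"(?<=\d)-(?=\d)")

def en_dash_numbers(text: str) -> str:
    """Replace digit-hyphen-digit with an en dash outside URL/DOI tokens."""
    out = []
    # The capturing group keeps the whitespace so the text can be rejoined.
    for token in re.split(r"(\s+)", text):
        if _PROTECTED.match(token):
            out.append(token)
        else:
            out.append(_NUM_HYPHEN.sub("\u2013", token))
    return "".join(out)
```

A DOI wrapped in brackets or other punctuation would slip past this check, which is part of why the blanket approach is messy compared with handling page ranges only.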
Providing special handling for dates and page numbers is easy because those are separate node types in CSL. The thing with en-dashes in loose word joins would have to be handled with a markup hint. The obvious thing would be to recognize and convert multiple hyphen strings (2 if by en, 3 if by m). The in-field HTML markup used in the new processor is just a set of arbitrary strings that happen to overlap a lot with HTML markup, to make it easy to interpret and work with. We're not bound to using HTML escapes (except possibly in HTML output, which is a whole different layer).
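As an illustration of the multiple-hyphen markup hint only (not the processor's actual code), a minimal Python sketch could look like this. The function name `convert_dash_hints` is hypothetical, and leaving runs of four or more hyphens untouched is an assumption made here.

```python
import re

# Match a run of exactly two or three hyphens (not part of a longer run).
_HYPHEN_RUN = re.compile(r"(?<!-)(-{2,3})(?!-)")

def convert_dash_hints(text: str) -> str:
    """Turn "--" into an en dash and "---" into an em dash.

    Single hyphens are kept as they are; runs of four or more hyphens
    are also left alone (an assumption of this sketch).
    """
    def _swap(match: re.Match) -> str:
        return "\u2014" if len(match.group(1)) == 3 else "\u2013"

    return _HYPHEN_RUN.sub(_swap, text)
```

So a user would type "Bose--Einstein condensate" to request the en dash, while "award-winning" passes through unchanged.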
So ... looking pretty good. Should be able to solve this one pretty well.
CSL per se isn't likely to be supporting blanket substitution of dashes and such (in my python implementation, though, I'll be using en-dashes for page ranges and such).
Also, no need for HTML entities and such; we've got unicode.
Glad to hear this will be fixed. It's a pain changing them ATM.
It sounds like handling of the dash types in other contexts is a sticky issue, but for dates and pages uniform use of en dash seems sensible, no?
"(Note that hyphens rather than N- dashes are used between page numbers [contrary to the Chicago Manual15], as between biblical verses.)"
-CBQ Instructions for Contributors, section 24.
Zotero seems to default to using en dashes in citations, and even if I do a global search and replace in MS Word, the citations revert to en dashes every time I update them in the document. I have another forum post on this; what is the way to handle the problem?
Could you export an entry for which that happens to Zotero RDF, open it with a text editor, and copy and paste the contents to a public gist at gist.github.com? Provide a link here and maybe we can make sense of it.
Edit: The public clone command is: git clone git://gist.github.com/1839480.git gist-1839480
For whatever reason that doesn't work correctly with styles that don't specify page ranges. I suspect Frank was too restrictive in what he allowed as page ranges in the processor. With an explicit page-range-format that should still work, though.