[MLZ] text-case="sentence" not working

wonblee · April 17, 2014

text-case="sentence" ignores the initial capital of each word and lowercases everything. As a result, it now lowercases proper nouns.

adamsmith · April 17, 2014

that has always been the case with citeproc-js - it's why we're not using it at all in any repo styles. There's really no point in having a function that doesn't do anything to mixed case strings (which should be pretty much everything you have in Zotero/MLZ).

wonblee · April 17, 2014

I don't think we are on the same page.
I have an article title "The New York state law", which is converted to "The new york state law". Are you saying this has been the behavior of citeproc-js all along?

adamsmith · April 17, 2014

That's exactly what I'm saying, yes. If you want sentence case, just don't set text-case at all.

fbennett · April 17, 2014

As adamsmith says, that's expected behaviour. It's not specific to MLZ, but a feature of text-case="sentence" as implemented in CSL processors generally. The problem is that there is no way to determine which capitalized words in a title are proper nouns, without cumbersome markup of the field content.

The solution is to set up the field content in sentence-case, and use the text-case="title" setting to convert to title-case if required by the target style.
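
A minimal sketch of why that direction works and the reverse does not, in Python rather than citeproc-js's own code. Both functions and the stop-word list are hypothetical simplifications for illustration, not what any CSL processor actually implements:

```python
# Hypothetical simplification for illustration; not citeproc-js code.
# The stop-word list is an assumption, not the list used by any CSL processor.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "or"}


def naive_sentence_case(title: str) -> str:
    """Lowercase everything, then capitalize the first character.

    Proper nouns are lost because nothing marks them as special.
    """
    lowered = title.lower()
    return lowered[:1].upper() + lowered[1:]


def naive_title_case(title: str) -> str:
    """Capitalize lowercase words, except stop words after the first word.

    Words that already contain an uppercase letter are left untouched,
    so proper nouns entered in sentence case survive the conversion.
    """
    out = []
    for i, word in enumerate(title.split(" ")):
        if i > 0 and word.lower() in STOP_WORDS:
            out.append(word)  # leave stop words as entered
        elif word != word.lower():
            out.append(word)  # already has a capital; keep as entered
        else:
            out.append(word[:1].upper() + word[1:])
    return " ".join(out)


print(naive_sentence_case("The New York state law"))
print(naive_title_case("The New York state law"))
```

Going from sentence case to title case only ever adds capitals, so no information is lost; going the other way has to guess which capitals to keep.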

adamsmith · April 17, 2014

This is off in the specifications:
http://citationstyles.org/downloads/specification.html#sentence-case-conversion
That's what I assume is the source of the confusion here. This has come up before; I'm not sure if there's a reason for the specs as they are. IIRC Rintze said that part goes back to the 0.8 specs.

wonblee · April 17, 2014

"The problem is that there is no way to determine which capitalized words in a title are proper nouns, without cumbersome markup of the field content."

Yes, that makes perfect sense. Yes, I got confused by the specs.

fbennett · April 18, 2014

The language could be clarified by changing item (1) to read something like:

For fields that are entirely in uppercase (i.e. that contain no lowercase letters), the first character in the field remains capitalized, and all other letters are lowercased.
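
Read literally, the proposed wording could be sketched like this in Python. The function name is made up for illustration, and this is an assumption about how the rule would be applied, not processor code:

```python
def sentence_case_all_caps(field: str) -> str:
    """Apply the proposed item (1) only; hypothetical helper.

    A field counts as "entirely in uppercase" when it contains no
    lowercase letters. Anything else is returned unchanged.
    """
    if any(ch.islower() for ch in field):
        return field  # mixed or lowercase field: item (1) does not apply
    # Keep the first character as entered, lowercase the rest.
    return field[:1] + field[1:].lower()


print(sentence_case_all_caps("THE NEW YORK STATE LAW"))
print(sentence_case_all_caps("The New York state law"))
```

Under this reading, a mixed-case title such as "The New York state law" passes through untouched, so proper nouns are only lost in all-caps input.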

wonblee · April 18, 2014

"The solution is to set up the field content in sentence-case, and use the text-case="title" setting to convert to title-case if required by the target style."

Yes, I'm realizing that. Vancouver (maybe this particular variant of Vancouver) requires sentence-case, and the only way to produce sentence-case with capitalized proper nouns is to enter title field contents in Vancouver.

adamsmith · April 18, 2014

Frank -
my concern is actually not (1) but (2):

For lower or mixed case strings, the first character of the first word is capitalized if the word is lowercase. The case of all other words stays the same.

That's not what we're doing is it?
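
For reference, a literal reading of item (2) as quoted above might look like the following Python sketch. The function name is hypothetical, and treating "the word is lowercase" as Python's `str.islower()` is an assumption; the spec does not define the phrase:

```python
def sentence_case_item_2(field: str) -> str:
    """Literal reading of item (2); hypothetical helper.

    Capitalize the first character of the first word only if that word
    is lowercase. All other words keep their case.
    """
    words = field.split(" ")
    # Assumption: "lowercase" means str.islower(), i.e. the word has at
    # least one cased character and none of them are uppercase.
    if words[0].islower():
        words[0] = words[0][:1].upper() + words[0][1:]
    return " ".join(words)


print(sentence_case_item_2("the New York state law"))
print(sentence_case_item_2("iPhone sales in New York"))
```

On mixed-case input this never lowercases anything, which is not the lowercasing behavior described earlier in this thread.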

fbennett · April 18, 2014

Um ... no. Would that be sensible behavior? If so, and if we know what "the word is lowercase" means, citeproc-js could be taught to do it.

adamsmith · April 18, 2014

I don't think we should bother - the added value over just leaving titles alone is minimal. I'd just fix the specs.
This was last discussed on February 3rd on xbiblio when Sylvester brought it up. Rintze pointed out that the specs were changed in response to this thread:
https://forums.zotero.org/discussion/23504/
in retrospect I think that was a mistake. The old version was better.

fbennett · April 18, 2014

The technical argument for reversing the change linked from the thread is that as currently written, the behaviour specified for text-case="capitalize-first" and text-case="sentence" seems to be identical.

(Sorry for not spotting that when the linked thread was active.)

(Edit: corrected "capitalize-all" to read "capitalize-first".)

wonblee · April 18, 2014

"Um ... no. Would that be sensible behavior? If so, and if we know what "the word is lowercase" means, citeproc-js could be taught to do it."

"I don't think we should bother - the added value over just leaving titles alone is minimal. I'd just fix the specs."

Here is my take: There would be some value in teaching citeproc-js to do proper sentence-case. I think people usually grab title fields using translators or extract them from PDFs, and the titles so obtained are rarely in sentence-case. After converting a large number of title fields into sentence-case manually for a manuscript formatted in Vancouver, I really wish there were some kind of automation for this.

I understand that building a sentence-case engine for citeproc-js requires recognizing "skip words" or proper nouns. I think there is no other way but to provide a basic template and let users build it up, just like the journal abbreviation file. That in and of itself is a laborious process, but I think it's better, and the pay-off will be higher in the long run, than the alternative: sentence-casing title fields manually.

adamsmith · April 18, 2014

I tend to disagree for several reasons:
1. Title data is getting better across the board. For Vancouver specifically, titles from PubMed come in sentence case. More generally, all proper library catalogs use sentence case, and even Google Scholar mostly does by now. I know many publishers still have title-cased data; I'm just saying it's getting better.

2. There is a right-click --> convert to (pseudo) sentence case option available already in Zotero, which presumably will work in batch in the future http://chronicle.com/blogs/profhacker/zotero-quick-tip-transform-title-text/45575 so it's not like you have to do this for every letter.

3. Perhaps most importantly, since, as you say, this requires significant user input, making it work reliably across implementations is a nightmare. We already have, for example, a library catalog that's using CSL (and I believe citeproc-js via node.js or) to generate citations from items. Changing how sentence case works would break that.

4. Even with skip words, sentence casing wouldn't be reliable. There are many words that can exist as both proper nouns and regular words:
"Have the makers of Makers Mark missed the mark?"

aurimas · April 18, 2014

Plus, how often do skip-words reappear in titles? (I suppose if you're working on a subject that involves a proper noun)
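
To make the Makers Mark example above concrete, here is a rough Python sketch of a skip-word approach. The word list and function are hypothetical, standing in for the kind of user-maintained list proposed earlier; nothing like this exists in citeproc-js as far as this thread indicates:

```python
# Hypothetical user-maintained list of words to keep capitalized.
PROPER_NOUNS = {"makers", "mark"}


def skip_word_sentence_case(title: str) -> str:
    """Lowercase every word except those found in PROPER_NOUNS.

    Matching is per word, ignoring surrounding punctuation, so the
    function cannot tell a proper noun from the same word used normally.
    """
    out = []
    for word in title.split(" "):
        core = word.strip(".,;:!?\"'").lower()
        if core in PROPER_NOUNS:
            out.append(word[:1].upper() + word[1:].lower())
        else:
            out.append(word.lower())
    result = " ".join(out)
    # Always capitalize the first character of the title.
    return result[:1].upper() + result[1:]


print(skip_word_sentence_case("Have the Makers of Makers Mark Missed the Mark?"))
```

Because matching is word by word, the ordinary "makers" and "mark" come out capitalized along with the brand name, which is the unreliability point 4 describes.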