Requirement for string scan in CSL

fbennett · December 15, 2008

In Bluebook citation, I have come up against two formatting cases that cannot be handled without looking inside the string content of a field. Both of these involve page numbers. As I dig through the style requirements I may come across other instances, but these two are already definite. Rule 3.3(a) of the 17th edition of the Bluebook provides as follows ...

(a) Pages. Give the page number or numbers before the date parenthetical, without any introductory abbreviation ("p." and "pp." are used only in internal cross-references [cross reference omitted]):

Arthur E. Sutherland, Constitutionalism in America 45 (1965).
[...]

Use "at" if the page number may be confused with another part of the citation; use a comma to set off "at" from preceding numerals:

Biographical Directory of the Governors of the United States 1978-1983, at 257 (Robert Sobel & John W. Raimo eds., 1983).

Thomas I. Emerson, Foreword to Catharine A. MacKinnon, Sexual Harassment of Working Women, at vii, ix (1979).

Satisfying this rule without the manual retouching of citations will require some means of scanning the content of the relevant fields. In this particular instance, the ability to identify whether the first or the last character in a field matches any of 0-9 would be sufficient. I don't know what the prospects are of getting this into CSL and the formatting engine, but it's the only way I can see of solving this problem.
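A minimal sketch of the kind of check meant here, written in JavaScript since that is the language of the Zotero formatting code. The function names are hypothetical and exist in neither CSL nor Zotero, and the `needsAt` condition is only one assumed reading of rule 3.3(a), based on the two examples quoted above:

```javascript
// Hypothetical helpers: none of these exist in CSL or Zotero.

// True if the first character of the field content is one of 0-9.
function startsWithDigit(value) {
  return /^[0-9]/.test(String(value));
}

// True if the last character of the field content is one of 0-9.
function endsWithDigit(value) {
  return /[0-9]$/.test(String(value));
}

// Assumed reading of rule 3.3(a): insert ", at" when the text preceding the
// page reference ends in a numeral, or when the page reference itself does
// not begin with an arabic digit (e.g. "vii, ix").
function needsAt(precedingText, pages) {
  return endsWithDigit(precedingText) || !startsWithDigit(pages);
}
```

With the three citations above, `needsAt("Constitutionalism in America", "45")` is false, while the Sobel & Raimo title (ending in "1983") and the "vii, ix" locator both come out true. The check does no trimming, so stray whitespace at either end of a field would have to be handled separately.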

bdarcus · December 15, 2008

Fred, correct me if I'm wrong, but there's nothing in the rule (Use "at" if the page number may be confused with another part of the citation) that says you can't always use it. E.g., is there really any obvious case in which it would be considered an error to use it?

A lot of styles have a lot of stupid rules (as in, difficult to formalize in the language of computers), and so we have to exercise some judgment about the right balance of simplicity, consistency, and breadth.

fbennett · December 15, 2008

Unfortunately, the use of a bare number to indicate pages is already a well-established convention in the target community. The change you suggest would be a major one, and my intuition is that it would not be accepted. Strict adherence to the Bluebook is a bit of a religion among law review editors, who are not long-term professionals, but law students who come to the job cold and under the pressure of other work (as was true of Barack Obama at Harvard). The conventions specified in the Bluebook, for all their faults, save time and hassle, and local editors won't break from them lightly.

Gaining general acceptance for a change of this scale would require agreement from the consortium that maintains the style (The Columbia Law Review Association, The Harvard Law Review Association, the University of Pennsylvania Law Review, and The Yale Law Journal) ... I would be doubtful about the prospect of success on that front.

If you are definite that this cannot be accommodated, I'll add it to the list of errata in the notes on the Zotero style for the present. Should I take it that that is where things stand?

(Just one minor correction: my name is Frank. ;)

bdarcus · December 15, 2008

Ah crap; sorry about that ... Frank! Juggling too many things today to keep names straight I guess!

I'm not making any categorical statements (certainly not about Zotero) but just observing that adding regular expression support or otherwise scanning content is adding a significant amount of complexity with unclear payoff.

I'm also noting that the difficulty is based on the fact that these rules are about the convenience of human authors, who can make contextual judgments in ways that are just hard to do here (and unnecessary with computers). But academia is, as a friend once said to me, more conservative than the Vatican ;-)

But ... it sounds like the logic here might be understood as: if there's a numeric variable after the page number, add the "at." In that case, it might be possible to do this currently using the "is-numeric" condition. See if you can get that to work and let us know.

fbennett · December 15, 2008

The Vatican analogy is spot on (the Ecclesiastical Courts predate the common law, after all!). And no worries.

I didn't know about the is-numeric condition. Growing pains. I'll have a go. No problem about trade-offs; I have quite a knack for writing awful code, and it's actually comforting when someone has the good sense to apply the brakes.

fbennett · December 15, 2008

Getting there, but there's still a hitch. The locator field is always identified as non-numeric. Here's the test code:

<choose>
  <if locator="page">
    <choose>
      <if is-numeric="locator">
        <text value=", XXat"/>
      </if>
    </choose>
  </if>
</choose>
<text variable="locator" prefix=" "/>

This is just a simple case for testing (the logic is backwards), but it always produces a bare number (" 23" or " xvii"). As a wild guess, the locator field seems to contain a couple of entities, and is-numeric seems to be checking against the label rather than the content. If that's right, and if I have not overlooked a means of discriminating the two, it's a bug.

fbennett · December 15, 2008

Hmm. I was assuming that is-numeric was working with other fields, but I just checked with the note field, and it always reports false on that one as well. This is with Zotero 1.5 Sync Preview; maybe is-numeric is broken in that version?

fbennett · December 15, 2008

Further news on this. The is-numeric test works correctly when applied to the editor field. From looking at the CSL spec and the Zotero code, it looks like CSL permits the test against all fields, but Zotero only applies it to a subset of "numeric" fields. To anyone on the Zotero side: is it possible to get this fixed, or at least get a ticket opened?

fbennett · December 16, 2008

Okay, here's a patch for the chrome directory that addresses this issue. Not sure what side effects adding these two fields to _zoteroNumberFieldMap might produce, but it's been tested to make sure it enables conditionality. If this is an acceptable mod, it would be very nice to have it merged.

diff -r -u chrome.orig/content/zotero/xpcom/csl.js chrome/content/zotero/xpcom/csl.js
--- chrome.orig/content/zotero/xpcom/csl.js	2008-12-16 18:11:52.000000000 +0900
+++ chrome/content/zotero/xpcom/csl.js	2008-12-16 17:55:05.000000000 +0900
@@ -1882,7 +1882,9 @@
 	"issue":"issue",
 	"number-of-volumes":"numberOfVolumes",
 	"edition":"edition",
-	"number":"number"
+	"number":"number",
+	"note":"extra",
+	"locator":"locator"
 }
 /*
  * Gets a numeric object for a specific type. <number variable="edition" form="roman"/>
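
To illustrate why the two extra map entries matter, here is a simplified sketch of a numeric test that is gated by a field map. It is an assumption about the general shape of the logic, not the actual code in csl.js; `numberFieldMap`, `isNumeric`, and the sample item are all hypothetical names made up for the example.

```javascript
// Hypothetical stand-in for a field map like the one patched above:
// CSL variable name -> item field name.
const numberFieldMap = {
  "issue": "issue",
  "number-of-volumes": "numberOfVolumes",
  "edition": "edition",
  "number": "number"
};

// Returns true only if the variable is listed in the map AND the mapped
// field consists entirely of digits. Variables missing from the map always
// test false, whatever their content.
function isNumeric(item, variable, map) {
  if (!Object.prototype.hasOwnProperty.call(map, variable)) {
    return false;
  }
  const value = item[map[variable]];
  return value !== undefined && /^[0-9]+$/.test(String(value).trim());
}

// Made-up item for demonstration only.
const sampleItem = { edition: "3", locator: "23" };

// Unpatched map: "locator" is not listed, so the test fails.
const before = isNumeric(sampleItem, "locator", numberFieldMap); // false

// Map extended with the two entries from the patch.
const patchedMap = Object.assign({}, numberFieldMap, {
  "note": "extra",
  "locator": "locator"
});
const after = isNumeric(sampleItem, "locator", patchedMap); // true
```

Under this reading, a locator of "23" tests false before the patch simply because the variable is never looked up, which would match the behaviour reported earlier in the thread.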

Your humble servant,
FB

fbennett · December 16, 2008

Final note on this. There were problems in the code above, but they have now been solved. I can offer a patch to anyone interested.

dstillman · December 16, 2008

Ticket created for review. You can upload the latest patch there.

fbennett · December 17, 2008

Done.