[citeproc bug] punctuation in quotes

If punctuation is set as a prefix in a macro, it doesn't get pushed into the preceding quotation marks.
Example using chicago-author-date
Is:
Chwieroth, Jeffrey M., H. Street, A. M. Hicks, and D. Pinheiro. 2007. “The Institutional Construction of Neoliberal Economic Globalization: The Case of Capital Market Liberalization in Latin America”. Paper presented at the 3rd annual meeting of the International Political Economy Society, Philadelphia, PA.

Should be:
Chwieroth, Jeffrey M., H. Street, A. M. Hicks, and D. Pinheiro. 2007. “The Institutional Construction of Neoliberal Economic Globalization: The Case of Capital Market Liberalization in Latin America.” Paper presented at the 3rd annual meeting of the International Political Economy Society, Philadelphia, PA.

(Note the placement of the period at the end of the title.)
citation data as RDF:
https://gist.github.com/adam3smith/8f6557c7eb55f034223c


@fbennett - I do realize that this can be prevented in styles, but it would be a huge relief to get this fixed, otherwise we need separate macros every time this comes up

edit: this is obviously on a computer with an en-US default-locale
  • We've been over this before, I think. Placement on the prefix is only part of the cause. The following code works as expected:
    <layout>
      <text macro="the-title" quotes="true"/>
      <text macro="the-publisher" prefix=". "/>
    </layout>

    As output, this yields:
    "My Title." My Publisher
    The problem arises when the affix is separated from the quotes by one or more levels of nesting:
    <layout>
      <text macro="the-title" quotes="true"/>
      <group>
        <text macro="the-publisher" prefix=". "/>
      </group>
    </layout>

    This yields:
    "My Title". My Publisher

    I'll take another look, but in the worst case this will need reimplementation of the output method in the processor. If that proves necessary (if), I'm not sure when it can happen.
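The nesting problem described above can be illustrated with a minimal sketch. This is not citeproc-js code: the `Node` class, `flatten`, and `migrate` are hypothetical names, and the approach (render everything to a flat string first, then swap the closing quote with a following period or comma) is only one assumed way to make the result independent of nesting depth. It also ignores styling, which a real processor cannot do.

```python
from dataclasses import dataclass, field
from typing import List, Union

CLOSE_QUOTE = "\u201d"  # right double quotation mark
MIGRATING = ".,"        # punctuation that en-US moves inside closing quotes


@dataclass
class Node:
    """Hypothetical output node: a group of children with optional affixes."""
    children: List[Union["Node", str]] = field(default_factory=list)
    prefix: str = ""
    suffix: str = ""
    quotes: bool = False


def flatten(node: Union[Node, str]) -> str:
    """Render a node tree to one flat string, whatever its nesting depth."""
    if isinstance(node, str):
        return node
    body = "".join(flatten(child) for child in node.children)
    if node.quotes:
        body = "\u201c" + body + CLOSE_QUOTE
    return node.prefix + body + node.suffix


def migrate(text: str) -> str:
    """Swap a closing quote with a period or comma that directly follows it."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == CLOSE_QUOTE and i + 1 < len(text) and text[i + 1] in MIGRATING:
            out.append(text[i + 1] + CLOSE_QUOTE)
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


title = Node(["My Title"], quotes=True)
publisher = Node(["My Publisher"], prefix=". ")

flat_layout = Node([title, publisher])            # prefix next to the quotes
nested_layout = Node([title, Node([publisher])])  # prefix one level deeper

print(migrate(flatten(flat_layout)))    # “My Title.” My Publisher
print(migrate(flatten(nested_layout)))  # “My Title.” My Publisher
```

Because the adjustment runs on the flattened text, the extra group in `nested_layout` makes no difference to the result.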
  • yeah, I'm sure we've covered this before.
    The problem arises when the affix is separated from the quotes by one or more levels of nesting:
    and a macro counts as nesting in this respect, right? Because that's the problem here, there's not actually any real nesting, it's just a prefix set in a macro instead of on the macro itself.

    Could you let me know if this turns out to be very difficult and long-term? I'll try tinkering with the style, then.
  • A macro opens a new nesting level, because the cs:text calling the macro will accept an affix attribute of its own. It has the same structure internally as a cs:group.

    The code that handles punctuation adjustments and quote swapping is decidedly awful. I tried several approaches to covering this case this morning, and only succeeded in breaking things horribly.

    So yes, this particular headache is going to be with us for the foreseeable future.
  • ok, thanks
  • Don't look now, but I may have a working solution to this on the way. It's not done yet, but I seem to be better able to feel my way around recursion than I once was.
  • Just a holding note to say that the prospects of the solution working out are still pretty good. The basic structure of the new code I'm trying is (much) simpler and more transparent; it's just taking a few days to work through the many permutations reflected in the test suite.
  • edited May 30, 2014
    Some of the remaining errors in the test suite may not be actual errors. How about this one?
    Davis, Jennifer J. “Men of Taste: Gender and Authority in the French Culinary Trades, 1730-1830.” Ph.D. diss., Pennsylvania State University, History, 2004.

    The full stop after "1830" is now falling inside the quotes, with a Chicago style in the en-US locale. It used to fall outside:
    Davis, Jennifer J. “Men of Taste: Gender and Authority in the French Culinary Trades, 1730-1830”. Ph.D. diss., Pennsylvania State University, History, 2004.

    Not sure if titles ending in a number are meant to be treated differently. Is the new behaviour correct?
  • edited May 30, 2014
    Here's another edge case. I think we've discussed it before, but I have better control over the code now, and we may be able to reach a more satisfactory result:
    P.G. Zimbardo, “Does Psychology make a significant difference in our lives?” American Psychologist, vol. 59, Jan. 2004, pp. 339–351.

    The comma after the question mark is suppressed by the new code. It used to render:
    P.G. Zimbardo, “Does Psychology make a significant difference in our lives?,” American Psychologist, vol. 59, Jan. 2004, pp. 339–351.

    We could discriminate in the suppression of the comma between styles that place it inside the quotes, and those that place it outside. Or we could preserve the behaviour of always including it when it appears against a question mark.
  • edited May 30, 2014
    Here's another item. The CSL code for it is the following:
    <text variable="title" quotes="true" suffix="."/>

    The test input is the following:
    This is 'The One'
    With this CSL and input, and with punctuation-in-quote set to true, the new code is producing the following output:
    “This is ‘The One’.”

    In the current processor, the punctuation migrates into the inner quotes:
    “This is ‘The One.’”

    It's an edge case, but which behaviour is more correct?
  • Davis, Jennifer J. “Men of Taste: Gender and Authority in the French Culinary Trades, 1730-1830.” Ph.D. diss., Pennsylvania State University, History, 2004.
    this is correct, so that's good.
    unfortunately, for the next example, the old behavior is correct:
    P.G. Zimbardo, “Does Psychology make a significant difference in our lives?,” American Psychologist, vol. 59, Jan. 2004, pp. 339–351.
    we had that discussion recently, I believe that's a change in Chicago's 16th edition, see CMoS 14.105


    Same for the 3rd example:
    “This is ‘The One.’”
    is correct without doubt, see CMoS 6.11
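The three rulings above (period inside the quotes, comma kept after a question mark, punctuation moved into the innermost quotes) can be sketched as one small function. The name `move_into_quotes` and its string-based approach are assumptions for illustration only and do not reflect how citeproc-js implements this.

```python
CLOSERS = "\u201d\u2019"  # right double and right single quotation marks
MIGRATING = ".,"          # punctuation that moves inside closing quotes


def move_into_quotes(quoted: str, punct: str) -> str:
    """Place `punct` inside all closing quotes at the end of `quoted`.

    Punctuation other than a period or comma stays outside. A right single
    quote used as an apostrophe at the end of the text would be misread as
    a closing quote; this sketch does not handle that case.
    """
    if punct not in MIGRATING:
        return quoted + punct
    end = len(quoted)
    while end > 0 and quoted[end - 1] in CLOSERS:
        end -= 1
    return quoted[:end] + punct + quoted[end:]


print(move_into_quotes("“This is ‘The One’”", "."))
# “This is ‘The One.’”
print(move_into_quotes("“Does Psychology make a significant difference in our lives?”", ","))
# “Does Psychology make a significant difference in our lives?,”
```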
  • Okay, I'll make those happen. More to come, tomorrow.
  • (it would be good to annotate these unit tests and have them mention that the results have been cross-checked with CMoS)
  • (Easy to do that later, once processor code for fine-grained control over output is in place.)
  • edited May 31, 2014
    Here's another to confirm. It's another test using a Chicago style in the en-US locale, where the commas ended up outside of quotation marks for some reason:
    [1] John Doe, “His Anonymous Life”, 1965; Jane Roe, “Her Anonymous Life”, 1965.
    [2] Doe, “His Anonymous Life”; Roe, “Her Anonymous Life.”

    They now fall inside:
    [1] John Doe, “His Anonymous Life,” 1965; Jane Roe, “Her Anonymous Life,” 1965.
    [2] Doe, “His Anonymous Life”; Roe, “Her Anonymous Life.”

    Would I be right that the new behaviour is correct?
  • yes, new behavior is correct.
  • Down to 20 failures in the test suite now. The structure that prompted this rewrite is now passing.

    Small steps, but should be ready for testing pretty soon.
  • edited May 31, 2014
    Here is a test item that is mostly theoretical. The old output looks like this:
    Doe. “Book A.” 1900.

    The new version is producing this:
    Doe. “Book A”. 1900.

    It's been set up to avoid migrating punctuation across a style boundary, so it balks at the italics. For a period, this isn't a big deal, but it would be cleaner to stick with that general rule, and handle anomalies like this by adjusting the CSL. Will that be okay?
  • hmmm - this is relatively rare, of course, but the period should indeed be within the quotation marks, i.e., the old behavior.
    If that's the price we have to pay, I'd say that's OK, but the anomalies that are going to occur are going to be item-specific, not style-specific.
    Concretely, the issue is going to be almost exclusively
    Doe. Book Title “with a quote at the end.” 1900.
    If I understand the logic correctly, the probably more common
    Doe. “Article Title Ending with Foreign Term.” 1900.
    would be OK, right?

    I'm pretty sure that book titles ending in quotation marks are exceedingly rare, so, again, I'd be OK going ahead with this, though if it's an easy fix the old version is correct.
  • edited May 31, 2014
    An italicized foreign term inside quotes will work okay, but there is a small difference between the "blocking" and "non-blocking" choices. If migration of punctuation is blocked at a styling boundary, the format of punctuation characters is not affected by the format applied to field content (example set large-ish and with commas for illustration):
    Doe, “Article Title Ending with Foreign Term,” 1900.

    If migration is permitted, the format of punctuation will shift depending on the field content:
    Doe, “Article Title Ending with Foreign Term,” 1900.

    The effect will be more or less visible depending on the font applied.

    The first example above (normal typeface) seems correct to me; but if we block on all styling boundaries, migration will fail in the other example. With migration blocked, it looks like this:
    Doe, Book Title “with a quote at the end”, 1900.

    With migration allowed, it would look like this:
    Doe, Book Title “with a quote at the end,” 1900.

    To my eye, the first example (with blocking) sets off the title more clearly. On the other hand, it does look like Chicago does apply a consistent rule that all punctuation must fall inside adjacent quotation marks, will-ye nil-ye; and my own judgement could be affected by exposure to programming -- I see the commas here as structural markup, and it looks better to me if they are consistently formatted.

    Anyway, it's your call. We can run with the first example in each pair, or with the second. (Other combinations would be difficult to implement, since some of the logical structure of the citation elements has been lost when the punctuation adjustment takes place.)
  • Your intuition is right: the first example above is correct (CMoS 6.2). This used to be different and that's still permitted, especially for print-only publications (CMoS 6.4), but the reason against it (and for the change) is very plausible: in electronic documents, the title may be separately tagged and shouldn't include anything that's not actually part of the title.
    In the second case, yes, the first example is unfortunately incorrect. Let me think about this.
    and my own judgement could be affected by exposure to programming -- I see the commas here as structural markup, and it looks better to me if they are consistently formatted.
    you'll be happy to learn that Chicago has separate comma rules for computer code (seriously! there's a reason I'm so fond of the manual) for exactly that reason.
  • (If the first example in the first pair and the second example in the second pair are desired, let me know, though. The code is pretty clean now, and it might be possible to finesse it.)
  • (We crossed in the post. On the second pair, if we want the second choice, how should the comma be formatted? It's an awkward case, given that the quotes and the preceding text should logically be italicized.)
  • I couldn't find anything on this. I'd put the comma in italics for the second case. I don't think it matters greatly, but going back and forth between roman and italics seems crazy. (Since it doesn't matter greatly, you can also do the reverse if it's much easier).
  • One more question in this series (sorry for all the back-and-forth, but while we're here ...).

    Should a comma following a title that is set in italics also be italicized?
  • no. That's what I was answering above—that used to be the rule, but is no longer. Only the title itself should be in italics.
  • edited May 31, 2014
    Yes, that's what I thought. So we want to produce the following results.
    (1)
    Doe, Plain Text Title, 1900.

    (2)
    Doe, Plain Text Title with a Foreign Term, 1900.

    (3)
    Doe, Plain Text Title “with a quote,” 1900.

    (4)
    Doe, Italic Title, 1900.

    (5)
    Doe, Italic Title with a Foreign Term, 1900.

    (6)
    Doe, Italic Title “with a quote,” 1900.

    (7)
    Doe, “Quoted Title,” 1900.

    (8)
    Doe, “Quoted Title with a Foreign Term,” 1900.

    (9)
    Doe, “Quoted Title ‘with a quote,’” 1900.

    The only exception to the general rule that punctuation does not cross a style boundary would be in (6).

    We can make that happen by drilling down the trailing children of a styled node, and triggering an exception if a quotes attribute is found among them. Field content (including quoted spans) is expressed in node form at this point of processing, so that should provide a general solution (within citeproc-js).

    Will that be okay?
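The drill-down described above might look roughly like the following. This is a sketch under assumptions, not the citeproc-js implementation: the `Node` class and `ends_in_quotes` are hypothetical names, and only the decision (may following punctuation migrate?) is modelled, not where the punctuation finally lands or how it is formatted.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    """Hypothetical output node carrying styling and quote flags."""
    children: List[Union["Node", str]] = field(default_factory=list)
    italic: bool = False
    quotes: bool = False


def ends_in_quotes(node: Union[Node, str]) -> bool:
    """Follow the trailing children of `node`; True if any of them is quoted.

    Following punctuation may migrate only when this returns True, so a
    plain or italic title blocks migration unless it ends in a quoted span.
    """
    while isinstance(node, Node):
        if node.quotes:
            return True
        if not node.children:
            return False
        node = node.children[-1]
    return False  # reached plain text without meeting a quoted node


# Case (4): italic title, nothing quoted, so the comma stays outside.
italic_title = Node(["Italic Title"], italic=True)
# Case (6): italic title ending in a quoted span, the one exception.
italic_with_quote = Node(
    ["Italic Title ", Node(["with a quote"], quotes=True)], italic=True
)
# Case (8): quoted title ending in an italic foreign term.
quoted_with_term = Node(
    ["Quoted Title with a ", Node(["Foreign Term"], italic=True)], quotes=True
)

print(ends_in_quotes(italic_title))       # False
print(ends_in_quotes(italic_with_quote))  # True
print(ends_in_quotes(quoted_with_term))   # True
```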
  • this is the most colorful forums thread ever!
    Yes, these are all exactly right, that's wonderful.
  • The revised processor is up now. You can take it for a test ride by installing the processor patch plugin (works only with Zotero for Firefox).

    It should produce results like those highlighted in green in this thread, pretty much regardless of the underlying structure of the style CSL. The code for punctuation and space duplicate suppression, terminal punctuation merging and quote swapping has been completely rewritten, so there may be some glitches; but the processor passes all but three of the 990 fixtures in the test suite (the failures being minor things that were failing before the rewrite, for separate reasons).

    Let's hope it works. :)
  • thanks! Looks great so far.