[citeproc-js] 't name particle results in extra space
There's an extra space added before the 't particle when citing the following article: http://www.nature.com/nbt/journal/v31/n11/full/nbt.2702.html (see first author "Peter A C 't Hoen"). citeproc is also not picking up the particle as a non-dropping particle when sorting the bibliography (in Cell style, for example), and sorts it to the very beginning because of the apostrophe.
I'm also not sure if the apostrophe is getting replaced and whether it should be doing that.
The CSL archives have a list of name particles contributed by Charles Parnot of Papers. It is not a complete list, but it does show "in 't" as a dropping particle. Out of curiosity, is 't a dropping or a non-dropping particle?
It's the abbreviated Dutch neuter singular form of "the" (unabbreviated form is "het"). See http://www.dutchgrammar.com/en/?n=NounsAndArticles.03
Edit: changed my mind.
Frinkle, B
Horvath, P A B in ’t
Horvath, P A A ’t
In ’t Horvath, P A D
Klabdaggit, M
’t Horvath, P A C
Vooz, B
This follows the CSL 1.0.1 specification. We may have discussed it already, but looking at this on the page, I wonder whether the second and third entries should be reversed, or whether the display position of the dropping particle should be adjusted.
In the sort in question, the test's input data puts the particle in the family field (non-dropping) for P.A.C. Horvath and P.A.D. Horvath, and in the given-name field (dropping) for P.A.A. Horvath and P.A.B. Horvath.
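A minimal sketch of why the listing above comes out in that order, assuming the CSL 1.0.1 sort-key order for `demote-non-dropping-particle="never"` (non-dropping particle plus family name, then dropping particle, then given name) and assuming apostrophes are ignored and case is folded before comparison. The record fields (`family`, `given`, `droppingParticle`, `nonDroppingParticle`) and the helper names are hypothetical, not citeproc-js internals. Because the dropping particle is compared before the given name, "in 't" lands ahead of "'t" even though P A A would otherwise precede P A B, and with the apostrophe ignored "'t Horvath" files under T.

```javascript
// Hypothetical records mirroring the test entries listed above.
const entries = [
  { family: "Vooz", given: "B" },
  { family: "Horvath", given: "P A A", droppingParticle: "\u2019t" },
  { family: "Horvath", given: "P A C", nonDroppingParticle: "\u2019t" },
  { family: "Frinkle", given: "B" },
  { family: "Horvath", given: "P A D", nonDroppingParticle: "In \u2019t" },
  { family: "Klabdaggit", given: "M" },
  { family: "Horvath", given: "P A B", droppingParticle: "in \u2019t" },
];

// Assumption: apostrophes (ASCII and U+2019) are dropped and case is ignored.
function normalize(s) {
  return s.replace(/['\u2019]/g, "").toLowerCase().trim();
}

// Join the non-empty parts with single spaces.
const join = (...parts) => parts.filter(Boolean).join(" ");

// Sort keys for demote-non-dropping-particle="never":
// [non-dropping particle + family, dropping particle, given]
function sortKeysNever(n) {
  return [join(n.nonDroppingParticle, n.family), n.droppingParticle || "", n.given || ""].map(normalize);
}

// Inverted display: the non-dropping particle stays with the family name,
// the dropping particle follows the given name.
function display(n) {
  return `${join(n.nonDroppingParticle, n.family)}, ${join(n.given, n.droppingParticle)}`;
}

// Compare key arrays element by element.
function compareKeys(a, b) {
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1;
  }
  return 0;
}

const sorted = [...entries].sort((x, y) => compareKeys(sortKeysNever(x), sortKeysNever(y)));
sorted.forEach((n) => console.log(display(n)));
```

Run under these assumptions, the printed order matches the seven entries shown above.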
(Rintze and I have had an off-list exchange about this one. We'll keep an eye on the use case, and wait for further evidence on sort vs display conventions.)
non-dropping particle
dropping particle
Read it and weep. :-)
From a quick look, I think the regexps are a bit too relaxed, and there doesn't seem to be any additional checking of what they match, so I would like to see the list of known particles we are trying to match here. (I assume they are in the test suite. I remember you recently added some Arabic particles, which are not in Charles' list.)
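One way to add the extra checking asked for here is to accept a leading particle only when it exactly matches an entry in a list of known particles. The sketch below is an illustration of that idea, not the citeproc-js parser: `KNOWN_PARTICLES` is a hypothetical, deliberately tiny subset (not Charles' list), and `splitLeadingParticle` is a made-up helper name.

```javascript
// Hypothetical, deliberately small subset of known particles (lower-case,
// ASCII apostrophe). Not Charles' list; for illustration only.
const KNOWN_PARTICLES = ["in 't", "'t", "van der", "van", "de", "von"];

// Try longer particles first so "van der" is preferred over "van".
const BY_LENGTH = [...KNOWN_PARTICLES].sort((a, b) => b.length - a.length);

// Split a leading particle off a family-name string, accepting only a
// case-insensitive exact match against the known list followed by a space.
function splitLeadingParticle(familyField) {
  // Treat U+2019 like an ASCII apostrophe; one character replaces one,
  // so indices still line up with the original string.
  const ascii = familyField.replace(/\u2019/g, "'");
  for (const p of BY_LENGTH) {
    // Lower-case only the candidate prefix, keeping slice indices aligned.
    if (ascii.slice(0, p.length + 1).toLowerCase() === p + " ") {
      return {
        particle: familyField.slice(0, p.length),
        family: familyField.slice(p.length + 1).trim(),
      };
    }
  }
  return { particle: "", family: familyField };
}

console.log(splitLeadingParticle("\u2019t Hoen"));
console.log(splitLeadingParticle("van der Berg"));
console.log(splitLeadingParticle("Vandermeer"));
```

Anything not on the list (such as the "Van" at the start of "Vandermeer") is left in the family name, which is the trade-off versus a looser regexp: unknown particles are missed, but ordinary name prefixes are never split off by accident.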
Another point: part of the regexp is not entirely correct, or rather, it does not match what I'm sure you intended. For example:
    [-|\s+|\'\u2019]
I think you probably wanted something like
    (?:[-'\u2019]|\s+)
instead, because inside a character class "|" and "+" are literal characters:
    (/[-|\s+|\'\u2019]/).test("+") == true
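A quick check of the two patterns side by side. The pattern strings are taken from the comment above; the test inputs are just illustrative.

```javascript
// The character class as written. Inside [...], "|" and "+" are ordinary
// characters, so this matches "-", "|", any whitespace, "+", "'" or U+2019.
const asWritten = /[-|\s+|\'\u2019]/;

// The suggested replacement: a hyphen or apostrophe, or a run of whitespace.
const intended = /(?:[-'\u2019]|\s+)/;

for (const s of ["+", "|", "-", " ", "'", "\u2019", "x"]) {
  console.log(JSON.stringify(s), asWritten.test(s), intended.test(s));
}
```

Only "+" and "|" differ between the two: the original class accepts them, the suggested pattern does not.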
Thanks for catching that regexp bug. It looks like I was using a group, and then changed it to a character class without thinking it through.
For reference, I've prepared a test, name_ParticlesDemoteNonDroppingNever, that covers the particles in Charles' list plus the 't case. The result segment of the test shows what the processor actually produces in the current release, which is not entirely correct. I've included a note listing the particles it gets wrong according to Charles' list; some of the items (Saint/Sainte) may be open to discussion.
With display-and-sort, we would still need the markup to identify failed parses of the dropping-particle part, but if you find that sequence easier to read, we can use that for initial testing instead. Just say the word and I'll replace the test.
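For comparison, a sketch of how the sort keys change between the two kinds of setting, assuming the CSL 1.0.1 key order: with `demote-non-dropping-particle="never"` the non-dropping particle is sorted as part of the family name, while with "sort-only" or "display-and-sort" the family name comes first and the particles (dropping, then non-dropping) form the second key. The record shape, the `sortKeys` helper, and the apostrophe-stripping normalization are assumptions for illustration.

```javascript
// Assumption: apostrophes are ignored and case is folded before comparison.
const normalize = (s) => s.replace(/['\u2019]/g, "").toLowerCase().trim();
const join = (...parts) => parts.filter(Boolean).join(" ");

// Sort keys for a Western name under a given demote-non-dropping-particle mode.
function sortKeys(n, mode) {
  const keys =
    mode === "never"
      ? [join(n.nonDroppingParticle, n.family), n.droppingParticle, n.given]
      : [n.family, join(n.droppingParticle, n.nonDroppingParticle), n.given]; // "sort-only" / "display-and-sort"
  return keys.map((k) => normalize(k || ""));
}

// Hypothetical record for the first author of the article cited at the top.
const hoen = { family: "Hoen", given: "Peter A C", nonDroppingParticle: "\u2019t" };
console.log(sortKeys(hoen, "never"));            // [ 't hoen', '', 'peter a c' ]
console.log(sortKeys(hoen, "display-and-sort")); // [ 'hoen', 't', 'peter a c' ]
```

Under "never" the name files under T; under the demoting modes it files under H, with the particle only breaking ties among other Hoens.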