String manipulation and splicing

I'm pretty new to CSL although as a part-time developer I'm more than fine with XML. I'm working with my client to work up some styles specific to their requirements. Rather enjoying it :-)

They have asked if it's possible to enforce the following page format:

1742–1749 – is correct
e318–23 – is incorrect
1949–58 – is incorrect – this should be 1949–1958

Now if one were writing a bit of normal code, this *could* be handled, but I can't see any mention of string manipulation in CSL, like left, right, substring, etc. Even if there were, I can see it being pretty messy, and running code isn't really what XML is all about.

My take is that they need to correct the citation itself.

What I might be able to do is write a PowerShell program that identifies incorrectly formatted citations.
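
A rough sketch of that kind of checker, written here in Python rather than PowerShell. The function name `check_page_range` and the status labels are made up for illustration, and the pattern assumes a page field holds a single range with an optional letter prefix, which real data may not respect.

```python
import re

# Assumed shape: optional letter prefix, digits, hyphen or en dash,
# optional letter prefix, digits (e.g. "1949-58", "e318–23").
RANGE = re.compile(r"^([A-Za-z]*)(\d+)\s*[-\u2013]\s*([A-Za-z]*)(\d+)$")

def check_page_range(pages):
    """Return 'ok', 'abbreviated', 'suspect' or 'unparsed' for a page string."""
    m = RANGE.match(pages.strip())
    if m is None:
        return "unparsed"  # single pages, multiple ranges, anything unexpected
    first, last = m.group(2), m.group(4)
    if len(last) < len(first):
        # Borrow the missing leading digits from the first page.
        full_last = first[: len(first) - len(last)] + last
        status = "abbreviated"
    else:
        full_last = last
        status = "ok"
    if int(full_last) < int(first):
        return "suspect"  # range runs backwards even after expansion
    return status

for sample in ["1742–1749", "e318–23", "1949–58"]:
    print(sample, check_page_range(sample))
```

Run over an exported list of page fields, this would flag the second and third examples above as "abbreviated" and leave the first as "ok".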

Thanks, Rob.
  • A follow-up question first, to pin down what we mean by enforcing the page format.

    Does it mean validation? The CSL processor in Zotero doesn't restrict the content of ordinary fields, although it tries to do the best it can with what it receives.

    Or does it mean page-range collapsing? CSL has several attributes for controlling that. It generally assumes a full representation on both sides of the range in the input. If collapsing rules are not being respected, we can look into it.
  • I'm going to assume it refers to the latter, i.e. page-range formatting. You want to use page-range-format="expanded" as described here: http://docs.citationstyles.org/en/stable/specification.html#page-ranges
    This should get the third example right, but I don't think it will touch strings with non-numeric elements, so it'll leave the second one alone.
  • More and more journals are numbering supplements' pages with an initial non-numeric character, sometimes more than one letter. BMJ Group sometimes publishes online supplements with page numbers that start with the letter "e". This practice is much more common now than it was 4 or 5 years ago.

    Thus, if it isn't too difficult to program, expanded format would be nice to have for page ranges with non-numeric characters.
  • I think this is a combination of validation and reformatting. For example, should one be able to enter a page range of "23-2"? This isn't my specialist area at all, but my gut instinct is that you are correct not to restrict fields, and my client should (and would) pick this up in proofreading.

    Thanks for the heads up on "expanded" - that may indeed be just what the client is looking for.

  • Happy to check further, or provide info on the capabilities of the processor. Sample data would be very helpful to clarify the requirements.
  • The page-range-format="expanded" global option really helps! They are importing a lot of existing citations from PubMed and possibly even EndNote, where page entries are often "1970-72", and having these automatically changed to "1970-1972" will save a lot of editing.

    I must say that I always tried to steer away from EndNote & Refman, but Zotero is wonderful! I do like it when the new kids on the block come up with a better product. For a fraction of the price! Plus cloud-based from the start. With great support here. More power to your elbow! Now can you just please write a competitor for Photoshop so we don't have to pay greedy Adobe :-)
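
On the request earlier in the thread for expanded ranges with a non-numeric prefix: until something like that exists in the processor, the page fields could be cleaned up before import. Below is a minimal sketch, not how the CSL processor works internally. `expand_page_range` is a hypothetical helper, and it assumes that an omitted prefix on the last page repeats the prefix of the first, which the thread does not confirm.

```python
import re

# Assumed shape: optional letter prefix, digits, hyphen or en dash,
# optional letter prefix, digits.
RANGE = re.compile(r"^([A-Za-z]*)(\d+)\s*[-\u2013]\s*([A-Za-z]*)(\d+)$")

def expand_page_range(pages):
    """Expand an abbreviated range; return the input unchanged if unsure."""
    m = RANGE.match(pages.strip())
    if m is None:
        return pages  # not a simple range, leave it alone
    prefix, first, last_prefix, last = m.groups()
    if len(last) < len(first):
        # Borrow the missing leading digits from the first page.
        last = first[: len(first) - len(last)] + last
    if int(last) < int(first):
        return pages  # e.g. "23-2": needs a human, not a guess
    # Assumption: a missing prefix on the last page repeats the first one.
    last_prefix = last_prefix or prefix
    return f"{prefix}{first}\u2013{last_prefix}{last}"

print(expand_page_range("1949–58"))   # 1949–1958
print(expand_page_range("e318–23"))   # e318–e323
```

Note that this also normalises a plain hyphen to an en dash, so "1970-72" comes out as "1970–1972". Ambiguous input such as "23-2" is returned untouched so it can be caught in proofreading.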