alphabetize records without A or The

  • Thanks. We'll leave numbers as they are (i.e., sorting at the top). If someone wants that changed, they'd need to make the case (in a new thread) that it comes up in enough situations to justify the substantial work and complexity required.
    So let's set that aside and return to regular stop words.
  • Has there been a resolution to the stop words issue? This just came up here and I found this thread talking about it.
  • No. It's a hard problem to solve properly. It'll very likely have to wait for the next CSL version, which may still be a good way off.
  • edited March 5, 2017
    Librarians have been doing this for decades.

    All MARC (Machine-Readable Cataloging) records generated by libraries the world over include, for each title field, a code called an indicator, which tells the program how many characters to skip in the title field before alphabetical indexing begins. Thus any title whose first word is "The" gets an indicator of 4, so the computer skips the first 4 characters ("T", "h", "e", and the following space) before indexing begins.

    Your software could be designed to ignore the indicator in those cases where a style requires alphabetization to include leading articles (in any language).

    If no indicator is added, the leading article does not get skipped. People can be trained to do this; I was, 30 years ago.
  • Right, but I think we can all agree that we don't want Zotero to look like MARC records ;), nor is it an option to train every Zotero user as a cataloging librarian.
    This should probably happen in a fairly automated way, possibly with a preset, editable list of stop words. The lists probably need to be linked to the language of the entry, and it probably needs to be possible to turn this on and off in citation styles. None of this is impossible, but it's not an easy problem.
  • edited March 8, 2017
    MLA treats foreign articles like their English equivalents ("Die Leiden" is sorted under L).
    Allow me to add a further reminder that any handling of lists of stop words / noise words in a bibliography must be tied to the language tag of the item ("Die" isn't an article in English).

    This reinforces the importance of standard 2-character or 2-2-character language abbreviations. It has implications for translators, and I hope that the 3-character ISO abbreviations from PubMed and the full-word language labels from some publishers will eventually be converted automatically to ISO 639-1.
  • The citation processor (citeproc-js) recognizes an extended attribute on locale style-options for this purpose. The attribute, documented here, is active in Zotero, but it is not part of the CSL specification.

    (Styles that use the attribute should work correctly in Zotero, but because they will not validate, they will issue a warning when installed, and cannot be added to the CSL repository. They may also not work correctly on other platforms.)
  • Am I understanding correctly that that's already localizable? So it could do what DWL describes for MLA above?
  • @rmzelle -- I think we should consider that for the next CSL release. It requires some work on localization, but this would be really nice to have, and if it's already in the processor... I think Frank made exactly the right call on the implementation here.
  • Thanks! Just realized that I sent an earlier response to "noreply." Here's what that was:
    ***
    Yes, it's tied to the locale, so it should adapt to the language set on each item (falling back to the style's bare locale if any, the style default-locale, or to us, in that order).
  • For styles that require alphabetization without articles, do they generally want to ignore all articles, or only the articles for the document's locale (so an English bibliography would sort "L'Histoire de la Psychologie" under L, not H)?

    Is that something that varies across styles and so would need to be configurable in the style specifications?
  • Just for the record, German bibliographic rules require articles of any language to be ignored in alphabetical sorting. However, I can't tell if all German citation styles respect these rules.
  • My issue is with sorting works by the same author within a bibliography using CMS. Now that Zotero supports longer documents in MS Word and has resolved the display of dates, the only glaring error I still experience is Zotero's inability to ignore articles such as "a", "an", and "the" when alphabetizing titles. Even a simple solution that addresses most but not all cases would be greatly appreciated. A partial solution is better than no solution.
    Zotero has matured greatly over the last decade, and I am thrilled with how well it has worked for me. Thank you so much to the support staff for all your help!
  • @adamsmith Is that something that might be possible for CSL 1.1 considering it’s already in citeproc-js?
  • edited October 6, 2019
    I had a simple version of this problem: making sure "The Open University" was sorted as if it started with an "O". If anyone is interested in a quick workaround, the following works:
    * Add "Original Publisher: Open University" to the Extra field in Zotero, so that it is passed to the style as the CSL variable original-publisher.
    * Add this macro to my style:

    <macro name="author-sort">
      <choose>
        <if match="any" variable="original-publisher">
          <text variable="original-publisher"/>
        </if>
        <else>
          <text macro="author"/>
        </else>
      </choose>
    </macro>

    * Modify the bibliography sort to use author-sort instead of author:

    <bibliography>
      <sort>
        <key macro="author-sort"/>
        <key macro="year-date"/>
        <key variable="title"/>
        ...


    This doesn't affect how the author is displayed, but whenever original-publisher is set, it is used for sorting instead of the actual author. The same trick works in any situation where the sort order should not follow the author names as they are displayed.

    I hope this helps someone else until a more permanent solution is found.

  • (You'll need to wrap your code into code brackets <code>asdf</code>)
  • Stop words won't work, even in a single language. In English, for example, there are titles such as "A to Z: Current Spinal Cord Injury Rehabilitation", "A 440 GHz balanced active frequency multiplier-by-four SMMIC", and "La 'Princesse de Cleves' and the refusal of love: Heroic denial or pathetic submission?". The rational and successful way to handle this is essentially the way the library world does it: in Zotero, have a separate "sort title" field, similar to "short title". We could either populate it with a best guess or leave it blank by default. When users run into a problem, they can enter the title in the sort title field, minus the initial characters they don't want Zotero or their word processor to sort on.
  • The proposal for a separate "sort title" field is simple and effective.
    It should be a very good solution to this problem.
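
The three approaches raised in this thread (a MARC-style count of nonfiling characters, a language-keyed list of leading articles, and an explicit "sort title" field) can be compared in a minimal Python sketch. This is not Zotero, CSL, or citeproc-js code: the Record fields, the LEADING_ARTICLES table, and the key_* functions are hypothetical names chosen for illustration, and the article lists are deliberately short.

```python
# Minimal sketch comparing three ways to build a title sort key.
# Hypothetical: Record, LEADING_ARTICLES and the key_* functions are
# illustrations only, not Zotero, CSL, or citeproc-js APIs.
from dataclasses import dataclass
from typing import Optional

# Hypothetical, incomplete per-language leading articles (ISO 639-1 keys).
LEADING_ARTICLES = {
    "en": ("the", "an", "a"),
    "de": ("der", "die", "das", "eine", "ein"),
    "fr": ("les", "le", "la", "l'", "l\u2019", "une", "un"),
}


@dataclass
class Record:
    title: str
    language: str = "en"              # ISO 639-1 code of the item
    nonfiling: int = 0                # MARC-style count of characters to skip
    sort_title: Optional[str] = None  # hypothetical "sort title" field


def key_nonfiling(rec: Record) -> str:
    """MARC approach: skip a cataloger-supplied number of leading characters."""
    return rec.title[rec.nonfiling:].casefold()


def key_stopwords(rec: Record) -> str:
    """Automatic approach: drop one leading article for the item's language."""
    lowered = rec.title.casefold()
    for article in LEADING_ARTICLES.get(rec.language, ()):
        # Elided articles such as "l'" attach directly to the next word;
        # all others must be followed by a space to count as a whole word.
        prefix = article if article[-1] in "'\u2019" else article + " "
        if lowered.startswith(prefix):
            return lowered[len(prefix):]
    return lowered


def key_sort_title(rec: Record) -> str:
    """Sort-title approach: use the manual override if set, else fall back."""
    if rec.sort_title:
        return rec.sort_title.casefold()
    return key_stopwords(rec)


records = [
    Record("The Open University", nonfiling=4),
    Record("Die Leiden des jungen Werthers", language="de", nonfiling=4),
    Record("L'Histoire de la Psychologie", language="fr", nonfiling=2),
    Record("A to Z: Current Spinal Cord Injury Rehabilitation",
           sort_title="A to Z: Current Spinal Cord Injury Rehabilitation"),
]

for rec in sorted(records, key=key_sort_title):
    print(rec.title)
# prints, one per line:
# A to Z: Current Spinal Cord Injury Rehabilitation
# L'Histoire de la Psychologie
# Die Leiden des jungen Werthers
# The Open University
```

Note that key_stopwords alone turns "A to Z: ..." into "to z: ...", which is the failure the "sort title" proposal is meant to fix. Tying the article list to the item's language is also what keeps "Die" from being stripped from an English title.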