Should there be a default style with no given-name disambiguation?

All default styles with (Author-Date) format set the "disambiguate-add-givenname" option; that is, they add first names or initials to citations when two authors share a surname.
e.g.
(J. Smith, 2004)
(F. Smith, 1995)

There are at least 9 lengthy posts on this issue, and many users seem confused by the default, and put off by the more technical solutions that lie within the CSL code.

First two observations:

1. Most automatic data entry from multiple sources yields inconsistent first names, either because of journal conventions (e.g. http://forums.zotero.org/discussion/4900), use of human-entered data repositories such as Amazon, or good old human error. There's no easy way of correcting names as yet (the author list view and batch editor will help a lot when they are ready).

2. The style gallery is somewhat unwieldy, and the Preview text pop-up does not tell the whole story. (see also my suggestion here: http://forums.zotero.org/discussion/7456)
Thus to find out the kind of disambiguation, you must install a style, then test it with more unusual combinations of your data via the chrome preview pane. This is not obvious, and will likely put some people off.


There seem to be two real bugs, and then I've a suggestion.

Bugs:

1. In APA, authors who are not listed first are still disambiguated.
This is discussed here: http://forums.zotero.org/discussion/5323
It means the intuitive solution from @adamsmith here: http://forums.zotero.org/discussion/3129 does not really work.

2. in the CSL, if you get that far, you have to delete the
<option name="disambiguate-add-givenname" value="true"/>
line. Changing the value to "false" does not work: any string triggers the first names.
(The same holds for all 3 disambiguate options.)
(Discussed here: http://forums.zotero.org/discussion/6142)
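
For anyone who would rather script the deletion than edit by hand, here is a minimal Python sketch. It is only an illustration: `strip_options` and the sample text are made up here, it assumes each option sits on its own line as in the snippet above, and it should be run on a copy of the .csl file.

```python
import re


def strip_options(csl_text, names=("disambiguate-add-givenname",)):
    """Return csl_text without the <option .../> lines whose name is in `names`.

    Assumption: every option occupies a whole line of the style file.
    The value is ignored, since the bug treats any value as "true".
    """
    pattern = re.compile(
        r'^\s*<option\s+name="(?:%s)"\s+value="[^"]*"\s*/>\s*$'
        % "|".join(re.escape(name) for name in names)
    )
    return "\n".join(
        line for line in csl_text.splitlines() if not pattern.match(line)
    )


# Hypothetical fragment of a style file, for demonstration only.
sample = (
    "<citation>\n"
    '  <option name="disambiguate-add-names" value="true"/>\n'
    '  <option name="disambiguate-add-givenname" value="true"/>\n'
    "</citation>"
)
print(strip_options(sample))
```

By default only the given-name option is dropped; pass other option names in `names` to remove those as well.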

It is probably too much to ask many users to get all the way to point 5 on StephenHero's solution (1st April 2009) here http://forums.zotero.org/discussion/3464, if they find it at all.


So I'm proposing in the interim that a default (Author-Date) style is distributed which does not use given names in citations. Even APA is probably better without it until the bug (1 above) is ironed out. The chance of genuine ambiguity is less than the chance of annoying given-name issues.


Hopefully the bugs can be fixed someday for those that do require first-name-disambiguation, and can be bothered cleaning up all the first names in their library (not me).

In the meantime, am I missing anything?

Can anyone clarify the issue further?
Any other possible easy solutions for now?
  • @komrade,

    Thanks for raising this. In the new CSL processor, I would like disambiguation to work as smoothly and reliably as possible, and I need all the clarity I can get at this stage. Setting aside the current behavior in Zotero, I would like to have feedback (from everyone with access to style manuals, really) about what is meant to be disambiguated when a name is extended with initials or given names. Specifically, there are two possible things that this functionality could be doing:

    (a) It could be disambiguating cites. In this case, no initials or given names would be added if an in-text cite can be matched unambiguously to a bibliography entry. If things are interpreted this way, a work by Mickey Mouse published in 1928 and another by his loving rodent wife Minnie in 1932 would be cited as (Mouse 1928) and (Mouse 1932), without adding their respective given names. This is how the new processor code currently behaves.

    (b) It could be attempting to disambiguate names. In this case, any surname that occurs more than once in the bibliography would be extended, in an attempt to establish the identity of individual authors. So the rodentian works referred to above would be cited as (Mickey Mouse 1928) and (Minnie Mouse 1932), respectively, even if they were the only works cited by their four-legged authors. This appears to be the behavior of the current Zotero CSL processor.

    I'm not sure whether one or the other behavior best reflects the requirements of style guides. In fact, I don't know whether some style guides require one approach, and some the other. This is a very important item to investigate and settle now with feedback from the community at large, and I would welcome followup posts to this thread by people able to check this against existing style manuals. Where guides are not clear on whether (a) or (b) is intended, I will assume (a) because it is more concise and serves the purpose of guiding the reader to the cited source. But if there are guides that unambiguously (!) require (b), it will be better for everyone if we can get that on the table sooner rather than later.
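
To make the two readings concrete, here is a minimal Python sketch of (a) and (b). It is not the processor's code: the function names and the (given, family, year) tuples are invented for illustration, and it ignores multi-author works, initials and year suffixes.

```python
from collections import Counter, defaultdict


def cites_by_cite(works):
    """Behavior (a): add the given name only when the short cite collides."""
    short = Counter((family, year) for _, family, year in works)
    return [
        f"({given} {family} {year})"
        if short[(family, year)] > 1
        else f"({family} {year})"
        for given, family, year in works
    ]


def cites_by_name(works):
    """Behavior (b): add the given name whenever a surname has several bearers."""
    givens = defaultdict(set)
    for given, family, _ in works:
        givens[family].add(given)
    return [
        f"({given} {family} {year})"
        if len(givens[family]) > 1
        else f"({family} {year})"
        for given, family, year in works
    ]


works = [("Mickey", "Mouse", 1928), ("Minnie", "Mouse", 1932)]
print(cites_by_cite(works))  # short cites are already unique
print(cites_by_name(works))  # both Mouse names get extended
```

Under (a) the given names appear only once two works share both surname and year; under (b) they appear as soon as two different people share a surname.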
  • edited June 12, 2009
    Frank, I think the answer is both a and b.

    I think komrade's point is basically sound, but would just point out that CSL is not Zotero-only. It seems a bit strange to be fixing styles to work around bugs in a particular implementation.

    FWIW, I do think contributors should be full objects in Zotero and thus that there should be a mechanism for users to merge them. But it's not the most trivial feature in the world to implement.

    Perhaps a shorter-term solution is to change the processing code to be a little smarter/more liberal; maybe only comparing the first character of the given names? It seems the most common problem is where a name in one item is initialized ("J. Doe") and in another, it's not.
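
A rough Python sketch of that looser comparison, assuming names arrive as separate given and family strings. `author_key` is a hypothetical helper, not anything in Zotero.

```python
def author_key(given, family):
    """Build a comparison key from the surname and the first letter of the given name."""
    # First alphabetic character of the given name, or "" if there is none.
    initial = next((ch for ch in given if ch.isalpha()), "")
    return (family.strip().casefold(), initial.casefold())


variants = [
    ("J.", "Doe"),
    ("Jeff", "Doe"),
    ("Jeffrey Q.", "Doe"),
    ("J.Q.", "Doe"),
    ("J. Quentin", "Doe"),
]
# All variants collapse to a single key, so they count as one author.
print({author_key(given, family) for given, family in variants})
```

The trade-off is that genuinely different people with the same first letter (say Jane Doe and Jeff Doe) would also be treated as one author, so this is a stopgap rather than a real identity mechanism.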
  • There are two threads to the discussion here, I think. One is the problem of normalizing names in Zotero, and that's probably best left as an issue to sort out in Zotero proper, through batch editing or introducing some sort of identity mechanism. That's kind of outside my remit, although I certainly agree that it will be great if tidying up the database can be made easier.

    My concern is with what the processor should do when it is presented with clean input under the disambiguate-add-givenname option -- and I'm genuinely unsure. The (a) and (b) behaviors are alternatives. If you do (b) for all surnames that can be discriminated by adding a name or an initial, then (a) will have no further effect -- if there is only one Smith cited, that name would always expand with the same initial or first name, so it doesn't change anything.

    So I'm wondering if some styles require (a) and some require (b). If so, there need to be separate CSL options for the two cases. If not, then the existing option should be tuned to do whichever thing represents universal practice.

    Users can help the cause a lot at this point by checking style guides on this issue. A lot of the confusion over this will evaporate with a system that just does the right thing ... but we need to know what the right thing is in order to get there!
  • OIC; yes, some research would be helpful. There is admittedly some ambiguity in the option.
  • komrade confirms that the APA rule is about discriminating authors, not cites, here.
  • edited June 13, 2009
    I'm currently armed with about 8 kg of Style manuals, and 'easy guide to' manuals, and I'm trying to sort it out!

    The easy guides usually do not discuss enough special cases to make the rules clear, and the official style guides are often ambiguous as well.

    One thing we need is a list of works whose 'to spec' formatting would give us the answers.

    Something like:

    1. Mouse, Mickey D. (1928)
    2. Mouse, Minnie Z. (1932)
    3. Mouse, Frankie J. (1978)
    4. Mouse, Benjy P. (1978)
    5. Duck, Donald & Obama, Barack H. (1930)
    6. Obama, Barack H. & Duck, Daffy (1940)

    Any help with this list welcome!

    I'll report back shortly with more from the style guides.
  • In the Chicago Manual of Style, initials are added to in-text keys to discriminate authors. So this appears to follow the same logic as APA, but without the restriction to primary author. From 16.108 of the CMS online:
    Where two or more works by different authors with the same last name are listed in a reference list, the text citation must include an initial (or two initials or even a given name if necessary).

    (C. Doershuk 2000)
    (J. Doershuk 2001)
    So it looks like the remaining item to pin down is whether minimal addition of initials (i.e. behavior (a)) is required by any style guide. If anyone needs to have that preserved in the processor, now is the time to let us know ...
  • @komrade. There is a standard format for processor tests that we can use to build the list you need. Inspired by your plunge into the style guides, I'll set up a set of tests this evening. Then when we decide which behaviors are needed, we can just plop the appropriate tests into the suite after agreeing the option syntax on the xbiblio-devel list. After that, it's probably just a couple of hours of scratching around in the processor code to make it do the right things.

    I was a little taken aback that I'd gotten this wrong in the code, but I'm feeling better already. Thanks for this work!
  • So if this is a real issue and there are cases of both examples, then that suggests changing the boolean value to an option list.
  • edited June 13, 2009
    Hi Bruce and Frank,

    Here's what I've come up with:

    APA Style guide, 5th ed. (p.211 sec.3.98):
    Disambiguate primary author only, use all initials, separate initials with ". "

    MLA Handbook 5th ed. (NOT MOST RECENT) (p.205 sec.5.2)
    Disambiguate all authors, use first name.

    Chicago Style Manual 15th ed. (p.597 sec.16.11 and p.621 sec.16.108)
    Disambiguate primary author only, use first initial, then more initials or given name if necessary. (see above).

    CSE manual, 7th ed. (p. 495)
    Disambiguate primary author only, use all initials, not separated

    I don't have access to the new AMA style guide at the moment, though I could get one in a week or two. Maybe someone Medical has one?


    Some issues:

    The Chicago guide is ambiguous. I would take the quote above to mean primary author. "Listed in a reference list" suggests primary author only to me, but Frank took it the other way. It is not clear enough, and I guess the editors might respond to a query? Their FAQ (http://www.chicagomanualofstyle.org/QA_submit.html) is very active.
    (minor thing: separator between initials is also not clear).

    APA is unclear about what to do if the initials are the same. Full names?

    MLA is unclear about same first name, different initial (though this is fairly unlikely).

    ---

    Frank's behaviour A (disambiguate by citation only) is not specified by these 4 major style guides.
    However, this one makes the most sense to me, and several other scientists I have asked! I think if possible it should be retained as an option for users and publishers.

    I haven't found any journal style guides yet that give enough examples to determine their rules. If anyone can help with this, that would be great.

    I think really the publishers are the only ones who can sort this out. If a sample list which would discriminate all possibilities (something like the Mouse / Duck examples above) could be sent to them for their official formatting, that would probably be ideal.
  • edited June 13, 2009
    Now on to the Zotero issue!

    I agree with Bruce that the most common problem is likely to be with full names vs. initials, with middle initials absent or present.

    Thus if "J. Doe", "Jeff Doe", "Jeffrey Q. Doe", "J.Q. Doe" (and even "J. Quentin Doe")
    could all be combined, by only scanning the first letter of the names, then I think that would be a solution in the short term.

    Of course, if easy author merging or easy batch-editing tools are going to be ready soon, then this will become unnecessary!


    Another issue is the auto-complete for authors retaining everything, even if you've deleted that version of the name. I assume this has come up before, though I can't find any discussion except this recent comment (http://forums.zotero.org/discussion/7371). This makes it much harder to be consistent when doing manual input or one-by-one editing.
    EDIT: Has this gone away in 2.0? I can't replicate the issue since I upgraded? Or was I just imagining it?!
  • @komrade, this is really helpful! Looking back over earlier discussions, I was reminded that the minimal (by-cite) disambiguation algorithm had in fact been specifically requested by a user. It's a relief to know that others also think that is sensible behavior (we're not supposed to grow attached to code, but I do like this one :). So this will stay, although the syntax for invoking it may need to change.

    It's also clear that we need to support both primary and global disambiguation of author names. This actually isn't too difficult to stitch in. I'll be busy with the day job and some other matters for awhile here, but it should be sorted in the next few weeks.

    So we're all set on the processor side. I'll leave the database entry issues for Dan Stillman and the rest of Team Zotero to comment on.
  • I may be in the wrong place; this seems to be a programmer discussion rather than a user forum -- but it's the nearest I've found to a discussion of my problem.

    I'm about to submit my PhD thesis, written in OOo Writer 3.1.0 using Zotero 1.0.10 (which, btw, I love in comparison to the clunky EndNote) and Firefox 3.0.13, under Windows Vista Home Premium.

    The appearance of given names in citations for disambiguation seems fairly random (i.e. sometimes they don't appear, sometimes they do). I'd like to get rid of them altogether. It's simply not necessary (in my view) to distinguish in citations in the text between Max Weber and Eugen Weber (and a host of others); if the reader wants to check the source he looks in the Bibliography and there is given the full name and the rest of the details. If there absolutely has to be disambiguation then initials could be used in the citations, but for most purposes even they aren't really necessary.

    Of course, some academics and other authors will disagree, so yes, it should be switchable in a very simple way, in future versions of Zotero.

    But for now, as a humble user, I've tried to make sense of the discussion above, but have a desperate question: how can I get rid of the first names? Pretty please?
  • edited August 9, 2009
    It's perfectly OK to open new threads with new questions (the button is on the top left).
    Which style are you using?
    If you feel comfortable making minimal changes to your .csl
    you'll just have to delete this line:
    <option name="disambiguate-add-givenname" value="true"/>
    Here is a quick step by step on how to make those minimal changes:
    http://forums.zotero.org/discussion/5104/modifying-word-plugin-using-journal-abbreviation-instead-of-publication-name/#Item_2

    btw. many styles use initials instead of whole names.
  • Thanks for getting back to me so quickly; that's good of you.
    I'm using Chicago Manual of Style (Author-Date format) -- sorry, I should have said.

    Okay, I think I can handle this. But before I do, can I just make sure that it won't do anything nasty to my thesis? <agonised rictus grin>

    Also, once I've done this, will a simple click of the Refresh button in each chapter make the necessary changes throughout? (Once all the chapters are finalised I thought the easiest thing was to stick them all in one huge file (100,000 words) before generating the bibliography.)

    And thanks again.
  • Hey - you should make sure to install the no-disambiguation style as a new style, not over the CMOS style.
    Please read my instructions in that thread carefully, particularly the part about changing both title and id of the style.
    If you leave those as they are, Zotero will automatically revert them to the old one (the one in the repository) about every 24 hours.

    Then you select that style in the Zotero Document Preferences in your OOo plugin. You might not even have to click refresh.

    I don't see how this could do anything to your document, no.
    But why don't you just save a backup copy of each chapter. One can never have enough backups of a dissertation. But really, I wouldn't be concerned.

    And yes, keep them in chapters and only create the bibliography at the very end.
  • Hi Adam

    You have been so helpful. Thank you.

    I had one concern. You said to change the <title> and <id> lines, but didn't mention the next line, in this case <link href="http://www.zotero.org/styles/chicago-author-date"/>. I didn't change it, but should I have done?

    I followed the instructions, including changing the Notepad extension (and, learning from your previous student, getting rid of the intrusive .txt), but initially it didn't work. I closed Firefox and OOo and reopened them, I clicked Refresh, I even positioned the cursor over one of the offending citations and clicked Refresh. I opened Zotero in Firefox and changed the document preference. Still nothing. Then, yes, I did what you'd actually told me to do and clicked on Set Document Preferences _within one of the chapters_ -- and it worked instantly, and for all the chapters.

    If this is the usual quality of help given on this forum to users with problems, that's yet another reason for my recommending Zotero to everyone!

    Thanks again.

    David
  • I'm pretty sure it doesn't matter, but why don't you change it all the same:
    If you drag the changed file to FF, it will ask you something along the lines of "do you want to install the updated file" you click yes and that's all you need to do.
  • Okay, done.
    Let's hope that tomorrow it remembers all of this...

    Thanks again.
  • I just checked with the editorial staff at my UN organization (ECLAC), and they require an initial *after* the last name, e.g.:

    (Goldin, I., 2007) and (Goldin, C., 2007)


    How do I change my .csl to move the initial after the last name?
  • it's not easy - I thought you could just include a "name-as-sort-order" option in the author-short macro, but that doesn't work.
    There is a
    <if disambiguate="true">
    possibility, maybe you could work with that? I don't have time to try that out right now.
  • edited August 10, 2009
    @adamsmith,

    Did you run your change past a validator? Some other styles use this combination, and it should work. There are a couple of possible gotchas. It goes on name (not names), and takes "first" or "all" as arguments (not "true"):

    <name form="long" initialize-with="." name-as-sort-order="all"/>

    EDIT: My bad! I've now changed the form in the example above from "short" to "long", and added an initialize-with attribute. With "short", there is no given name in the citation, and so nothing to initialize, nor anything to arrange in "sort order". The initialize-with attribute causes the full given name to be expressed as an initial. (Could have benefitted from validation myself!)
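
For reference, the rendering that form="long", initialize-with="." and name-as-sort-order="all" are meant to give can be mimicked in a few lines of Python. This only imitates the intended output under simple assumptions (space-separated given names, no hyphenated names), and exact spacing in a real processor may differ. `family_then_initials` is a made-up helper, not part of any CSL processor.

```python
def family_then_initials(given, family, initialize_with="."):
    """Render a name as 'Family, I.' with the given names reduced to initials."""
    # Treat existing periods as separators so "J.Q." and "J. Q." behave alike.
    parts = given.replace(".", " ").split()
    initials = "".join(part[0].upper() + initialize_with for part in parts)
    return f"{family}, {initials}"


print(family_then_initials("I.", "Goldin"))
print(family_then_initials("Jeffrey Q.", "Doe"))
```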
  • no I didn't validate, but I'm pretty sure I did this right - I just copied and pasted the name-as... in from the author macro. Chrome didn't behave strangely and I think the style was fine but -
    it just didn't display the disambiguated names correctly in the chrome panel (i.e. they still had initials up front)- but it's possible I'm wrong, I just checked briefly.
  • @adamsmith
    I see the author-short macro, but there is no option for placing the initial. That seems completely controlled by the engine, not the style. Am I wrong? Is there a short version of first name that I can place in a group?
  • @brazuca, I messed up with my earlier suggestion. I've fixed it now; please take a look at the sample code and explanation above, and see if that doesn't get things running for you. Initials are slightly tricky in CSL: you need to ask for the long form of the name, and then specify that the first name should be initialized instead of spelled out in full. Apologies all around for any confusion caused.
  • edited August 11, 2009
    Hi all,

    just like DVBQQ, I ended up here for lack of alternatives.
    I'm trying to make sense of the disambiguation feature to modify the American Geophysical Union style for a paper I'm writing.

    and it looks like my problem might feed your discussions

    here it is:

    say I have the following works:

    Mouse Mickey, Mouse Minnie, 2005
    Mouse Mickey, Cricket Jiminy, Mouse Minnie, 2005
    Mouse Mickey, Duck Donald, Mouse Minnie, 2005

    then, I would like them to be cited respectively as

    Mouse and Mouse, 2005
    Mouse et al, 2005a
    Mouse et al, 2005b

    this is quite common, at least from what I've seen in AGU journals, Copernicus journals (still have to create this style...) and so on.
    it is common, but tricky, as it points at a unique reference in the bibliography only at the expense of adding the same letter (a, b, ...) after the year field in the bibliography.
    (I might be completely out, but is it not the purpose of the "locator" variable ?)

    anyway, right now, it comes out as:

    Mouse and Mouse, 2005
    Mouse, Cricket et al, 2005
    Mouse, Duck et al, 2005

    which gets busted by the editorial office of the journal

    so, coming back to Frank's (a) or (b) scheme, it looks to me like scheme (a) is the one I need for at least 2 major publishers in earth sciences !

    I hope this is useful !
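
The output asked for in this post (short author labels, with a letter added to the year only when two labels collide) can be sketched in Python as follows. This is an illustration under assumptions, not the CSL processor's algorithm: the function names are invented, each work is a (list of surnames, year) pair, and suffixes are handed out in input order.

```python
from collections import defaultdict
from string import ascii_lowercase


def short_label(families):
    """One author: surname; two: 'A and B'; three or more: 'A et al'."""
    if len(families) == 1:
        return families[0]
    if len(families) == 2:
        return f"{families[0]} and {families[1]}"
    return f"{families[0]} et al"


def cite_with_year_suffix(works):
    """Add a, b, ... to the year only where label and year coincide."""
    groups = defaultdict(list)
    for index, (families, year) in enumerate(works):
        groups[(short_label(families), year)].append(index)
    cites = [None] * len(works)
    for (label, year), indexes in groups.items():
        for position, index in enumerate(indexes):
            # Sketch limitation: at most 26 colliding works per label and year.
            suffix = ascii_lowercase[position] if len(indexes) > 1 else ""
            cites[index] = f"{label}, {year}{suffix}"
    return cites


works = [
    (["Mouse", "Mouse"], 2005),
    (["Mouse", "Cricket", "Mouse"], 2005),
    (["Mouse", "Duck", "Mouse"], 2005),
]
print(cite_with_year_suffix(works))
```

The same letters would have to appear after the year in the bibliography entries so that each cite still points at exactly one reference.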
  • dda-gre,

    That should work okay in the current processor, actually. I checked the American Geophysical Union style file in the repository, though, and it has an error (well, an infelicity) that is probably messing things up. About three-quarters of the way down the listing, you should find these lines:
    <option name="disambiguate-add-names" value="false"/>
    <option name="disambiguate-add-givenname" value="false"/>

    The current processor has a bug that takes any value given to those options as "true". To set them to false, remove both lines, save the file, reload it into Zotero and restart Firefox. That should get the result you're after.

    Let us know if it works; if you get output suitable for submission, let me know and I'll make the change in the repository.

    Frank
  • oops, I had edited my post before reading your answer...

    I did as you suggested, and it works just fine !!!

    thanks a lot for your help. I'll keep you posted when I get my proofs.

    dda
  • @Frank -
    but with your code for brazuca, what he would get would be initials after the last name for all authors, but what he wants is initials after the last name only for disambiguation, right?
  • @adamsmith: Gyrk. Conversing about disambiguation issues is like riding a rollercoaster blindfolded. Without a seatbelt. Wheeee ...

    Umm. For brazuca's case and that example, that's right.

    In the specific style referenced by dda-gre, disabling the two lines will produce just the family name, because disambiguation is totally disabled, and the names macro is creating only the short form (family name only), so there's nothing to initialize.

    If the governing rule is that disambiguation with initials should be tried first as a means of disambiguating the individual reference, falling back to year-suffix if and only if the reference is not uniquely identified, then Zotero currently can't quite do that. Disambiguation with given names disambiguates a pool of names, currently. In dda-gre's example, turning on given name disambiguation might put initials after both Mouse names (depending on whether the processor retains initials even if they fail to distinguish the individual, even if they're a rodent in the offline world).