Combining authors

One of the problems with importing citations from lots of different sources is the different styles that get imported. This is particularly pressing with authors. For example, I may have J. R. Richardson for one paper and John R. Richardson for another. Now, I know that these two are the same person, but it would be nice to be able to tell Zotero that. And this is important not only for searching but for proper citation: one format may require Richardson, J. and another Richardson, John R. If I only have the initials in a citation, then I have to do some checking and change it to the full name in order to support the latter style.

Sure, you can probably get different permutations by searching and then changing them to the same thing, but you also might miss some. And that's a big task -- in order to check my entire library, I'd have to search every name.

I wonder if there's a more elegant solution to be had. Maybe some kind of listing of all the values in a particular field, with the ability to highlight multiple instances and easily change them to the same thing. More difficult would be the route of not changing anything, but setting up an internal link so Zotero knows they're the same thing. That would also handle cases where there are two John Richardsons but I happen to know that they're not the same person. (Of course, I'm not really sure this last case is even relevant from a citation perspective, but maybe someone could think of a use case where this is important.)

It also occurs to me that this is an issue for other fields that should be consistent across citations, like Journal Abbr (J. Mamm. and J. Mammol. are both the Journal of Mammalogy) or even Language (French, FR, and Francais are the same).

Anyway, I thought I'd bring up the issue and see what others think. Sorry if this request has been discussed before -- I didn't see it in a few quick searches.
  • To add to this: This is also an issue in bibliography generation, even when a single user has created all his citations from the ground up. Continuing with amc's example author, let's say that Mr. Richardson has been inconsistent in putting his name on his publications. Sometimes he has published as J. R. Richardson, sometimes as John Richardson, sometimes as John R. Richardson. As I understand practice in most citation styles, the default move would be to follow the name as given on the publication. That is, if the item I am citing was published as J. R. Richardson, that's what I'd want in my bibliography, regardless of the fact that I know his fuller name.

    However, let's say I'm using a format like MLA or Chicago. In these styles, in your bibliography, you only provide the author's name for his first work in the list. For subsequent items by the same author, you replace the author's name with three dashes. So:

    Richardson, John R. A Good Book. Sydney: Good Book Press, 2006.
    ---. "Some Not So Great Comments." Lame Journal 12 (1999): 112-120.

    This currently works just fine in Zotero, except for a sorting issue that's being discussed elsewhere. That is, as long as the author's name is in the same format in each item in your database. If that's not the case, however, Zotero lists them separately. While my style manuals are currently loaned out so I can't look this up for sure, it's my impression that this shouldn't be the case. Rather, all the citations should be listed as above, with the name given in the most detailed form employed by any of the sources. (For simplicity's sake, it would probably be easiest to implement a system whereby they would simply be given by the most detailed form of the name available, to avoid problems in dealing with a case where we have John Richardson and J. R. Richardson, each of which has detail the other does not.)

    Aside from the labor issue, the disadvantage to changing all citations to a common form, as amc suggests, is that it doesn't preserve the ability to give the name as originally published where possible. And it does seem to me a bad thing from the standpoint of bibliography creation to change a name from the format of original publication, except where necessary to maintain clarity.

    There's an even more complicated issue with people who have published under more than one legal name -- especially women who have changed their name when marrying. I'm actually not sure how that's supposed to work, but even if publications under different legal names are to be listed separately, I can see advantages in Zotero's being aware that the author in the two cases is the same individual.

    At least a certain amount of this would have to be done by hand. For one thing, as amc points out, there might be two different individuals named John Richardson. And that is relevant from a bibliography generation perspective, as right now there's no way to prevent their works from being combined under the same name in an MLA or Chicago bibliography, which is incorrect. However, establishing non-identity of two authors seems even harder to me than establishing the identity of authors with different names. Where different names represent the same individual, there could be a field that allows you to provide the fullest form of the name available, according to which Zotero can then think of the entries together. But with two authors where the fullest available form of the name is identical, one wouldn't want to use such a field. The only thing you could do would be to add some identifying information to the name in that field, for instance, a number. But anything along these lines would mess up sorting. I believe in bibliography generation the author whose first listed work is alphabetically earliest should be listed first, and that would vary between bibliographies and across the life of the database. I think the solution for non-identity will probably rest on something that requires knowledge I don't have about how the Zotero database is structured.

    At best, I can think of some sort of pre-set search that will automatically try to find names that are candidates for being the same author and bring them to the user's attention; the user could then, perhaps, select all the ones that actually are by the same author, identify them as the same, and perhaps provide a long-form name. (This is similar to amc's suggestion of a listing for all the items in a given field, which would be useful, but it would also be nice to build some kind of search designed to help find candidates. It would make the job a lot less difficult in a large database.) Indeed, functionality along these lines would be useful in other situations, too, as when the name of an edited volume has been keyed slightly differently between different items.

    Actually, come to think of it, the hierarchical relationships that were being discussed here will obviously solve the edited volume problem whenever finally implemented, as we'll be able to treat all the chapters as children of the edited volume. I'm also not sure it's the most elegant approach since the parent-child discussion is centered around finding ways to talk about actual relationships between items. But perhaps, in conjunction with a general adoption of hierarchical support, it would be possible to treat authors hierarchically, so we could have a parent author with the standard version of the author's name we want used to resolve ambiguity, and the other forms of the name we include in specific item entries as the children of that standard form. This still wouldn't solve the non-identity issue, but it might be a reasonable way to treat identity, allowing us to link author names and resolve ambiguity without sacrificing fidelity to the name as originally listed in the publication.
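
The "pre-set search" for candidate names suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions, not anything Zotero provides: the function names are hypothetical, names are assumed to be non-empty "Given Surname" strings, and the last word is naively treated as the surname (so two-part surnames like van Leeuwen are handled only crudely). It flags pairs for a human to review; it does not decide identity.

```python
from collections import defaultdict
from itertools import combinations


def parse_name(name):
    """Split 'John R. Richardson' into (surname, [given-name tokens]).

    Assumption: the last word is the surname; everything before it is given names.
    """
    parts = name.replace(".", " ").split()
    return parts[-1].lower(), [p.lower() for p in parts[:-1]]


def tokens_compatible(a, b):
    """'j' is compatible with 'john'; 'john' is not compatible with 'james'."""
    return a.startswith(b) or b.startswith(a)


def may_be_same(given_a, given_b):
    """Compare given-name tokens position by position.

    Extra tokens on one side (e.g. a missing middle initial) do not rule
    out a match, because zip() stops at the shorter list.
    """
    return all(tokens_compatible(a, b) for a, b in zip(given_a, given_b))


def candidate_pairs(names):
    """Return sorted pairs of distinct name strings that might be one author."""
    by_surname = defaultdict(list)
    for name in set(names):
        surname, given = parse_name(name)
        by_surname[surname].append((name, given))

    pairs = []
    for entries in by_surname.values():
        for (n1, g1), (n2, g2) in combinations(sorted(entries), 2):
            if may_be_same(g1, g2):
                pairs.append((n1, n2))
    return sorted(pairs)
```

Note that "J. R. Richardson" would be paired with both "John Richardson" and "James Richardson", which is the intended behavior: the user still has to confirm by hand which candidates really are the same person.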
  • edited December 6, 2008
    Amc's request is similar to my point (2) in this thread:
    (2) Is there a place where I can keep track of variants of names that refer to one and the same entity, and where I can clean those up?

    At the moment, no, but an author list view and batch editing are both planned.
    This was January 2008. In the intervening time a lot of work has been done on the sync version, so I'm not sure where we stand on this.

    Picking up on paulbee's point, there are actually three sources of variation: (1) repositories may cite the name differently (usually this is about abbreviated first names, but problems also arise with two-part surnames like van Leeuwen); (2) other authors may give the wrong variant of a name (Nick cites Bill Croft whereas Bill actually publishes under William Croft); or (3) an author may introduce different versions of her/his name (Nick may publish one or two papers under Nicholas).

    In the first two cases, you'll want to harmonize the variants. The name in the database has to be the fullest version possible; individual styles can then abbreviate first name if needed, etc.

    In the third case, it is not so clear what to do. While paulbee proposes to keep the different variants, linking them so they are somehow seen as one in the bibliography, I think that's a rather messy solution. Introducing an extra long name field only for those authors that have brought variants of their own name into circulation themselves seems overkill. If Nick publishes one or two of his papers as Nicholas, I'm not sure why bibliographies should be propagating that kind of variation.
  • In the third case, it is not so clear what to do. While paulbee proposes to keep the different variants, linking them so they are somehow seen as one in the bibliography, I think that's a rather messy solution.
    In my scientific field, authors (and titles btw) must be cited exactly as printed on the original source — even if they include unusual name variants (or typos in the title). AFAIK, the reasoning behind this is that a citation should always mirror the exact publication details of the resource so that one is able to find a real copy of it.

    So I think it is crucial to keep all name variants for a given author, and (ideally) link them as proposed by paulbee.
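
For what it's worth, the linking idea discussed in this thread (keep each name as published, but tie the variants to one author record) might look something like the sketch below. Everything here is hypothetical, including the `Author` and `Item` classes and the `bibliography` function; it says nothing about how the Zotero database is actually structured, and the output formatting is heavily simplified.

```python
from dataclasses import dataclass, field


@dataclass
class Author:
    """One real person. `full_name` is the fullest known form, surname first."""
    full_name: str
    variants: set = field(default_factory=set)


@dataclass
class Item:
    title: str
    name_as_published: str  # preserved exactly as printed on the source
    author: Author          # link to the person, independent of spelling

    def __post_init__(self):
        # Record every published spelling on the linked author.
        self.author.variants.add(self.name_as_published)


def bibliography(items):
    """Group items by linked author; repeat entries get '---' (MLA/Chicago style)."""
    groups = {}
    for item in items:
        # Group by object identity, so two different people who share a
        # full name are never merged.
        groups.setdefault(id(item.author), []).append(item)

    ordered = sorted(
        (sorted(group, key=lambda i: i.title) for group in groups.values()),
        key=lambda group: (group[0].author.full_name, group[0].title),
    )

    lines = []
    for group in ordered:
        for n, item in enumerate(group):
            name = item.author.full_name.rstrip(".") if n == 0 else "---"
            lines.append(f"{name}. {item.title}.")
    return lines
```

Because `name_as_published` stays on each item, a style that requires the name exactly as printed could still use it, while styles that collapse repeated authors can rely on the link instead of on matching strings.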