Bulk standardize author names

One thing I do quite a lot in Zotero is go through multiple entries to standardize author names so that they sort nicely in the library pane: for instance, changing articles by Smith, H; Smith, Harold; and Smith, Harold C so that they are all authored by Smith, Harold C. It would be nice to be able to multi-select all of the entries that share an author and then standardize the last and first name for that author in one step.
  • There is a project to add an additional pane to the Zotero interface, listing all authors; click on an author and items with that author as a creator are shown. It allows you to select an author and rename it, which renames all instances of the author (I think of it as like browse mode in iTunes, with a very limited batch rename system). The patches for this were made back in October or November, but the author will hopefully be releasing them soon, and we can get them into a Zotero plugin or the core code.
  • Something similar with journal names would be great

    johan
  • I think that the interface could be pretty easily modified to handle other fields as well. For data standardization, journal names are an obvious candidate, while for navigation many of the fields, like date or library catalog, could be useful.
  • An additional pane with a list of authors is a feature in Mendeley that I always wished was in Zotero as well... (In Mendeley it's actually a filter by: Authors, keywords etc).
    Of course, it would also be great to be able to easily standardize authors' names; I think it's a fairly important thing missing from Zotero.

    I totally add my voice to this feature request!
  • @ajlyon Any news on this? Renaming authors by hand in a large database is very tiring...
  • Batch editing is firmly planned for the next major release of Zotero, but that won't be any time super soon.

    I'm actually not sure what ajlyon was talking about with the other feature - while that'd be nice, I can't find said patch anywhere.
  • In response to the original suggestion by JonEP, you should be careful about batch editing different versions of the same author's name to one version. The author may have used different versions of his or her name on different publications, e.g., with and without the middle initial. For citation purposes in new publications, these differences should be maintained.

    I'm sorry to hear that batch editing won't happen any time soon. This is one of the major limitations of Zotero compared to EndNote and Mendeley.

    Steve Jenkins
  • According to Simon (Zotero senior dev): "The next major release will be available sometime around November 20. A beta release will be available a month or two before."
    http://forums.zotero.org/discussion/24552/no-items-selected-19-files-in-folder/#Item_5
  • @shjenkins

    I basically agree with your caution. However, the situation is not straightforward. Sometimes the author name that is imported into Zotero differs from that on the document. Some sources of metadata provide more complete author names than others. Data from Google Scholar, CrossRef, etc. often has the middle name or initial dropped. Sometimes they drop all given names and substitute one or more initials. Sometimes, even on the publisher's own website, the spelling of an author's name in the PDF version of the document differs from the spelling in the web (HTML) version.

    While everyone should fully read everything they cite, it can be tedious to check every author name on every document to be sure that it is exactly the same on the "page" and in the database. I find it especially frustrating when the record with differing names was downloaded directly from the publisher.

    Sometimes the database will make changes to the spelling of a name. Take the German "ü", for example: I have seen it changed in the database to an undecorated "u" or to "ue". I have also seen author names spelled with "ue" on the page that become "ü" in the database! The Web of Science seems to do this fairly frequently.

    This becomes a problem when disambiguating author names in a bibliography. If the same author appears with one initial on one publication and with two initials on another from the same year, should the bibliography really present the papers as though they were written by two different people?

    At least one other thread discusses this issue.

    http://forums.zotero.org/discussion/22722/improved-search/
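
The spelling variants described above ("ü", "u" and "ue"; dropped middle names; initials in place of given names) can be found mechanically before anyone decides whether to merge them. The following is a minimal Python sketch, not Zotero code: `name_key` and `group_variants` are hypothetical helpers, and the digraph folding is a deliberately loose assumption that can produce false matches (it also folds "Bauer" to "baur"), so the result is a list of candidates for manual review rather than a merge rule.

```python
import unicodedata
from collections import defaultdict


def name_key(last, first):
    """Return a loose grouping key: folded last name plus first initial."""
    # Decompose accented letters and drop the combining marks (ü -> u).
    decomposed = unicodedata.normalize("NFKD", last)
    folded = "".join(
        c for c in decomposed if not unicodedata.combining(c)
    ).casefold()
    # Assumption: treat German transliterations as equivalent (ue -> u, etc.).
    for digraph, plain in (("ue", "u"), ("oe", "o"), ("ae", "a")):
        folded = folded.replace(digraph, plain)
    # Only the first letter of the given name is used, so "H", "H." and
    # "Harold C" all land in the same group.
    initial = first.strip()[:1].casefold()
    return (folded, initial)


def group_variants(names):
    """Group (last, first) pairs by key; keep groups with several spellings."""
    groups = defaultdict(set)
    for last, first in names:
        groups[name_key(last, first)].add((last, first))
    return {key: sorted(v) for key, v in groups.items() if len(v) > 1}


# Hypothetical sample data for illustration only.
names = [
    ("Müller", "Hans"), ("Mueller", "H."), ("Muller", "H"),
    ("Smith", "Harold C"), ("Smith", "H"), ("Jones", "A"),
]
candidates = group_variants(names)
for key, spellings in candidates.items():
    print(key, spellings)
```

Because the key keeps only the first initial, two different people who share a last name and an initial will be grouped together; that is why the output should be reviewed by hand.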
  • In response to Gracile, the other thread that indicates that the next major release will be out in November doesn't say that this release will include global editing.

    DWL-SDCA makes good points. Inconsistency in how journals and other sources record metadata is frustrating. Is it worth eventually having two name fields for Zotero, one for a standardized name that would represent all the records by the same person, the other for the name as used for a particular published item? This would make it possible to keep track of publications by people who use different names at different times in their lives, as a consequence of marriage for example. I don't know that such a feature should be a high priority for Zotero developers, however.

    Steve Jenkins
  • "a standardized name that would represent all the records by the same person"
    You might be interested in reading about ORCID, which is set to launch this fall: http://about.orcid.org/
  • "In response to Gracile, the other thread that indicates that the next major release will be out in November doesn't say that this release will include global editing."
    It will.
  • Re: ORCID

    From the ORCID website: "Individuals will be able to register for an ORCID identifier at launch in October."

    ORCID is working with publishers and operators of literature databases to help solve the problem of non-unified author names. However, it isn't clear how soon this project will attain sufficient mass to be useful for contemporary authors. There is even less certainty about authors who are no longer living or no longer contributing to the literature. Even less clear is how authors will be able to attach their identifier to their articles that are already listed in publisher or database lists. Clearly, ORCID has great potential. However, as an early member of the project (through the literature database I curate) I don't feel that the project has much momentum.

    I have read all of the ORCID material and have talked with some of those who are actively involved in the project. The cross-publisher issues -- especially for inactive authors -- are still to be resolved. I also have concerns about accuracy-testing to assure that author IDs are attached to the proper documents.

    All of us will need to help if ORCID is to be more successful than the other attempts to address author name issues.

    When ORCID is implemented it may also help with identifying author affiliations in a standard way.
  • Is there any news on this matter? I would love to bulk standardize author names and editor names.
  • Also interested to know whether there is now a way to batch-process library items, so as to more quickly homogenise author naming. Until then, one has to manually change all occurrences of e.g. "S. Smith" to "Steve Smith", so that these don't appear as different authors in citations.
  • Currently there's only a code-based, fairly elaborate option: https://www.zotero.org/support/dev/client_coding/javascript_api
    As you can tell, the plans for batch editing were delayed significantly, but while I don't have an ETA, we're definitely getting closer.
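
Whatever API is used, the core of such a script is a small loop over items and their creators. The following is a minimal Python sketch of that logic only; it does not call the Zotero JavaScript API. The function `standardize_creator` and the sample `library` are hypothetical, and the items are assumed to be plain dictionaries whose creators carry `firstName` and `lastName` fields.

```python
def standardize_creator(items, last_name, variants, canonical_first):
    """Rename matching creators in place and return how many were changed."""
    changed = 0
    for item in items:
        for creator in item.get("creators", []):
            if creator.get("lastName") != last_name:
                continue
            first = creator.get("firstName", "")
            # Only rewrite spellings that were explicitly listed as variants.
            if first in variants and first != canonical_first:
                creator["firstName"] = canonical_first
                changed += 1
    return changed


# Hypothetical records for illustration; not real library data.
library = [
    {"title": "Paper A",
     "creators": [{"firstName": "H", "lastName": "Smith"}]},
    {"title": "Paper B",
     "creators": [{"firstName": "Harold", "lastName": "Smith"},
                  {"firstName": "H", "lastName": "Jones"}]},
    {"title": "Paper C",
     "creators": [{"firstName": "Harold C", "lastName": "Smith"}]},
]

changed = standardize_creator(
    library, "Smith", {"H", "Harold", "Harold C"}, "Harold C"
)
print(changed)
```

Keeping the variant set explicit, rather than matching on last name alone, follows the caution raised earlier in the thread: only spellings that someone has checked are rewritten, and other authors with the same initial are left alone.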
  • That's good to hear. The API will do for now - thank you!