Merge inconsistently named authors

I haven't seen this discussed directly here. I saw some people in the forums having trouble with name disambiguation, and I also saw that this is addressed in the documentation, here:

I was wondering if a tool for merging inconsistently named authors is out of the question, much like we already merge duplicate items.

1. All items in which the (possibly) same author appears are displayed in a list;
2. the user selects the items which belong to the same author;
3. the user selects which version of the author's name they want to use as the canonical one;
4. Zotero changes all names in those items to the one the user selected.
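
To make steps 2 to 4 concrete, here is a minimal sketch in Python. It is not Zotero code: it assumes items are plain dictionaries with a "key" and a "creators" list of firstName/lastName pairs, and the function name merge_author_names is hypothetical.

```python
def merge_author_names(items, selected_keys, variants, canonical):
    """Rename creators in the user-selected items.

    items         -- list of dicts shaped like {"key": ..., "creators": [...]}
                     (an assumed, simplified item structure)
    selected_keys -- keys of the items the user confirmed (step 2)
    variants      -- set of (firstName, lastName) spellings to replace
    canonical     -- the (firstName, lastName) spelling to keep (step 3)

    Returns the number of creator entries that were changed (step 4).
    """
    changed = 0
    for item in items:
        if item["key"] not in selected_keys:
            continue  # the user did not confirm this item
        for creator in item["creators"]:
            name = (creator["firstName"], creator["lastName"])
            if name in variants and name != canonical:
                creator["firstName"], creator["lastName"] = canonical
                changed += 1
    return changed
```

Only items the user explicitly selected are touched, so an item by a different person with a similar name stays as it is.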

Now that I think about it, the same logic could be used to merge any inconsistently written field, as long as sensible similarity measures are used for each field.

I understand there is a batch editing functionality planned for Zotero version 5.2, so if the developers believe that it will solve the problem I've described, then ignore this.
  • Batch editing would be able to address this, yes. My recommendation for now would be to fix this on a case-by-case basis as needed: when you run into an inconsistent name when citing, make a saved search for that author, and then use copy-paste to quickly go through the list of results and make them consistent.
  • I've been trying to solve this issue with a Python script. Here are my results:

    I used a bib file I exported from my collection and found authors with the same initials, or with names that are only differentiated by the presence of accents etc. and tried to keep the most unabbreviated ones. I'll try to provide some executable or command-line program in the future, if people are interested.
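
    The script itself is not shown in this thread. As a rough sketch of the idea described above (not the poster's actual code), names can be grouped by accent-stripped last name plus first initial, and the longest first name in each group kept. All function names here are hypothetical, and names are assumed to be (first, last) tuples already parsed from the export.

```python
import unicodedata
from collections import defaultdict


def strip_accents(text):
    """Remove combining marks, e.g. 'José' -> 'Jose'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def grouping_key(first, last):
    """Accent-insensitive key: (last name, first initial), lowercased."""
    first_ascii = strip_accents(first).strip().lower()
    last_ascii = strip_accents(last).strip().lower()
    return (last_ascii, first_ascii[:1])


def group_variants(names):
    """Return only the groups that contain more than one spelling."""
    groups = defaultdict(set)
    for first, last in names:
        groups[grouping_key(first, last)].add((first, last))
    return {key: group for key, group in groups.items() if len(group) > 1}


def most_unabbreviated(variants):
    """Pick the spelling with the longest first name, ignoring periods."""
    return max(variants, key=lambda n: (len(n[0].replace(".", "")), n[0]))
```

    Different people who share a surname and a first initial end up in the same group, so each group is only a list of candidates to review, not a safe automatic merge.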
  • I would recommend not exporting, but instead using PyZotero to edit your library directly via the Zotero API.
  • Thanks for pointing that out, I'll play around with it and see if something consistent comes up.
  • Hello,

    It is getting really frustrating. There must be a way to consolidate the different formats of the same author's name into a single one. Is there any way? It is really annoying that you can't sort all the articles of ONE AUTHOR because there may be different combinations of first and middle names.

    Also, even with a "saved search" you may miss a new combination of that creator's name when adding a new item, because the search did not account for this specific "new" combination.
  • I'm afraid there's nothing on this that isn't mentioned in this thread, sorry.
  • Hi there. Unfortunately, this is not a trivial matter to solve. I have come to the conclusion that this is not something that can be done automatically and some sort of GUI would need to be created in order to let users merge their authors.

    Some sort of similarity function may be concocted in order to ease the merging process, so that authors with similar names are presented close to each other in some sort of list, but ultimately, the user would have to judge each case individually.
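
    As one possible illustration of such a similarity function, the sketch below uses Python's standard difflib to score pairs of names and lists the closest pairs first. The 0.8 threshold and the function names are arbitrary assumptions; the output is a review list for a human, not a merge decision.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name):
    """Lowercase, drop periods, and collapse whitespace before comparing."""
    return " ".join(name.replace(".", " ").lower().split())


def similar_name_pairs(names, threshold=0.8):
    """Return (score, name_a, name_b) tuples, most similar pairs first."""
    pairs = []
    for a, b in combinations(sorted(set(names)), 2):
        score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
        if score >= threshold:
            pairs.append((score, a, b))
    return sorted(pairs, reverse=True)
```

    Comparing every pair is quadratic in the number of names, so for a large library it would make sense to compare only names that share a last name.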
  • Hey, thanks for your replies and I agree to some degree. But, let me clarify my points.

    1. As always, I want to emphasize that I believe Zotero should be seen as knowledge management software. If users and developers think only in terms of managing references, the focus falls on "items" (e.g. articles, books, webpages, etc.) instead of knowledge. By knowledge, I mean the bigger picture that connects those dots, which would be impossible to hold together by memory alone.

    2. When thinking about knowledge, it is important to capture the creators, because they give an identity to the sectors of your mental map. This sectoral identity provides a better understanding of the bigger picture, so you can compare and relate different ideas (knowledge maps) to each other. This is critical for knowledge creation (particularly for those of us who write academic articles).

    3. Keeping all of that in mind, it is crucial for all of us to remember who says what and who creates and proposes certain ideas. Therefore, it is necessary to be able to view the ideas of particular creators at a glance.

    My point of view is that "we should be able to observe the contents, materials, items, and ideas through the perspective of creators."
  • This is true, but when references are printed, they need to be formatted correctly. It's a separate problem.
  • Thanks, but I can't see how your comment relates to my point. What I'm trying to convey is that we need to view, organize, and analyze the content of our libraries from different points of view. One of them is authors, meaning we must be able to scan every item created by author X.

    Another point of view could be "Publications". There could be a section that focuses only on publication titles, so that you can manage their parameters in one place. Have you noticed that people on this forum sometimes ask how to define the Journal Abbreviation for a specific publication? There are also questions about how to assign a particular Journal Abbreviation to multiple items published in the same journal.

    These types of issues come exactly from the fact that the focus is on the items rather than on "creators" and "publications".

    Hope the developers and users understand my points.
  • Have you ever thought about how many "different" creators you have in your library?

    Another example:
    Have you ever faced this confusing situation regarding a particular author?

    1. James G. March
    2. James March
    3. James G March

    Although this has been resolved in many other applications many years ago, Zotero considers these three examples to be three distinct authors! This issue also creates inconsistencies in our bibliographies! That said, there should be a way for me to review the names of the creators to see whether there are any problems of this type.

    Another example:
    Have you ever thought about how many "different" publications you have in your library? Besides, do you have any publication title that by chance has a typo? For instance, how can you find this anomaly if it exists in your library?

    1. American Journal of Sociology
    2. American Journal of Sciology

    I know that spell checking and spelling correction can be one way, but the best and most systematic way is to have an overview of all of your articles so that you can spot these odd cases.

  • While there's certainly more we can do here, particularly with authors, for most fields you can sort by that field in the middle pane and look for inconsistencies.
  • edited April 6, 2019

    Although this has been resolved in many other applications many years ago...

    I'd love to know your examples of software that does this automatically. Especially any software that can handle J. G. March, JG March, J March, Janice Giselle March, John Garfield March, etc.

    My own (non-Zotero) online bibliographic database has a logic system to help with the different but similar names / same author problem. It involves weights for identical coauthors, similarities between topics, and publication year proximity to arrive at a calculated probability of a match. I still need a human to make the final decision and do the edits. My system still has more than 14,000 names that are very similar to one, two, or three other names out of 860,000 total names in the database.
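
    The details of that system are not given here. Purely as an illustration of the general idea (not the poster's actual logic), the sketch below combines co-author overlap, topic overlap, and publication-year proximity into a single weighted score between 0 and 1. The weights, the 10-year window, the record layout, and the function names are all assumptions, and the result is an uncalibrated score rather than a true probability.

```python
def jaccard(a, b):
    """Overlap of two collections as |A & B| / |A | B| (0.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_score(rec_a, rec_b, w_coauthors=0.5, w_topics=0.3, w_years=0.2,
                year_window=10):
    """Weighted evidence that two similar names refer to the same person.

    Each record is an assumed dict: {"coauthors": set, "topics": set, "year": int}.
    """
    coauthor_sim = jaccard(rec_a["coauthors"], rec_b["coauthors"])
    topic_sim = jaccard(rec_a["topics"], rec_b["topics"])
    gap = abs(rec_a["year"] - rec_b["year"])
    year_sim = max(0.0, 1.0 - gap / year_window)  # 1.0 same year, 0.0 beyond window
    return w_coauthors * coauthor_sim + w_topics * topic_sim + w_years * year_sim
```

    A score like this can only rank candidate pairs for review; a human still has to make the final decision and do the edits.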

    edit: Add name variants, where for example a name can be published with a single German decorated vowel or an English 2-vowel spelling, (oe, œ, ö) and the decision becomes even more subtle.
  • While it doesn't transliterate yet, I've created an export translator that will export the double-metaphone associated with the author name. Sorting on that gave me quite a few interesting matches (so different name, same metaphone key). Also a pretty large number of spurious matches. This is not a silver bullet. I'm open to more sophisticated name-matching techniques. For transliteration, I need to set up something more complicated.
  • @DWL-SDCA thanks for your comments.

    Sorry, but there are some misunderstandings here. I don't remember saying anything about automatic correction; please correct me if I'm wrong. The only thing I want to convey is that we cannot "globally" change a detail across several items at the same time. Again, think about the example above, and consider that there are articles and items related to each variant:

    1. James G. March ---> 38 items
    2. James March ---> 17 items
    3. James G March ---> 15 items

    Now, think about consolidating all the variations of the author's name. In this case, I want to keep "James G. March" as the principal name and correct the others:

    James March ---> James G. March
    James G March ---> James G. March

    How can I do that? The only way is to go through 17+15 items and make the change in each individual item. Isn't that ridiculous? Couldn't it be done in a more systematic way? Seriously, think about all the other issues related to this.

    Another example:

    American Journal of Sociology ---> 340 items
    American journal of sociology ---> 157 items

    My goal is the following:

    American journal of sociology ---> American Journal of Sociology

    Is it possible to do that manually?

    All I wanted to say is that Zotero must have a feature that lets users globally change an attribute that has been assigned to several items.
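
    To illustrate what such a global change amounts to, here is a small Python sketch that works on plain dictionaries rather than a real Zotero library. It treats spellings that differ only by letter case as the same value, keeps the most frequent spelling, and rewrites the others. The function names and the item structure are hypothetical.

```python
from collections import Counter


def canonical_by_frequency(values):
    """Map each spelling to the most frequent spelling that differs only by case."""
    counts = Counter(values)
    best = {}
    for value, n in counts.items():
        key = value.casefold()
        if key not in best or n > counts[best[key]]:
            best[key] = value
    return {value: best[value.casefold()] for value in counts}


def apply_mapping(items, field, mapping):
    """Rewrite `field` in every item according to `mapping`; return the change count."""
    changed = 0
    for item in items:
        old = item.get(field)
        new = mapping.get(old, old)
        if new != old:
            item[field] = new
            changed += 1
    return changed
```

    With the counts above, "American Journal of Sociology" (340 items) would win over "American journal of sociology" (157 items). A typo such as "Sciology" is not a case difference, so it would still need a similarity check or a manual mapping.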

    To the best of my knowledge, in Papers 3 you have a separate view to see the items in your library by "Authors" and "Publication Title". In that case, if you make any change to a publication title, the change is applied to all the other "nested items".
  • You can batch edit fields like that using the JavaScript API:
  • Thanks @bwiernik

    I appreciate it, since I didn't know about this feature. That said, I'm not an advanced user, and I hope I'll be able to do that.

    By the way, I'll back up before every change. But for normal users, it would be easier to do that in a simpler manner.

    Many thanks again.
  • A more user-friendly approach to batch editing is planned, but no ETA is available. There are examples of batch editing at the bottom of the page I linked to that you can adapt for your needs.