Merging and Cleaning Author lists, Publishers, etc. to prevent overzealous disambiguation

  • Not directly a Zotero issue but a problem (at least a puzzle)

    I am concerned by this and similar discussions about author names. I think there are larger issues here than disambiguating author names in a database and using the disambiguated name for a citation in a manuscript.

    For example, if an author's name is Able Brown Word but they have publications under "A Word" and "AB Word", should the "A Word" publications be cited as "AB Word"? Even though we may know that this is the same author, should a citation list a name that differs from the one on the published document? What of a publication with several authors where all author names are published as "single initial last_name"? Should the first author be given a more complete name to facilitate automated disambiguation? Certainly, we do not want to edit our database to do the opposite -- omit part of an author's name so that all versions in the database are at the same level of specificity.
  • You might be interested in http://about.orcid.org/. Any solution that only uses names will always be limited in solving ambiguity.
  • Thanks, Rintze. ORCID is certainly interesting. Implementation will likely take years. Look at ResearcherID. I think people and institutions resist ResearcherID for a number of reasons beyond the connection with Thomson. Maybe if publishers begin to mandate author clarity we will finally begin to solve this.

    I will stop my way-off-topic digression.
  • edited October 9, 2012
    I have been facing the author name cleansing issue for some time now and drafted an SQL query that lists duplicate-looking author names together with the related Zotero items. The query looks for creators with the same last name and the same first character of the first name. It also provides some additional fields to make it easier to locate the respective items in Zotero.

    I ran the query in SQLiteStudio, but I am sure other SQLite client tools can run it as well. If it runs too slowly, try removing the "upper()" function calls.

    The query may be helpful if you just want to find possible author name disambiguation issues and correct them manually. It sorts by author's last name, so you can go to My Library in Zotero, sort by Creator or use the Advanced Search function, and follow the query result to make your amendments.

    EDIT: Updated the query below. It now also takes into account the second and third characters of the first name to reduce false positives.


    SELECT cD1.lastName,
           cD1.firstName,
           cD1.shortName,
           -- First creator of the same item, to help locate it in Zotero
           (SELECT cD2.lastName || ' ' || cD2.firstName
              FROM creatorData cD2, itemCreators iC2, creators c2
             WHERE iC2.itemID = i.itemID
               AND iC2.creatorID = c2.creatorID
               AND cD2.creatorDataID = c2.creatorDataID
               AND iC2.orderIndex = 0
           ) AS firstCreator,
           -- Creator type, first four characters of the date, item type and title
           cT.creatorType || ' of a ' || ifnull('' ||
               (SELECT substr(iDV.value, 1, 4)
                  FROM itemData iD, itemDataValues iDV, fields f
                 WHERE iD.itemID = i.itemID
                   AND iDV.valueID = iD.valueID
                   AND iD.fieldID = f.fieldID
                   AND f.fieldName = 'date'
               ), '') ||
           ' ' || iT.typeName || ': ' || ifnull('' ||
               (SELECT iDV.value
                  FROM itemData iD, itemDataValues iDV, fields f
                 WHERE iD.itemID = i.itemID
                   AND iDV.valueID = iD.valueID
                   AND iD.fieldID = f.fieldID
                   AND f.fieldName = 'title'
               ), '') AS "Participated in Title",
           -- Count item links of other creator records with a similar-looking name
           (SELECT count(*)
              FROM creatorData cD2, itemCreators iC2, creators c2
             WHERE cD2.creatorDataID != c.creatorDataID
               AND cD2.creatorDataID = c2.creatorDataID
               AND iC2.creatorID = c2.creatorID
               AND upper(cD2.lastName) = upper(cD1.lastName)
               AND upper(substr(cD2.firstName, 1, 1)) = upper(substr(cD1.firstName, 1, 1))
               AND (upper(substr(cD2.firstName, 2, 1)) IN (upper(substr(cD1.firstName, 2, 1)), '', ' ', '.')
                    OR substr(cD1.firstName, 2, 1) IN ('', ' ', '.'))
               AND (upper(substr(cD2.firstName, 3, 1)) IN (upper(substr(cD1.firstName, 3, 1)), '', ' ', '.')
                    OR substr(cD1.firstName, 3, 1) IN ('', ' ', '.')
                    OR substr(cD2.firstName, 2, 1) IN ('', ' ', '.')
                    OR substr(cD1.firstName, 2, 1) IN ('', ' ', '.'))
           ) AS alikeItems
      FROM creatorData cD1,
           creators c,
           itemCreators iC,
           creatorTypes cT,
           itemTypes iT,
           items i
     WHERE alikeItems > 0
       AND c.creatorDataID = cD1.creatorDataID
       AND iC.creatorID = c.creatorID
       AND cT.creatorTypeID = iC.creatorTypeID
       AND i.itemID = iC.itemID
       AND iT.itemTypeID = i.itemTypeID
     ORDER BY cD1.lastName, alikeItems DESC, cD1.firstName
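
For readers who want to see the name-matching rule from the query above in isolation, here is a minimal Python sketch. It assumes names are available as (last name, first name) pairs; the function names `look_alike` and `find_look_alikes` and the sample names are illustrative only and are not part of Zotero or of the query. It mirrors the comparison of the last name and the first three characters of the first name, and nothing else from the query.

```python
from itertools import combinations

# Characters the query treats as "no information" at a given position.
BLANK = {"", " ", "."}


def char_at(name, pos):
    """Upper-cased character at 0-based pos, or "" past the end of the name."""
    return name[pos:pos + 1].upper()


def look_alike(a, b):
    """Return True if two (last_name, first_name) pairs look like one author.

    Rule: same last name and same first initial (case-insensitive); the
    second and third characters must agree unless one side has nothing
    meaningful there (end of name, space or period).
    """
    last_a, first_a = a
    last_b, first_b = b
    if last_a.upper() != last_b.upper():
        return False
    if char_at(first_a, 0) != char_at(first_b, 0):
        return False
    a2, b2 = char_at(first_a, 1), char_at(first_b, 1)
    a3, b3 = char_at(first_a, 2), char_at(first_b, 2)
    second_ok = a2 == b2 or a2 in BLANK or b2 in BLANK
    # The third character is only checked when both second characters exist.
    third_ok = (a3 == b3 or a3 in BLANK or b3 in BLANK
                or a2 in BLANK or b2 in BLANK)
    return second_ok and third_ok


def find_look_alikes(names):
    """Return every pair of distinct entries that look alike."""
    return [(a, b) for a, b in combinations(names, 2) if look_alike(a, b)]


if __name__ == "__main__":
    sample = [
        ("Word", "Able Brown"),
        ("Word", "A"),
        ("Word", "AB"),
        ("Word", "Alice"),
        ("Smith", "John"),
    ]
    for pair in find_look_alikes(sample):
        print(pair)
```

With these sample names, "A" is flagged against every other "Word" entry, while "AB" and "Alice" are kept apart because their second characters differ. A bare initial carries too little information to rule anything out, which is why the result still needs manual review.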
  • I also miss a function for merging author names, like the one in Drupal Biblio. Its "authors list" has a merge function, intended to give each author an unambiguous "author page", that covered all these needs except different names for the same author; the "aka" field was later added for that case. See this thread: http://drupal.org/node/409670

    It would be great if such a function were included in Zotero. I'm planning to export my data to Drupal Biblio for web presentation (this is possible via BibTeX or the "import from Zotero" add-on for Drupal Biblio), so I have to perform the author merge there. As there is (still) no two-way sync between Drupal Biblio and Zotero, my Zotero database stays the same.

    Another way would be to switch to Mendeley, but NO! I'd like to stay with Zotero...
  • Why not export from Drupal Biblio using RIS and then import the file back into Zotero?