Merging and Cleaning Author lists, Publishers, etc. to prevent overzealous disambiguation
This is an old discussion that has not been active in a long time. Before commenting here, you should strongly consider starting a new discussion instead. If you think the content of this discussion is still relevant, you can link to it from your new discussion.
I am concerned by this and similar discussions concerning author names. I think that there are larger issues here than disambiguation of author names in a database and using the the disambiguated name for a citation in a manuscript.
For example, if an author's name is Able Brown Word but has publications under "A Word" and "AB Word" should the "A Word" publications be cited "AB Word"? Even though we may know that this is the same author, should a citation list a name that differs from that on the published document? What of a publication with several authors with all author names published as "single initial last_name"? Should the first author be given a more complete name to facilitate automated disambiguation? Certainly, we do not want to edit our database to do the opposite -- omit a part of an author's name so that all versions in the database are at the same level of specificity.
I will stop my way off topic digression.
I ran the query in SQLiteStudio but I am sure other SQLite client tools can run it as well. If it runs too slowly, try removing the four "upper()" functions.
The query may be helpful if you just want to find out possible author name disambiguation issues and correct them manually. It sorts by author's last name so you can go to My Library in Zotero, sort by Creator or use the Advanced Search function and follow the query result to make your amendments.
EDIT: Updated the query below. It now also takes into account the second and third character of the first name to minimize false negatives.
SELECT cD1.lastName,
cD1.firstName,
cD1.shortName,
(SELECT cD2.lastName || " " || cD2.firstName
FROM creatorData cD2, itemCreators iC2, creators c2
WHERE iC2.itemID=i.itemID AND
iC2.creatorID=c2.creatorID AND
cD2.creatorDataID=c2.creatorDataID AND
iC2.orderIndex=0
) as firstCreator,
cT.creatorType || " of a " || ifnull("" ||
(SELECT substr(iDV.value, 1, 4)
FROM itemData iD, itemDataValues iDV, fields f
WHERE iD.itemID=i.itemID AND iDV.valueID=iD.valueID AND
iD.fieldID=f.fieldID AND f.fieldName="date"
), "") ||
" " || iT.typeName || ": " || ifnull("" ||
(SELECT iDV.value
FROM itemData iD, itemDataValues iDV, fields f
WHERE iD.itemID=i.itemID AND iDV.valueID=iD.valueID AND
iD.fieldID=f.fieldID AND f.fieldName="title"
), "") as "Participated in Title",
(SELECT count(*)
FROM creatorData cD2, itemCreators iC2, creators c2
WHERE cD2.creatorDataID!=c.creatorDataID AND
cD2.creatorDataID=c2.creatorDataID AND
iC2.creatorID=c2.creatorID AND
upper(cD2.lastName)=upper(cD1.lastName) AND
upper(substr(cD2.firstName,1,1))=upper(substr(cD1.firstName,1,1)) AND
(upper(substr(cD2.firstName,2,1)) IN (upper(substr(cD1.firstName,2,1)), "", " ", ".") OR
substr(cD1.firstName,2,1) IN ("", " ", ".")
) AND
(upper(substr(cD2.firstName,3,1)) IN (upper(substr(cD1.firstName,3,1)), "", " ", ".") OR
substr(cD1.firstName,3,1) IN ("", " ", ".") OR
substr(cD2.firstName,2,1) IN ("", " ", ".") OR substr(cD1.firstName,2,1) IN ("", " ", ".")
)
) as alikeItems
FROM creatorData cD1,
creators c,
itemCreators iC,
creatorTypes cT,
itemTypes iT,
items i
WHERE alikeItems > 0 AND
c.creatorDataID=cD1.creatorDataID AND
iC.creatorID=c.creatorID AND
cT.creatorTypeID=iC.creatorTypeID AND
i.itemID=iC.itemID AND
iT.itemTypeID=i.itemTypeID
ORDER BY cD1.lastName, alikeItems DESC, cD1.firstName
It would be great if such a function would be included in Zotero, as I'm planning to export my data to Drupal Biblio for web presentation (this is possible as Bibtex or by the "import from Zotero" add-on to Drupal Biblio) and have to perform the author merge there. As there is (still) no two-way sync between Drupal Biblio and Zotero my Zotero database stays the same.
Another way would be to switch to Mendeley, but NO! I'd like to stay with Zotero...