[Suggestion] Problems with Disambiguate by First Name
I love Zotero and have used it for more than a decade.
However, I have a suggestion regarding disambiguation by first name. Unfortunately, meta-data sources are inconsistent in how they provide information. As an example, when I download articles, my own name might be filled in as "Hudson, N. W.", "Hudson, Nathan W.", or other variants.
This leads to a problem with the default APA Zotero style sheet, which has "disambiguate by first name" enabled. I'll start to get errors such as Zotero citing (Hudson, 2021), (N. W. Hudson, 2021), and (Nathan W. Hudson, 2021) in papers to disambiguate them. Naturally, I don't have time to go through my Zotero database of thousands of articles to make sure every author's name is listed correctly.
Thus, every time a new APA style sheet is released, I have to modify it to turn off disambiguation by first name.
However, this is not particularly ideal if there are truly two different authors with the same last name.
Thus, I'm wondering whether it might be better/smarter for Zotero to disambiguate by first name ONLY IF the initials don't match. That is, Zotero should assume that "N. W. Hudson" and "Nathan W. Hudson" are the same person. This seems like it would cause Zotero to use the correct behavior in the VAST majority of cases (e.g., where meta-data is entered incorrectly or differently across articles). It seems like it would cause problems in only a fringe number of unusual cases (e.g., where authors share the exact same initials but different names).
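To make the proposed rule concrete, here is a minimal sketch in Python. It is only an illustration of the idea, not how Zotero or its citation processor actually works; the function names `initials` and `should_disambiguate` are hypothetical, and it assumes two authors already share a last name and that given names are separated by spaces, periods, or hyphens.

```python
import re
from typing import Tuple


def initials(given: str) -> Tuple[str, ...]:
    """Return the upper-cased first letter of each given-name part.

    Assumption: parts are separated by whitespace, periods, or hyphens,
    so "N. W." and "Nathan W." both reduce to ("N", "W").
    """
    parts = re.split(r"[\s.\-]+", given.strip())
    return tuple(p[0].upper() for p in parts if p)


def should_disambiguate(given_a: str, given_b: str) -> bool:
    """Proposed rule: add given names to the citation only when the
    initials of two same-surname authors differ."""
    return initials(given_a) != initials(given_b)


# Same initials: treated as one person, so no disambiguation.
print(should_disambiguate("N. W.", "Nathan W."))       # False
# Different initials: treated as different people.
print(should_disambiguate("Nathan W.", "Duncan W."))   # True
# The fringe case: different people who share initials are merged.
print(should_disambiguate("David Williams", "David Wyndham"))  # False
```

Note that this sketch compares initials exactly, so "Nathan" and "Nathan W." would still be treated as different authors, and unseparated initials such as "NW" would be read as a single initial. A real implementation would have to decide how to handle those cases.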
We won't do the same for Chicago style, though, as there has to be a way to get this right and Chicago wants disambiguation by full first name if initials are the same.
Please don't have Zotero and APA-7 style "assume" that N. W. Hudson and Nathan W. Hudson are the same person. That isn't how I interpret the rule. If necessary, I can provide hundreds of real examples of authors with identical initials who are different people.
My own name is David Williams Lawrence, and there are at least 3 other DW Lawrence authors in my database: David Wyndham Lawrence, David Wilson Lawrence, and two other David W Lawrence authors who don't provide their full names but have papers on _very_ different topics from one another and from any of the full-name David W Lawrences. Then there is Duncan W. Lawrence ...
Zotero has long recommended that, when there are different versions (different completeness) of author names, names be edited so that all have the same complete version.
@nate.hudson As a regular practice, you should ensure that an author in your library has their name spelled consistently across all items. I recommend always entering the names fully.
I do understand your point about database maintenance. Unfortunately, I have literally thousands of articles in my database. I correct obvious errors (e.g., titles, page numbers). But remember that the average user is going to do what's easiest. Especially given that even full reference lists use only initials in APA style (e.g., I'm always listed as Hudson, N. W.), there's little incentive to go through my database, find all instances of my own name (much less other authors' names) and correct them (and this doesn't touch on the fact that if Smith, J. K. is imported into Zotero, there's essentially zero chance that I or other users are going to look up the author's full name to figure out what the initials stand for). It's the classic battle between what UX designers wish users would do, and how users actually use the product.
My recommendation isn’t to try to clean an entire database at once. Rather, as you import items, take a second to check the data after import (e.g., complete author names, put the title into sentence case, etc.). Zotero does a great job at this a lot of the time, but it is good to take the few seconds and correct any issues when they happen.
Then, only update existing items when you run into a surprising citation as you are writing—correct the two items to make the author names consistent.
I'm grateful that you're aware of changes in how APA7 handles citations and that you're correcting it to a form that will work for psychological researchers without further modifications.
As I've said before, most Zotero users likely don't have the technological know-how to edit Style Sheets. And even if they do, they're likely not motivated to do so. Zotero needs to "just work" for most people to adopt it. I'm so glad that you're actively working on changing the APA7 stylesheet to reflect how actual researchers and journals expect citations to be used! Thank you!
@nate.hudson @bwiernik
Particularly, Nate: You are lucky that essentially all of your articles include your first name and middle initial in PubMed, PsycINFO, and my database, SafetyLit. There are other NW Hudson-named authors who have published in the behavioral or psychology field in the years you published with whom you might not want to be confused.
Having a few thousand articles and several thousand author names is probably too much for one person to fix retroactively, but you have graduate students and possibly staff who could work on this if your budget will allow. One of my first well-paying full-time indoor jobs was doing exactly that, editing Reference Manager records in a professor's database in 1983-1985 (under CP/M, before there was DOS). Tracking down authors' full names was detective work that satisfied my OCD tendencies. My boss was clearly more OCD than I because she valued the idea of full author names. This required seeking out printed university catalogs and foundation annual reports because the Internet and the World Wide Web were only a dream at that time.
Carrying on my OCD and speaking of thousands of articles, SafetyLit currently has 660 thousand journal articles and several million authors. We have volunteers who use a tool to disambiguate author names by comparing publication dates, co-authors, subject matter, and author biographies and their institutional affiliation.
By the way, there are 15 Nathan W. Hudson articles and 9 Brenton M. Wiernik articles in the database. Brenton, your "green period" articles didn't fit the inclusion criteria.