Get a count of authors in your library
I've made some software to help with my own research, and I wanted to get some feedback from the community about it. I want to eventually present a poster about it, but only after I'm reasonable sure it's actually useful.
Here's a link, let me know what you think: https://colab.research.google.com/drive/10yDfHxMO6Se0PFSQMJk7gcEePOYbGVpu
The motivation for this software is research. The way I go about research is attempt to find authors to read and then read most of their research within the past 5 years; this helps me to develop a literature review. One way of finding authors to read is to look for multiple mentions of that authors work while reading. This process is more qualitative than I would like, so instead I would like to look at the frequency of which authors appear in my library as authoring works I find to be interesting. Zotero certainly makes this simpler than looking at the raw papers, but it can be difficult to match authors between papers and can end up being a terribly manual and time consuming process. In addition, not all author names from papers are exactly the same. For example:
Kristen E. DiCerbo', 'Kristen DiCerbo', 'Kristen E. Dicerbo
Robert Mislevy', 'Robert J. Mislevy
Ryan Baker', 'Ryan S Baker', 'Ryan S. Baker
Tiffany Barnes', 'Tiffany M. Barnes
Drew Hicks', 'Andrew G. Hicks
etc.
All of these similar names can be understood by a human to be referring to the same author, but attempting to search for exact matches of these names would necessarily lead to missing some of the author's works'.
The proposed solution is to attempt to compare author names to each other such that the names are approximately equal, then track how many works have been authored by that person. Thus my software, which does precisely that.
Given that the process looks for approximately equal names, there is some failure rate, but I believe that similarity testing between the names is robust. In addition the program can handle a functionally infinite list of authors for comparison. It will also return the results of its inspection in a standard format for subsequent processing; e.g. it returns a CSV file of the author names and the times they are seen in data submitted.
Already this tool has helped my own research, as I was aware of many authors that came up commonly in my research; but several others I had not seen as they are commonly second (or later) authors, rather than the eye-catching first author position. For example, here is a truncated list of authors from my own library:
Author | Count
---|---
Steven L. Wise | 11
Kristen E. DiCerbo', 'Kristen DiCerbo', 'Kristen E. Dicerbo | 7
Robert Mislevy', 'Robert J. Mislevy | 5
Ryan Baker', 'Ryan S Baker', 'Ryan S. Baker | 5
Michael Eagle | 5
V. Elizabeth Owen | 5
Alina A. von Davier', 'Alina von Davier | 4
Tiffany Barnes', 'Tiffany M. Barnes | 4
Sebastiaan de Klerk | 3
Zhan Shu | 3
Elizabeth Rowe | 3
Drew Hicks', 'Andrew G. Hicks | 3
While I was aware that Dr. Wise was important to my research, as he has many first-authorships; I had not taken a close look at the work of Dr. Tiffany Barnes, Dr. Zhan Shu, or Dr. Drew Hicks.
The software, in its current form, does have some drawbacks. It only accepts input in the form of Zotero reports (with plans for input in the form of: Bibtex, RIS, and CSV). It also is currently running as a Jupyter notebook on Google Colab; to expose it to a wider audience I would want to host the software on something more professional and responsive, but Colab serves as a decent test-bed for this initial alpha release.
All that being said, I hope the software will be useful to others; and I welcome your feedback. Thank you.
Here's a link, let me know what you think: https://colab.research.google.com/drive/10yDfHxMO6Se0PFSQMJk7gcEePOYbGVpu
The motivation for this software is research. The way I go about research is attempt to find authors to read and then read most of their research within the past 5 years; this helps me to develop a literature review. One way of finding authors to read is to look for multiple mentions of that authors work while reading. This process is more qualitative than I would like, so instead I would like to look at the frequency of which authors appear in my library as authoring works I find to be interesting. Zotero certainly makes this simpler than looking at the raw papers, but it can be difficult to match authors between papers and can end up being a terribly manual and time consuming process. In addition, not all author names from papers are exactly the same. For example:
Kristen E. DiCerbo', 'Kristen DiCerbo', 'Kristen E. Dicerbo
Robert Mislevy', 'Robert J. Mislevy
Ryan Baker', 'Ryan S Baker', 'Ryan S. Baker
Tiffany Barnes', 'Tiffany M. Barnes
Drew Hicks', 'Andrew G. Hicks
etc.
All of these similar names can be understood by a human to be referring to the same author, but attempting to search for exact matches of these names would necessarily lead to missing some of the author's works'.
The proposed solution is to attempt to compare author names to each other such that the names are approximately equal, then track how many works have been authored by that person. Thus my software, which does precisely that.
Given that the process looks for approximately equal names, there is some failure rate, but I believe that similarity testing between the names is robust. In addition the program can handle a functionally infinite list of authors for comparison. It will also return the results of its inspection in a standard format for subsequent processing; e.g. it returns a CSV file of the author names and the times they are seen in data submitted.
Already this tool has helped my own research, as I was aware of many authors that came up commonly in my research; but several others I had not seen as they are commonly second (or later) authors, rather than the eye-catching first author position. For example, here is a truncated list of authors from my own library:
Author | Count
---|---
Steven L. Wise | 11
Kristen E. DiCerbo', 'Kristen DiCerbo', 'Kristen E. Dicerbo | 7
Robert Mislevy', 'Robert J. Mislevy | 5
Ryan Baker', 'Ryan S Baker', 'Ryan S. Baker | 5
Michael Eagle | 5
V. Elizabeth Owen | 5
Alina A. von Davier', 'Alina von Davier | 4
Tiffany Barnes', 'Tiffany M. Barnes | 4
Sebastiaan de Klerk | 3
Zhan Shu | 3
Elizabeth Rowe | 3
Drew Hicks', 'Andrew G. Hicks | 3
While I was aware that Dr. Wise was important to my research, as he has many first-authorships; I had not taken a close look at the work of Dr. Tiffany Barnes, Dr. Zhan Shu, or Dr. Drew Hicks.
The software, in its current form, does have some drawbacks. It only accepts input in the form of Zotero reports (with plans for input in the form of: Bibtex, RIS, and CSV). It also is currently running as a Jupyter notebook on Google Colab; to expose it to a wider audience I would want to host the software on something more professional and responsive, but Colab serves as a decent test-bed for this initial alpha release.
All that being said, I hope the software will be useful to others; and I welcome your feedback. Thank you.
var rows = await Zotero.DB.queryAsync("SELECT TRIM(firstName || ' ' || lastName) AS author, COUNT(*) AS num FROM items JOIN itemCreators USING (itemID) JOIN creators USING (creatorID) WHERE libraryID=1 GROUP BY author ORDER BY COUNT(*) DESC");
var lines = [];
for (let row of rows) {
lines.push(row.author + '\t' + row.num);
}
return lines.join('\n');
This will count author occurrences in your personal library (libraryID 1).
The above code won't handle name variations, but
Zotero.Utilities.levenshtein(a, b)
is available and could be used for that purpose with a little bit more work.How would it be re-coded to apply to just (1) the current collection or (2) selected items (in a collection or library) ?
var CurrentCollection = Zotero.getActiveZoteroPane().getSelectedCollection();
var sql = "SELECT TRIM(lastName || ', ' || firstName) AS author, COUNT(*) AS num FROM items JOIN itemCreators USING (itemID) JOIN creators USING (creatorID) JOIN collectionItems USING (itemID) JOIN collections USING (collectionID) WHERE collectionName=? GROUP BY author ORDER BY COUNT(*) DESC";
var rows = await Zotero.DB.queryAsync(sql,CurrentCollection.name);
var lines = [];
for (let row of rows) {
lines.push(row.author + '\t' + row.num);
}
return lines.join('\n');
As previously noted, to get the item counts for each author listed alphabetically by lastName instead of numerically by count, replace ORDER BY COUNT(*) DESC with ORDER BY lastName ASC.
If not, could anyone give me some pointers to the SQL schema so I can work out what fields I need to SELECT and how to adapt the query?
(More broadly, these kind of simple 'metrics' about one's library would be useful functionality to have within Zotero itself - I vaguely recall that Papers used to have this in v2 back in 2012 - but I understand that there are limited resources available)
https://forums.zotero.org/discussion/comment/412566/#Comment_412566
I've taken a look at how to combine the code in that link with the code above, so it runs on whichever collection is currently selected, but this appears to be more complex than I thought at first glance. From what I can see, the code above joins multiple tables (items, itemCreators, creators, collectionItems, collections) to match collectionName using WHERE with the value passed through for the selected collection. By contrast, the code in the other thread by @AbeJellinek uses WHERE fieldID=${Zotero.ItemFields.getID('publicationTitle')}
I've tried combining these approaches, but quickly got myself tied up in knots. I'll try again when I have a clear head, but I suspect this is a lot simpler to people who know the Zotero schema and are more adept with SQL than me!