Clean up names of creators, publishers, cities, etc.?
I'm attempting to 'proofread' my whole library (4000+ entries currently), and hoping to standardize the formatting for the same entities. I'm not sure what the best (least time-consuming) strategy to do this is, and I'm wondering if there are any automated tools within Zotero (or ways to search the database) that could help.
Examples of entities to standardize:
1) Author names:
--Sometimes they have an initial, other times full first name.
--Sometimes the middle name/initial is included, other times it is omitted.
Ideally, I'd like to have the same name for the author in every instance so that sorting works out consistently. But for even more general consistency I'd like to also standardize this for editors, etc.
(An alternative philosophy would be to keep the names exactly as printed in each reference, but that would lead to inconsistency in, e.g., multiple publications in the same year by the same author.)
2) Publishers
Publishers often have abbreviations, and I get lots of different results when inserting via ISBN and other databases. For example, you might get: Penguin / Penguin Inc. / Penguin Incorporated / etc.
3) Locations:
Cities often have the state or country added, and the way these are abbreviated (or what information they include) is often inconsistent. It also might be worth standardizing the distinction between, for example, 'Cambridge, MA, USA' and 'Cambridge, England'.
Others include Journal titles (e.g., which words are capitalized, "and" vs. &, etc.), journal abbreviations, series titles, etc. I have a few journals/recurring conference proceedings that have changed their names slightly over the years but can be easily cited by the same name (as for author names above, this might vary based on your citation philosophy).
Out of all of those things, it looks like only "Publisher" and FIRST "Creator" can be sorted automatically in Zotero, which would allow me to see a list. For the others, like Location, it would be great to be able to sort by them (or otherwise generate a list), and then standardize them manually from there. The trickiest and most important would be to be able to see ALL Creators (not just the first) and make sure they're formatted the same way across entries.
I'm not sure how much time/effort I want to invest in going through all of my entries by hand. And it doesn't need to be perfect in the end. But having some tools or methods to search through and check these things would be great, so I can fix the most glaring inconsistencies.
Has anyone tried anything similar? Do you have any suggestions?
I suppose one way to approach this would be to create a custom style that sorts by the relevant fields, such as Location, but I wouldn't know how to do that for multiple authors and/or editors, etc. It would also be complicated to then go back through from that information and edit the relevant items in Zotero.
It seems that Zotero knows something about recurring names/titles, because they autofill. Is that information accessible anywhere?
Thanks!!
Unfortunately, Python isn't an accessible option for me at this point.
I suppose that the priority is for first authors to be the same (for sorting purposes) so I could just focus on that and hope the rest is relatively consistent.
Edit: just FYI for anyone who comes across this wanting a full answer, the CSV approach did work for almost everything I needed, except to view non-standard-formatting dates, for which I had to actually inspect the database: https://forums.zotero.org/discussion/comment/314145/#Comment_314145 -- that worked but was slow and difficult. It's an option if you need it though.
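For anyone who can run a script, here is a minimal sketch of the CSV approach in Python. It assumes the export has columns named "Author", "Editor", "Publisher", and "Place", and that multiple creators in one cell are separated by "; " and written as "Last, First"; check the header row of your own export, since those names are assumptions here. The helper names `list_variants` and `group_by_surname` are made up for illustration and are not part of Zotero.

```python
import csv
from collections import Counter, defaultdict


def list_variants(csv_file, column, separator=None):
    """Count each distinct value found in one column of a CSV export.

    csv_file  -- an open text file (or any iterable of CSV lines)
    column    -- header name of the column to inspect (assumed, e.g. "Place")
    separator -- set to "; " for creator columns holding several names
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        cell = (row.get(column) or "").strip()
        if not cell:
            continue
        values = cell.split(separator) if separator else [cell]
        for value in values:
            value = value.strip()
            if value:
                counts[value] += 1
    return counts


def group_by_surname(counts):
    """Group "Last, First" names by surname; keep surnames with 2+ spellings."""
    groups = defaultdict(list)
    for name in counts:
        surname = name.split(",")[0].strip().lower()
        groups[surname].append(name)
    return {s: sorted(names) for s, names in groups.items() if len(names) > 1}
```

To use it, open the exported file with something like `open("export.csv", newline="", encoding="utf-8-sig")` (the file name is a placeholder), call `list_variants(f, "Author", separator="; ")`, and print the result of `group_by_surname(...)`. Surnames that appear with more than one spelling are the candidates to fix by hand in Zotero. For single-valued fields such as "Publisher" or "Place", leave `separator` unset and sort the counts to spot near-duplicates.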
I'm not sure how different journals do it, but if Mary Smith gets married and changes her name to Mary Jones, I'd like the citation to stay Mary Smith, since that is what is on the printed copy of the journal article (or whatever...). However, I would like to link them in some way in Zotero to say that Mary Smith and Mary Jones are the same person. Maybe a Zotero display name field and a citation name field. If the Zotero display name field is blank, then the citation name is used for display. Disk space is cheap these days.
This is also a problem for transgender authors. Yes, they do exist...
But if a name is changed by marriage, as a pen name, or whatever, I'd default to citing them as different people. Sometimes I will use brackets if needed to clarify. In my field (Linguistics) I have noticed some inconsistency with this, such as citing an old paper from a now-married author with the new, well-known name, ignoring that it wasn't published that way, making it confusing and hard to find. So to me, the brackets can help with that, which is where I also put original names if I'm transliterating them from another alphabet. It's a somewhat arbitrary and messy process, but I'm not sure how it could be effectively automated.
> User could manually, or by batch edit, add "Display Names" for "Citation Names"
> If "Display Name" field is blank, then Zotero would use the "Citation Name" by default for views on the screen.
So imports would go into the citation name field, but Zotero's screen views would show the Display Name.
The problem that this doesn't address is how to do bibliographic outputs automagically. I have no idea if any of the styles are set up to handle this.
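The fallback rule is simple enough to sketch. This is only an illustration of the proposal, not an existing Zotero feature or API: the `Creator` class, its two fields, and the `same_person` helper are all hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Creator:
    """Hypothetical record: the name as printed, plus an optional display name."""
    citation_name: str       # filled on import; what bibliographies would use
    display_name: str = ""   # optional; added manually or by batch edit

    def shown_as(self) -> str:
        # A blank display name falls back to the citation name.
        return self.display_name or self.citation_name


def same_person(a: Creator, b: Creator) -> bool:
    """Treat two records as one person if they are shown under the same name."""
    return a.shown_as() == b.shown_as()
```

With this, `Creator("Smith, Mary", display_name="Jones, Mary")` and `Creator("Jones, Mary")` would list together on screen, while the first would still be cited as "Smith, Mary".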
The issue of how to cite pseudonyms comes down to the requirements of the style standard you will use.
There is a useful entry in the APA Style Blog concerning the "cite what you see" philosophy:
https://blog.apastyle.org/apastyle/2012/02/how-to-cite-pseudonyms.html
This post also discusses the single-field "Dr. Seuss" vs. the first-name/last-name "Theodor Geisel", and the Dalai Lama / Tenzin Gyatso issue.
The pseudonym citation rules within other standards (Chicago, MLA, etc.) can differ, especially when the pseudonym is for an author who wishes to remain anonymous.
There are a couple of Zotero Forum posts about this:
https://forums.zotero.org/discussion/55251/citing-pseudonyms
https://forums.zotero.org/discussion/19610/pseudonyms