ProCite to Zotero Conversion: Translator, RIS, and Testing
Hello,
I have 1002 affected records that need editing before I can import the RIS file into Zotero. I have described the problem below. I will pay someone to do this, if anyone is interested. I am also open to suggestions for how to solve this problem without editing records individually. However, I think that will be required because of the non-standard wording (editor, editors, edited by...) and the non-unique field tags (A1, A2, N1).
Thanks, see below:
I need someone to go through the RIS output from my ProCite database and change certain fields so that it imports correctly into Zotero.
As far as I can tell from a search, 1,002 records need to be changed. Each one is different, so it will be difficult to write a script to fix them; they will likely need to be changed individually.
In the records below, you will see A1, which stands for author, and below it an N1, which gives the author's role. In the first example, the author role is editor. For this to transfer correctly to Zotero, A1 needs to be changed to ED, and the author role field can be deleted. Note that N1 fields are not always for author role. See example 1 farther down, where an N1 field holds notes ("N1 - Notes: seen"). This is an important field and cannot be deleted. This is why I caution against scripting: the fields are not unique.
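For what it's worth, the N1 lines can be told apart by the label inside the value, even though the tag itself is reused. A minimal Python sketch, assuming every N1 in the export starts with a "Label:" prefix as in the examples here (`split_n1` and `is_author_role` are hypothetical helper names, not part of any existing tool):

```python
import re

# Assumption: an N1 value in this export reads "Label: content", e.g.
# "Author Role: editor" or "Notes: seen". Splitting at the first colon
# separates the label from the content.
N1_LABEL = re.compile(r"^\s*([^:]+?)\s*:\s*(.*)$")


def split_n1(value: str) -> tuple[str, str]:
    """Return (label, content); label is "" if the value has no "Label:" prefix."""
    match = N1_LABEL.match(value)
    if match is None:
        return "", value.strip()
    return match.group(1), match.group(2).strip()


def is_author_role(value: str) -> bool:
    """True only for N1 values labelled "Author Role", whatever the role text."""
    label, _ = split_n1(value)
    return label.lower() == "author role"


print(split_n1("Author Role: editor"))  # ('Author Role', 'editor')
print(is_author_role("Notes: seen"))    # False
```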
There are also records that have both an author and an editor; see example three below.
For these records, leave the first A1 alone, as that is the author of the chapter in the book, but then scan down: there are two A2 authors with the role of editor. Those A2 fields should be changed from A2 to ED, so that Sabloff and Lamberg-Karlovsky transfer into Zotero as editors.
You can tell that the record has a main author for the selection when there is an author and title field first, followed by the field "N1 - Connective Phrase: In". After that, you will see the A2 fields, followed by the author role N1 field for editor. The A2s should be changed to ED as described above, and the "N1 - Connective Phrase: In" line should be deleted.
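The rules above can be sketched as one pass over a record held as (tag, value) pairs. This is illustrative only: the editor-wording regex is a guess that covers "editor", "editors", "edited", and "edited by", and `retag_editors` is a hypothetical name.

```python
import re

CREATOR_TAGS = {"AU", "A1", "A2"}
# Guessed pattern for the editor wordings seen so far:
# "editor", "editors", "edited", "edited by" (case-insensitive).
EDITOR_ROLE = re.compile(r"^\s*edit(or|ors|ed)\b", re.IGNORECASE)


def retag_editors(fields):
    """Apply the editor rules to one record given as (tag, value) pairs."""
    out = []
    for tag, value in fields:
        label, _, content = value.partition(":")
        label = label.strip().lower()
        if (tag == "N1" and label == "connective phrase"
                and content.strip().rstrip(":").lower() == "in"):
            continue  # ProCite's "In" connective only becomes a junk note in Zotero
        if tag == "N1" and label == "author role" and EDITOR_ROLE.match(content):
            # Retag every creator in the run directly above this role note.
            i = len(out) - 1
            while i >= 0 and out[i][0] in CREATOR_TAGS:
                out[i] = ("ED", out[i][1])
                i -= 1
            if i < len(out) - 1:
                continue  # at least one creator was retagged, so drop the note
        out.append((tag, value))
    return out
```

Retagging the whole run of creator lines directly above the role note, rather than only the nearest one, is what turns both A2 lines into ED in example three. A role note with no creator directly above it is left in place.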
Please note that this is careful work, but it is very important for a clean transfer to Zotero and for my ability to do future citations correctly. The problem is that when a record is transferred into Zotero without ED, the editor becomes an author, the author role field goes to notes, and you can no longer tell who the editor is.
Please let me know if you have questions. I can send the RIS file via email to anyone who is interested.
Example Records:
TY - CHAP
A1 - Abel, Annie H.
N1 - Author Role: editor
T2 - Chardon's Journal at Fort Clark, 1834-1839
CY - Pierre
PB - South Dakota State Department of History
PY - 1932
N1 - Notes: seen
KW - Arikara
KW - ethnohistory
KW - fur trade
KW - Upper Missouri
ER -
TY - CHAP
A1 - Abel, Annie H.
N1 - Author Role: editor
T2 - Tabeau's Narrative of Loisel's Expedition to the Upper Missouri
CY - Norman
PB - University of Oklahoma Press
PY - 1939
N1 - Notes: have, seen
KW - Arikara
KW - ethnohistory
KW - Upper Missouri
ER -
TY - CHAP
A1 - Adams, Robert McC.
T1 - The Emerging Place of Trade in Civilizational Studies
N1 - Connective Phrase: In
A2 - Sabloff, Jeremy
A2 - Lamberg-Karlovsky, C.
N1 - Author Role: edited by
T2 - Ancient Subsistence and Trade
CY - Albuquerque
PB - University of New Mexico Press
PY - 1975
SP - 451-465
KW - trade
ER -
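To script any of this, the export first has to be split into records. A minimal parser/writer sketch, assuming one field per line; it accepts both the single-space form shown above and the two-space "TY  - " form from the RIS specification, and it treats an untagged line as a continuation of the previous field (an assumption about wrapped lines, not something confirmed for ProCite exports). `parse_ris` and `write_ris` are hypothetical names.

```python
import re

# Accepts "TY  - CHAP" (two spaces, as in the RIS specification) and "TY - CHAP".
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])\s+-\s?(.*)$")


def parse_ris(text):
    """Split RIS text into records, each a list of (tag, value) pairs."""
    records, current = [], []
    for line in text.splitlines():
        match = RIS_LINE.match(line.rstrip())
        if match is None:
            if line.strip() and current:
                # Assumption: an untagged line continues the previous field.
                tag, value = current[-1]
                current[-1] = (tag, f"{value} {line.strip()}")
            continue
        tag, value = match.group(1), match.group(2)
        current.append((tag, value))
        if tag == "ER":  # end of record
            records.append(current)
            current = []
    return records


def write_ris(records):
    """Serialize records back to RIS using the two-space "TAG  - value" form."""
    lines = [f"{tag}  - {value}" for record in records for tag, value in record]
    return "\n".join(lines) + "\n"
```

Keeping each record as an ordered list of pairs, rather than a dict, preserves repeated tags (several A2, N1, or KW lines) and their order, which the N1 rules depend on.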
If you can post the entire file online (e.g. https://gist.github.com/), I could take a look at whether this can be done easily.
So, to get this clear:
1. Is the content of the N1 field standardized? I.e., would it only ever be "Author Role: edited by" or "Author Role: editor"?
2. The actual rules are:
a) If A1 exists and N1 has "Author Role: editor" --> convert A1 into editor(s)
b) If A2 and A1 exist and N1 has "Author Role: edited by" --> convert A2 into editor(s)
Is that right?
git://gist.github.com/2206031.git gist: 2206031
Edit: possibly easier to see here: https://raw.github.com/gist/2206031/1d3e4be1b6b0fadd0ef86f6567f11a97d50c86f7/March26ProciteRISdata
Adam:
1. No, N1 is not standardized; it can be author role, notes, connective phrase, etc.
2. The rules sound almost correct, but sometimes there is only an A1 who is the editor of a volume, with no other author. Also, the author role can be "editor", "edited", or "edited by"; that is not standard, because the person entering the record simply types editor, editors, etc.
That's why this one is tricky, and I'm not sure how to solve it. I also have absolutely no patience for editing all of this manually, so I'm hoping there is a better way.
Thanks.
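Since the wording varies, matching on keywords instead of exact strings may help. A sketch with hypothetical patterns (`ROLE_PATTERNS` and `normalize_role` are illustrative names); anything that comes back as an empty set, e.g. "by" or "Collected by", would still need a human decision:

```python
import re

# Hypothetical keyword patterns for the wording variants mentioned in this thread.
ROLE_PATTERNS = {
    "editor": re.compile(r"\bedit(or|ors|ed)\b", re.IGNORECASE),
    "translator": re.compile(r"\btranslat(or|ors|ed)\b", re.IGNORECASE),
}


def normalize_role(text):
    """Return the canonical roles found in a free-text "Author Role" value."""
    return {role for role, pattern in ROLE_PATTERNS.items() if pattern.search(text)}
```

Returning a set rather than a single value lets a combined role such as "translator and editor" yield both roles.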
https://docs.google.com/spreadsheet/ccc?key=0ApjTwt-m_3BJdHZtRFczSmhXSW1GMlZ0SUVUNFREb0E
If you can indicate which of these should be matched as "Author" or "Editor", then replacing them with a script should not be difficult.
There are 87 unique values, most of which clearly are not author roles.
git://gist.github.com/2206131.git
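A list like the one in the spreadsheet could be regenerated from parsed records by counting the N1 values that sit directly below a creator line. This is a sketch under that assumption (records as lists of (tag, value) pairs; `role_note_counts` is a hypothetical name); it is not necessarily how the spreadsheet above was produced.

```python
from collections import Counter

CREATOR_TAGS = {"AU", "A1", "A2"}


def role_note_counts(records):
    """Count N1 values that sit directly below a creator line."""
    counts = Counter()
    for record in records:
        # Walk adjacent pairs of fields: (previous, current).
        for (prev_tag, _), (tag, value) in zip(record, record[1:]):
            if tag == "N1" and prev_tag in CREATOR_TAGS:
                counts[value.strip()] += 1
    return counts
```

`role_note_counts(records).most_common()` then gives the distinct values, most frequent first, ready to paste into a review sheet.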
Adam:
Aw geez, I didn't even see those.
In the case of the "Arikara Creation Myth" record (#157), "Collected by" could probably safely be changed to editor or contributor for Sloan, Elizabeth C. See the record below for reference:
TY - GEN
N1 - Record ID: 157
A1 - Bear, Stella
T1 - Arikara Creation Myth
A2 - Sloan, Elizabeth C.
N1 - Author Role: Collected by
PY - n.d.
N1 - Extent of Work: 2
N1 - Packaging Method: pages, handwritten
AV - National Anthropological Archives, Smithsonian Institution, Washington, D.C.
UR - Ms. on file
N1 - Notes: May have been published in JAFL, date unknown
KW - Arikara
KW - ethnohistory
ER -
As for authors labeled "by", like the record below: Rogers, J. Daniel is the monographic author of T2 - Spiro Archaeology: 1980 Research, and the A1 authors are the authors of T1 - A Magnetic Survey in the Plaza Area of the Spiro Mounds Site, which is a chapter in the book. I guess the only difference is that instead of someone editing the volume that a book chapter is in, here the book is written by an author. Does that make sense? In other words, Rogers is the author of the volume, and the volume contains contributions by other authors. However, that specific chapter is the one being cited, not the whole volume, so Bennett and Weymouth would have to be listed first. Otherwise it would follow the edited-by format. It looks like the same applies to other records, such as record numbers 332 and 861.
TY - CHAP
N1 - Record ID: 262
A1 - Bennett, Connie
A1 - Weymouth, John
T1 - A Magnetic Survey in the Plaza Area of the Spiro Mounds Site
A2 - Rogers, J. Daniel
N1 - Author Role: by
T2 - Spiro Archaeology: 1980 Research
CY - Norman
PB - Oklahoma Archaeological Survey
PY - 1982
SP - 215-226
T3 - Studies in Oklahoma's Past
N1 - Series Volume ID: 9
KW - Oklahoma
KW - Spiro
ER -
However, records like this one:
TY - CHAP
N1 - Record ID: 2150
A1 - Wagner, Henry R.
N1 - Author Role: editor
A2 - Camp, Charles L.
N1 - Author Role: by
T2 - The Plains and the Rockies: A Bibliography of Original Narratives of Travel and Adventure, 1800-1865
VL - 3rd revised
CY - Columbus, Ohio
PB - Long's College Book Company
PY - 1953
KW - Plains
KW - ethnohistory
ER -
Don't seem to need much done, other than deleting "Author Role: by", changing A1 to ED, and then deleting "Author Role: editor".
I see your Google document.
The problem with the N1 field is that when it does have random info in it, like author affiliation, I would like that to just go to the notes, as it already seems to when you import an N1 field into Zotero. So would we just leave those alone in a script and let them transfer to notes?
When I see oddball things, should I write N1 or notes in the "map to" box? And for editor, ED?
Otherwise, if the field should stay A1, should I do nothing?
In all other cases, specify what needs to happen (e.g., change author to editor and delete N1; delete/discard N1 without changes to author; etc.). I figure most of the examples in the spreadsheet fall into the first category.
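Once the spreadsheet is filled in, it could be exported as CSV and applied mechanically. A sketch, assuming hypothetical `value` and `map_to` columns where `map_to` holds a creator tag such as ED, the word DELETE (drop the note only), N1 (keep the note), or nothing (leave the record alone); `load_role_map` and `apply_role_map` are illustrative names.

```python
import csv
import io

CREATOR_TAGS = {"AU", "A1", "A2"}


def load_role_map(csv_text):
    """Read the hypothetical value/map_to sheet; rows with a blank map_to are skipped."""
    mapping = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        target = (row.get("map_to") or "").strip()
        if target:
            mapping[row["value"].strip()] = target
    return mapping


def apply_role_map(record, role_map):
    """Apply the sheet to one record given as (tag, value) pairs."""
    out = []
    for tag, value in record:
        action = role_map.get(value.strip(), "") if tag == "N1" else ""
        if action == "DELETE":
            continue  # drop the note, leave the creators alone
        if action and action != "N1":  # N1 (or blank) means keep the note as-is
            # Retag the run of creator lines directly above this note.
            i = len(out) - 1
            while i >= 0 and out[i][0] in CREATOR_TAGS:
                out[i] = (action, out[i][1])
                i -= 1
            if i < len(out) - 1:
                continue  # creators retagged, so the role note is no longer needed
        out.append((tag, value))
    return out
```

With "Author Role: editor" mapped to ED and "Author Role: by" mapped to DELETE, record 2150 comes out as described above: Wagner becomes ED, Camp stays A2, and both role notes are gone.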
@mronkko - I'm going to leave this to you then. If there are any problems that you think would be better handled in the translator than by editing the RIS, let me know.
What are the codes for Contributor and Translator? If one of the fields is editor and translator, should I write both codes?
N1 - Author Role: Editors
Will be mapped as
ED - Fox, J. W.
A1 - Cleaves, Francis W.
N1 - Author Role: translator and editor
Will be mapped as
ED - Cleaves, Francis W.
TR - Cleaves, Francis W.
(The ED and TR are then easy to replace with the correct codes later.)
It is quite late here in Europe, so I will most likely have to leave this until tomorrow.
https://raw.github.com/gist/2208155/ca32fb8af48546a82524b2144e5cd5c62bea43e2/gistfile1.txt
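The mapping above gives one creator several lines when the role names several jobs. A small sketch, assuming role names have already been normalized to words like "editor" and "translator"; ED and TR are the placeholders from the post above, and `expand_creator` is a hypothetical name.

```python
# Placeholder tags from the mapping above; swap in the final codes later.
ROLE_TAGS = {"editor": "ED", "translator": "TR"}


def expand_creator(name, roles):
    """Emit one (tag, name) line per recognized role, sorted by role name."""
    return [(ROLE_TAGS[role], name) for role in sorted(roles) if role in ROLE_TAGS]


print(expand_creator("Cleaves, Francis W.", {"translator", "editor"}))
# [('ED', 'Cleaves, Francis W.'), ('TR', 'Cleaves, Francis W.')]
```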
Contributor and editor map to A2
http://www.zotero.org/support/kb/field_mappings
How should this be mapped?
TY - CHAP
N1 - Record ID: 2648
A1 - Van Gennep, A.
A2 - Vizedom, M. B.
A2 - Coffee, G. L.
N1 - Author Role: translated by
T2 - The Rites of Passage
N1 - Author, Subsidiary: Kimball S. T.
N1 - Author Role: with an introduction by
CY - Chicago
PB - University of Chicago Press
PY - 1960
KW - anthropological theory
ER -
Can the N1 fields that do not follow an A1 be ignored? (In this case, these would be mapped to A2 anyway.)
If so, you can find the end result here:
https://raw.github.com/gist/2208549/513c8a0073d2634efad1fb625354b7ba42ac168b/gistfile1.txt
TY - CHAP
N1 - Record ID: 2648
A1 - Van Gennep, A.
TR - Vizedom, M. B.
TR - Coffee, G. L.
T2 - The Rites of Passage
CO - Author, Subsidiary: Kimball S. T.
CY - Chicago
PB - University of Chicago Press
PY - 1960
KW - anthropological theory
ER -
I.e.,
A2 - Vizedom, M. B.
A2 - Coffee, G. L.
N1 - Author Role: translated by
changes to TR, TR, with that N1 note deleted.
But below that, in
N1 - Author, Subsidiary: Kimball S. T.
N1 - Author Role: with an introduction by
it's basically nonsense, but that N1 can be mapped to CO, with the author role N1 being deleted.
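The subsidiary-author case can be handled as a pair: retag the "Author, Subsidiary:" note as CO, as in the output above, and drop the "Author Role" note that immediately follows it. A sketch with a hypothetical function name; it keeps the value unchanged, matching that output.

```python
def map_subsidiary_author(record):
    """Retag "Author, Subsidiary:" notes as CO and drop the role note right after."""
    out = []
    after_subsidiary = False
    for tag, value in record:
        is_note = tag == "N1"
        if after_subsidiary and is_note and value.lower().startswith("author role:"):
            after_subsidiary = False
            continue  # the role text ("with an introduction by") is discarded
        after_subsidiary = False
        if is_note and value.lower().startswith("author, subsidiary:"):
            out.append(("CO", value))  # value kept unchanged, as in the output above
            after_subsidiary = True
            continue
        out.append((tag, value))
    return out
```

Only an "Author Role" note directly after a subsidiary author is dropped; role notes elsewhere in the record are left for the other rules.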
I am not sure whether "all N1 that do not follow A1" can be ignored, though if they at least went into N1 as notes, that would be OK.
If Contributor and editor both map to A2, won't Zotero just pick one or the other in the dropdown box in Zotero Standalone? Will I need to change that manually? It needs to be clear in Zotero Standalone that these people are editors of a volume. Will that work?
Thanks again for your diligence!
In the mapping you posted to GitHub above (see example below), both A2s should have been changed to ED. Lamberg-Karlovsky was, but Sabloff was not. That might be hard to fix. Also, I think all N1s that say "Connective Phrase: In" can be deleted, if that is easy to do; they end up as junk notes in Zotero. I think this is a leftover from ProCite, from when it needed to know that something was a chapter "in" the book that follows in the record.
TY - CHAP
N1 - Record ID: 7
A1 - Adams, Robert McC.
T1 - The Emerging Place of Trade in Civilizational Studies
N1 - Connective Phrase: In
A2 - Sabloff, Jeremy
ED - Lamberg-Karlovsky, C.
T2 - Ancient Subsistence and Trade
CY - Albuquerque
PB - University of New Mexico Press
PY - 1975
SP - 451-465
KW - trade
ER -
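Records like this one could be found automatically after a conversion by looking for runs of consecutive creator lines that mix ED with other creator tags. A sketch with a hypothetical function name; note that it will also flag legitimate mixes such as record 2150 (ED Wagner directly above A2 Camp once the role notes are removed), so the output is a review list, not a list of errors.

```python
CREATOR_TAGS = {"AU", "A1", "A2", "ED"}


def mixed_creator_runs(record):
    """Return runs of consecutive creator lines that mix ED with other creator tags."""
    runs, current = [], []
    for tag, value in list(record) + [("", "")]:  # sentinel flushes the last run
        if tag in CREATOR_TAGS:
            current.append((tag, value))
            continue
        tags = {t for t, _ in current}
        if "ED" in tags and len(tags) > 1:
            runs.append(current)
        current = []
    return runs
```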
The problem is that the actual N1 field is hand-entered by the user, which makes this quite tricky, if not impossible, to do reliably (it also makes you wonder what on earth the ProCite people were thinking, but oh well).
So I don't think we should try to hack this into the RIS translator.
I think having a modified version available for ProCite users would be great, though. There is, I think, a "migrating from EndNote" page that links to specific scripts, and I don't see any reason not to have such a page for ProCite on the wiki.
Besides the user having to figure out which translator to use for import, I don't actually see much downside to just having a modified RIS import translator, so what you propose with the link on a wiki page sounds fine to me.
I wonder, however, whether this free text is mandatory or whether ProCite provides a controlled vocabulary that the user can use by default. I've never used ProCite and can't find any videos or images of "Author Role" fields, so perhaps someone else can answer this. We could at least provide a translator for the controlled vocabulary.
It seems that it is not possible to distinguish between translator, contributor, and series editor on RIS import or export because these all use the A2 field. So these need to be fixed manually after the data has been imported to Zotero.
Fixing a RIS file to use ED and A2 based on N1 is not particularly difficult, but I do not see a way to make this robust enough to be a part of the translator.
Book Section
http://dl.dropbox.com/u/19141190/Procite Book Section Screenshot.png
Journal article
http://dl.dropbox.com/u/19141190/Procite Journal Article Screenshot.png
Book Whole
http://dl.dropbox.com/u/19141190/Procite Book Whole Screenshot.png
Report
http://dl.dropbox.com/u/19141190/Procite Report Screenshot.png
So where did we land on my problem? @mronkko, are you still working on the regex file? (See my last post about records with two editors where only one of the editors was mapped.)
As for the translator, I am unclear on exactly how that works, but I do think something needs to be in place for ProCite users to make the switch. The person I am working for has a very large library, and as a professional academic who publishes three or more papers a year, he needs the data in his citations preserved with fidelity from ProCite. Otherwise, the barriers to switching will be very high for typical ProCite users.
Would it be helpful to have a list of all 45 fields ProCite uses, to help the developers map them to a solid Zotero transfer?
Thanks very much for all of your help and ideas.
This is very doable, but doing it well and thoroughly will take a good amount of work and care. Since some money seems to be available, I'd suggest that whoever does this charge a token amount. I'm going to yield to mronkko or Aurimas on this. If I were going to do it, I'd ask for US$50-100; that's obviously a bargain price for any coding work, but I'd undercharge because the result would also be more widely useful.
@maevepotter, to clarify the difference: mronkko's strategy so far is to modify the RIS file so that it conforms to the standard and imports correctly. Aurimas and I suggest modifying the translator that does the importing, which would allow us to import things (like translator and series editor) that exist in Zotero but not in the RIS specification.
Also, I have no experience in translator development and have my hands full trying to get the next version of ZotPad out, so I cannot help with the translator coding at this point.