ProCite to Zotero Conversion: Translator, RIS, and Testing

maevepotter · March 26, 2012

Hello,

I have 1002 affected records that need editing before I can import the RIS file into Zotero. I have described the problem below. I will pay for someone to do this, if anyone is interested. I am also open to suggestions for how to solve this problem without editing records individually. However, I think that individual editing will be required because of the non-standard language (editor, editors, edited by...) and the non-unique field designations (A1, A2, N1).

Thanks, see below:

I need someone to go through the RIS output from my procite database, and change certain fields so that it will import correctly into Zotero.

As far as I can tell from a search, there are 1,002 records that need to be changed. Each is different, and it will be difficult to write a script to fix them. They will likely need to be changed individually.

In the records below, you will see A1, which stands for author, and below it an N1, which details the author's role. In the first example, the author role is editor. For this to transfer correctly to Zotero, A1 needs to be changed to ED, and the author role field can be deleted. Note that N1 fields are not always for author role. See example 1 farther down, where an N1 holds notes. This is an important field and cannot be deleted. This is why I caution against scripting: the fields are not unique.

There are also records that have an author and an editor, see example three below.

For these records, you would leave the first A1 alone, as that is the author of the chapter in the book, but then scan down and see that there are two A2 authors, with the role of editor. Therefore those A2 fields should be changed from A2 to ED, so that Sabloff and Lamberg will be transferred into Zotero as Editors.

You can tell that the bibliographic record should have a main author for the selection when there is an author and title field first, followed by the field "N1- Connective Phrase: In: ". After that "In", you will see the A2 fields, followed by the author role N1 field for editor. The A2s should be changed to ED as described above, and the "N1- Connective Phrase:In" should be deleted.

Please note that this is careful work, but very important to enable a clean transfer to Zotero and not interfere with my ability to do future citations correctly. The problem is that when it is transferred into Zotero without ED being noted, an editor becomes an author, and the author role field goes to notes, and you can no longer tell which one is the editor.

Please let me know if you have questions. I can send the RIS file via email to anyone who is interested.

Example Records:

TY - CHAP
A1 - Abel, Annie H.
N1 - Author Role: editor
T2 - Chardon's Journal at Fort Clark, 1834-1839
CY - Pierre
PB - South Dakota State Department of History
PY - 1932
N1 - Notes: seen
KW - Arikara
KW - ethnohistory
KW - fur trade
KW - Upper Missouri
ER -

TY - CHAP
A1 - Abel, Annie H.
N1 - Author Role: editor
T2 - Tabeau's Narrative of Loisel's Expedition to the Upper Missouri
CY - Norman
PB - University of Oklahoma Press
PY - 1939
N1 - Notes: have, seen
KW - Arikara
KW - ethnohistory
KW - Upper Missouri
ER -

TY - CHAP
A1 - Adams, Robert McC.
T1 - The Emerging Place of Trade in Civilizational Studies
N1 - Connective Phrase: In
A2 - Sabloff, Jeremy
A2 - Lamberg-Karlovsky, C.
N1 - Author Role: edited by
T2 - Ancient Subsistence and Trade
CY - Albuquerque
PB - University of New Mexico Press
PY - 1975
SP - 451-465
KW - trade
ER -

mronkko2 · March 26, 2012

I am fairly sure that this could be solved with some regex scripts.

If you can post the entire file online (e.g., https://gist.github.com/), I could take a look to see whether this can be done easily.

adamsmith · March 26, 2012

I have no interest in manually editing this, but I'm wondering if this couldn't be solved with 10-20 lines of smart code in a custom version of the translator.
So to get this clear:
1. The content of the N1 field is standardized? I.e. it would only ever be Author Role: edited by or Author Role: editor?
2. The actual rules are
a) If A1 exists and N1 has "Author Role: editor" --> Convert A1 into editor(s)
b) If A2 and A1 exist and N1 has "Author Role: edited by" --> Convert A2 into editor(s)

Is that right?

maevepotter · March 26, 2012

Here is the file Mronkko2:

git://gist.github.com/2206031.git gist: 2206031

Edit: possibly easier to see here: https://raw.github.com/gist/2206031/1d3e4be1b6b0fadd0ef86f6567f11a97d50c86f7/March26ProciteRISdata

Adam:

1. No, N1 is not standardized, it can be author role, notes, connective phrase etc.

2. The rules sound almost correct, but sometimes there is only an A1 who is the editor of a volume, with no other author. Also, the author role can be "editor", "edited", or "edited by". It is not standardized, because the person entering the record simply types editor, editors, etc.

That's why this one is tricky. I am not sure how to solve it. I also have absolutely no patience for editing all of this manually, so I'm hoping there is a better way.

Thanks.
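
The editor spellings listed in this post could be recognized with a small pattern rather than by hand. The following is a minimal Python sketch, not anything posted in the thread: the function names are hypothetical, and the pattern only covers the four spellings mentioned so far (editor, editors, edited, edited by). It accepts both the single-space tag layout shown here and the two-space layout of standard RIS.

```python
import re

# Matches the value of an "Author Role" note, e.g. "N1 - Author Role: edited by".
ROLE_LINE = re.compile(r"^N1\s+-\s*Author Role:\s*(.*)$", re.IGNORECASE)

# Editor spellings mentioned in the thread: editor, editors, edited, edited by.
EDITOR_ROLE = re.compile(r"^(editors?|edited(\s+by)?)$", re.IGNORECASE)


def author_role(line):
    """Return the role text of an Author Role note, or None for any other line."""
    match = ROLE_LINE.match(line.strip())
    return match.group(1).strip() if match else None


def is_editor_role(role):
    """True only when the whole role text is one of the known editor spellings."""
    return bool(EDITOR_ROLE.match(role.strip()))
```

Because the pattern is anchored at both ends, a combined role such as "translator and editor" is deliberately not treated as a plain editor and would need its own rule.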

adamsmith · March 26, 2012

What's supposed to happen with authors labeled as "by" and "collected by"?

mronkko2 · March 26, 2012

I extracted all the N1 that seemed to contain Author info and placed them here

https://docs.google.com/spreadsheet/ccc?key=0ApjTwt-m_3BJdHZtRFczSmhXSW1GMlZ0SUVUNFREb0E

If you can indicate which ones should be matched as "Author" or "Editor", then replacing these with a script should not be difficult.

There are 87 unique values, of which most clearly are not author roles.
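
A list of distinct role values like the one in the spreadsheet can be produced with a few lines of Python. This is a sketch under assumptions, not the script mronkko actually used: it only looks at N1 lines that start with "Author Role:", folds case, and the function name and sample data are made up for illustration.

```python
import re
from collections import Counter

ROLE_LINE = re.compile(r"^N1\s+-\s*Author Role:\s*(.*)$", re.IGNORECASE)


def count_author_roles(ris_text):
    """Count each distinct Author Role value, ignoring case and outer spaces."""
    counts = Counter()
    for line in ris_text.splitlines():
        match = ROLE_LINE.match(line.strip())
        if match:
            counts[match.group(1).strip().lower()] += 1
    return counts


# Tiny made-up sample in the layout used in this thread.
sample = """\
TY - CHAP
A1 - Abel, Annie H.
N1 - Author Role: editor
N1 - Notes: seen
ER -
TY - CHAP
A2 - Fox, J. W.
N1 - Author Role: Editors
ER -
"""

for role, n in count_author_roles(sample).most_common():
    print(f"{n}\t{role}")
```

Printing the values with their counts makes it easier to decide which spellings are worth mapping and which are one-off notes.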

maevepotter · March 26, 2012

Here is a Gist with only the records with editor, editors, edited by:
git://gist.github.com/2206131.git

Adam:

Aw geez, I didn't even see those.

In the case of the "Arikara Creation Myth" record #157: collected by could likely safely be changed to editor or contributor for Sloan, Elizabeth C. See record for reference below:

TY - GEN
N1 - Record ID: 157
A1 - Bear, Stella
T1 - Arikara Creation Myth
A2 - Sloan, Elizabeth C.
N1 - Author Role: Collected by
PY - n.d.
N1 - Extent of Work: 2
N1 - Packaging Method: pages, handwritten
AV - National Anthropological Archives, Smithsonian Institution, Washington, D.C.
UR - Ms. on file
N1 - Notes: May have been published in JAFL, date unknown
KW - Arikara
KW - ethnohistory
ER -

As for authors labeled "by", like the record below: Rogers, J. Daniel is the monographic author of T2 - Spiro Archaeology: 1980 Research, and the A1 authors are the authors of T1 - A Magnetic Survey in the Plaza Area of the Spiro Mounds Site, which is a chapter in the book. I guess the only difference here is that instead of someone editing the volume that contains the chapter, the volume is written by an author. Does that make sense? In other words, Rogers is the author of the volume, and the volume contains contributions by other authors. However, the specific chapter is what is being cited, not the whole volume, so Bennett and Weymouth would have to be listed first. Otherwise it would follow the "edited by" format. It looks like it would be the same for other records, such as record numbers 332 and 861.

TY - CHAP
N1 - Record ID: 262
A1 - Bennett, Connie
A1 - Weymouth, John
T1 - A Magnetic Survey in the Plaza Area of the Spiro Mounds Site
A2 - Rogers, J. Daniel
N1 - Author Role: by
T2 - Spiro Archaeology: 1980 Research
CY - Norman
PB - Oklahoma Archaeological Survey
PY - 1982
SP - 215-226
T3 - Studies in Oklahoma's Past
N1 - Series Volume ID: 9
KW - Oklahoma
KW - Spiro
ER -

However, records like this one:
TY - CHAP
N1 - Record ID: 2150
A1 - Wagner, Henry R.
N1 - Author Role: editor
A2 - Camp, Charles L.
N1 - Author Role: by
T2 - The Plains and the Rockies: A Bibliography of Original Narratives of Travel and Adventure, 1800-1865
VL - 3rd revised
CY - Columbus, Ohio
PB - Long's College Book Company
PY - 1953
KW - Plains
KW - ethnohistory
ER -

Don't seem to need anything much done, other than "author role: by" deleted, and A1 changed to ED, and then "author role: editor" deleted.

maevepotter · March 26, 2012

mronkko2:

I see your google document.

The problem with the N1 field is that if it does have random info, like author affiliation in it, I would like that to just dump to the notes like it already seems to when you import an N1 field into Zotero. So would we just leave those ones alone in a script, and let them transfer to notes?

When I see oddball things, should I write N1 in the "map to" box, or notes? And if it is an editor, ED?

Otherwise if the fields should stay A1 should I do nothing?

maevepotter · March 26, 2012

Also what are the codes for Contributor, Translator, and Reviewed Author?

adamsmith · March 26, 2012

I would suggest leaving the field empty if N1 is to become a note and the authors stay the same.
In all other cases specify what needs to happen (i.e. - change author to editor & delete N1; delete/discard N1 w/o changes to author etc.). I figure most of the examples in the spreadsheet fall in the first category.
@mronkko - I'm going to leave this to you then - if there are any problems that you think would be better handled in translator than by editing the RIS let me know.

mronkko · March 26, 2012

The problem with the N1 field is that if it does have random info, like author affiliation in it, I would like that to just dump to the notes like it already seems to when you import an N1 field into Zotero. So would we just leave those ones alone in a script, and let them transfer to notes?

Just mark the ones you want converted to be recorded as editor with ED in the second column. I will then use that info to convert the file. You can leave the rest blank.

maevepotter · March 26, 2012

mronkko:
what are the codes for Contributor and Translator? If one of the fields is editor and translator should I write both codes?

mronkko · March 26, 2012

Just mark them as CO and TR, and you can use two codes. Just separate them with space.

maevepotter · March 26, 2012

Ok I think that should be good now. Let me know if you can see my changes.

maevepotter · March 26, 2012

Also, just making sure: a code like "N1 - Author Role: editors" mapped to ED... will that map the author associated with it to ED, or just that field?

mronkko · March 26, 2012

A2 - Fox, J. W.
N1 - Author Role: Editors

Will be mapped as

ED - Fox, J. W.

A1 - Cleaves, Francis W.
N1 - Author Role: translator and editor

Will be mapped as
ED - Cleaves, Francis W.
TR - Cleaves, Francis W.

(The ED and TR are then easy to replace with the correct codes later)
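
The mapping described here, where one role can carry two space-separated codes, can be sketched as follows. This is an illustrative Python sketch only: the dictionary stands in for the spreadsheet's second column and contains just the two examples above, and `retag_author` is a hypothetical helper name. The output uses the single-space "TAG - value" layout shown in this thread; standard RIS files put two spaces before the hyphen.

```python
# Hypothetical extract of the spreadsheet: role text -> space-separated codes.
ROLE_CODES = {
    "editors": "ED",
    "translator and editor": "ED TR",
}


def retag_author(author_line, role, role_codes=ROLE_CODES):
    """Return the replacement lines for one author line given its role text.

    Roles that are not in the mapping leave the author line unchanged.
    """
    codes = role_codes.get(role.strip().lower(), "").split()
    if not codes:
        return [author_line]
    # Everything after the first hyphen is the name, e.g. "Fox, J. W.".
    name = author_line.split("-", 1)[1].strip()
    return [f"{code} - {name}" for code in codes]


print(retag_author("A1 - Cleaves, Francis W.", "translator and editor"))
```

Splitting on the first hyphen only keeps hyphenated surnames such as Lamberg-Karlovsky intact.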

It is quite late here in Europe, so I will most likely have to leave this for tomorrow

maevepotter · March 26, 2012

That looks great. Thank you so much for doing this. It is such a huge help.

mronkko · March 26, 2012

Here is the file after replacing the codes shown in Google docs

https://raw.github.com/gist/2208155/ca32fb8af48546a82524b2144e5cd5c62bea43e2/gistfile1.txt

Contributor and editor map to A2

http://www.zotero.org/support/kb/field_mappings

How should this be mapped?

TY - CHAP
N1 - Record ID: 2648
A1 - Van Gennep, A.
A2 - Vizedom, M. B.
A2 - Coffee, G. L.
N1 - Author Role: translated by
T2 - The Rites of Passage
N1 - Author, Subsidiary: Kimball S. T.
N1 - Author Role: with an introduction by
CY - Chicago
PB - University of Chicago Press
PY - 1960
KW - anthropological theory
ER -

Can the N1 that do not follow A1 be ignored? (In this case these would be mapped to A2 anyway)

If so, you can find the end result here

https://raw.github.com/gist/2208549/513c8a0073d2634efad1fb625354b7ba42ac168b/gistfile1.txt

maevepotter · March 26, 2012

It should be mapped like this:

TY - CHAP
N1 - Record ID: 2648
A1 - Van Gennep, A.
TR - Vizedom, M. B.
TR - Coffee, G. L.
T2 - The Rites of Passage
CO - Author, Subsidiary: Kimball S. T.
CY - Chicago
PB - University of Chicago Press
PY - 1960
KW - anthropological theory
ER -

ie
A2 - Vizedom, M. B.
A2 - Coffee, G. L.
N1 - Author Role: translated by

changes to TR TR and delete that N1 note.

but below that in

N1 - Author, Subsidiary: Kimball S. T.
N1 - Author Role: with an introduction by

it's basically nonsense, but that N1 can be mapped to CO, with the author role N1 being deleted.

I am not sure if "all N1 that do not follow A1" can be ignored. Though if at least it did go into N1 for notes that would be ok.

If Contributor and editor map to A2, will Zotero not pick one or the other on the dropdown box in Zotero Standalone? Will I need to change that manually? It needs to be clear in Zotero standalone that these people are editors of a volume. Will that work?

Thanks again for your diligence!

adamsmith · March 26, 2012

@mronkko - editors should map to ED to distinguish them from contributors - that's how Zotero exports RIS.

maevepotter · March 26, 2012

Also there may be a problem when there is more than one editor.

From the mapping you posted to GitHub above (see example below), both A2s should have been made into ED. Lamberg was, but not Sabloff. That might be hard to fix. Also, I think all N1s that say "Connective Phrase: In" can be deleted if that is easy to do. It ends up making a junk note in Zotero. I think it is a leftover from ProCite, which needed to know that something was a chapter "in" the following book in the record.

TY - CHAP
N1 - Record ID: 7
A1 - Adams, Robert McC.
T1 - The Emerging Place of Trade in Civilizational Studies
N1 - Connective Phrase: In
A2 - Sabloff, Jeremy
ED - Lamberg-Karlovsky, C.
T2 - Ancient Subsistence and Trade
CY - Albuquerque
PB - University of New Mexico Press
PY - 1975
SP - 451-465
KW - trade
ER -
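
One way to handle several editors in a row is to collect consecutive A1 or A2 lines and re-tag the whole run once the role note that follows it is seen. The Python sketch below is an assumption about how such a fix could look, not the script used in this thread. It treats only the editor spellings mentioned earlier as editor roles, drops the "Connective Phrase: In" note, and leaves every other line untouched. `fix_record` and the pattern names are hypothetical.

```python
import re

TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])\s+-\s*(.*)$")
EDITOR_ROLE = re.compile(r"^Author Role:\s*(editors?|edited(\s+by)?)$", re.IGNORECASE)
CONNECTIVE_IN = re.compile(r"^Connective Phrase:\s*In\s*:?$", re.IGNORECASE)


def fix_record(lines):
    """Re-tag every author in a run that is followed by an editor role note."""
    out = []
    run = []  # (tag, name, original line) for consecutive A1 or A2 lines

    def flush(as_editor):
        # Emit the pending run, either re-tagged as ED or exactly as it was.
        for _tag, name, original in run:
            out.append(f"ED - {name}" if as_editor else original)
        run.clear()

    for line in lines:
        match = TAG_LINE.match(line.strip())
        tag, value = (match.group(1), match.group(2).strip()) if match else (None, "")
        if tag in ("A1", "A2"):
            if run and run[-1][0] != tag:
                flush(False)  # an A1 run ends where an A2 run starts
            run.append((tag, value, line))
        elif tag == "N1" and run and EDITOR_ROLE.match(value):
            flush(True)  # re-tag the whole run; the role note itself is dropped
        elif tag == "N1" and CONNECTIVE_IN.match(value):
            flush(False)  # drop the junk note, keep any pending authors as-is
        else:
            flush(False)
            out.append(line)
    flush(False)
    return out
```

Applied to record 7 above, both Sabloff and Lamberg-Karlovsky would come out as ED lines, because the role note is applied to the whole run of A2 lines rather than only to the line directly before it.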

aurimas · March 26, 2012

Shouldn't this sort of parsing be included in the RIS translator? I remember this same issue being brought up on the forums before and it does seem that it should affect several users. Furthermore, looking at the rules presented above, it doesn't look like they would interfere with normal RIS parsing. At the very least, I think this should be made into a modified RIS translator.

adamsmith · March 26, 2012

My approach would have been to go through a translator change rather than regexing this, too, and yes, it'd be nice to have this for more people.

The problem is that the actual N1 field is hand-inserted by the user, which makes this quite tricky/impossible to do reliably (it also makes you wonder what on earth the procite people were thinking, but, oh well).
So I don't think we should try to hack this into the RIS translator.
I think having a modified version available for ProCite users, though, would be great - there is, I think, a "migrating from endnote" page which links to specific scripts, and I don't think there is any reason not to have such a page for ProCite on the wiki.

aurimas · March 26, 2012

The problem is that the actual N1 field is hand-inserted by the user, which makes this quite tricky/impossible to do reliably (it also makes you wonder what on earth the procite people were thinking, but, oh well).

I'm not sure I understand what you mean by "hand-inserted by the user". I understand that the N1 fields may actually be notes entered by the user, but the N1 fields we're interested in (i.e. "Author Role", etc.) appear to be inserted by ProCite and follow a fairly strict grammar that, in my opinion, could be safely parsed.

Besides the user having to figure out which translator to use for import, I don't actually see much downside to just having a modified RIS import translator, so what you propose with the link on a wiki page sounds fine to me.

adamsmith · March 27, 2012

appear to be inserted by ProCite and follow fairly strict grammar that, in my opinion, could be safely parsed.

If you look at them in mronkko's google doc, I doubt that. The "Author Role" part is strict syntax, but what follows seems to me to be entirely at the whims of the user - e.g. there are about 15 different ways an editor is designated within the library - and that's the library of one user!

aurimas · March 27, 2012

Never mind, you're absolutely right. The roles do appear to be input by the user.

I wonder, however, if this is mandatory or does ProCite provide some controlled vocabulary that the user can use by default. I've never used ProCite and I can't find any videos or images of "Author Role" fields, so perhaps someone else can answer this. We could at least provide a translator for the controlled vocabulary.

mronkko · March 27, 2012

@mronkko - editors should map to ED to distinguish them from contributors - that's how Zotero exports RIS.

I added the ED field to the documentation: http://www.zotero.org/support/kb/field_mappings

It seems that it is not possible to distinguish between translator, contributor, and series editor on RIS import or export because these all use the A2 field. So these need to be fixed manually after the data has been imported to Zotero.

Fixing a RIS file to use ED and A2 based on N1 is not particularly difficult, but I do not see a way to make this robust enough to be a part of the translator.

maevepotter · March 27, 2012

Here are some screenshots of the Procite input screens for various types of citations:

Book Section
http://dl.dropbox.com/u/19141190/Procite Book Section Screenshot.png

Journal article
http://dl.dropbox.com/u/19141190/Procite Journal Article Screenshot.png

Book Whole
http://dl.dropbox.com/u/19141190/Procite Book Whole Screenshot.png

Report
http://dl.dropbox.com/u/19141190/Procite Report Screenshot.png

So where did we land on my problem? @mronkko are you still working on the regex file? (See my last post on some records with two editors, only having mapped to one of the editors.)

As far as the translator, I am unclear on how that works exactly, but I do think something needs to be in place for Procite users to make the switch. The person I am working for has a very large library, and as a professional academic who publishes 3 or more papers a year, he needs it to preserve the data in his citations with fidelity from Procite. Otherwise the barriers to switching for normal Procite users will be very high.

Would it be helpful to have a list of all 45 fields Procite uses to help the developers map them to a solid Zotero transfer?

Thanks very much for all of your help and ideas.

maevepotter · March 27, 2012

Pages 161 onward in the ProCite manual list the fields ProCite uses. http://www.procite.com/support/docs/ProCite 5 Manual.pdf

adamsmith · March 27, 2012

@mronkko - With this info, I'd be inclined to say that changing this in the translator is a much better option than going through the RIS, no? If the Procite output is richer than what real RIS can do it would be good to take advantage of that.*

This is very doable, but doing it well/thoroughly will take a good amount of work and care. Since some money seems to be available, I'd suggest that whoever does this charge a token amount. I'm going to yield to mronkko or Aurimas on this. If I were going to do it I'd ask for 50-100US$ - that's obviously a bargain price for any coding work - I'd undercharge because the result would also be more widely useful.

*maevepotter - to clarify the difference: mronkko's strategy so far is to modify the RIS file so that it conforms to standards and is imported correctly. Aurimas and I suggest modifying the translator that does the importing, which would allow us to import things (like translator, series editor) that exist in Zotero but not in the RIS specification.
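
To illustrate the translator-side idea: instead of squeezing roles into RIS tags, the importer can assign a Zotero creator type directly. Real Zotero translators are written in JavaScript, so the Python below only sketches the data shape, using the creatorType / lastName / firstName layout of Zotero items. The role table is an assumption built from examples in this thread, and `make_creator` is a hypothetical name.

```python
# Role text -> Zotero creator type. The keys are examples from this thread;
# the target type chosen for each one is an assumption.
ROLE_TO_CREATOR_TYPE = {
    "editor": "editor",
    "editors": "editor",
    "edited by": "editor",
    "translated by": "translator",
    "with an introduction by": "contributor",
}


def make_creator(name, role=None, default_type="author"):
    """Build a Zotero-style creator dict from a 'Last, First' name and a role."""
    key = (role or "").strip().lower()
    creator_type = ROLE_TO_CREATOR_TYPE.get(key, default_type)
    last, _, first = name.partition(",")
    return {
        "creatorType": creator_type,
        "lastName": last.strip(),
        "firstName": first.strip(),
    }


print(make_creator("Vizedom, M. B.", "translated by"))
```

With this approach a translator stays a translator after import, which a plain A2 line cannot express.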

mronkko · March 27, 2012

I was writing a longer post, but accidentally deleted it. Fixing the RIS file will not help; the translator needs to be fixed. Pages 164-169 of the manual document the fields, and I think that is sufficient info to solve this.

Also, I have no experience in translator development and have my hands full of work trying to get the next version of ZotPad out, so cannot help at this point with the translator coding.