Migrating from EndNote + BibTeX: limitations & workarounds?

jawhiteley · December 21, 2011

I want to use Zotero. I really do. Sadly, I have this big fat EndNote database that I've been using relatively successfully in conjunction with BibTeX, and I've run into a few problems. If I know I can solve these, I'll be happy to make a full transition, but if not, it might be more practical to stick with EndNote :(

Missing fields when importing EndNote data

I used a few custom fields in EndNote that I need to keep in the transition. Specifically:

Label: I use this field for BibTeX citation keys, as well as stable temporary references in Word.
I know others may use this for "tags", but either way, I cannot seem to find this in citations imported to Zotero.
Research Notes: not 'Notes', the other one, with all my own notes and annotations of papers.
Custom fields: I used a custom field for 'tags', dynamic collections, etc. and would like to transfer all this information.

What is the easiest way to get all this into Zotero?

Edit EndNote's export filter to assign values to fields that Zotero understands?
Edit the exported RIS file manually, or with a script?
- e.g. http://forums.zotero.org/discussion/2015/importing-endnotes-label-field/
Edit Zotero's import filter to extract and put data where I want it?
cringe ... all of the above?

Citation Keys (BibTeX, etc.)

I know this is an old issue, and I think it's been very well discussed in the following, so I will just add my support for a better solution:

I noticed several references to exposing a citation key field being "planned", but the ticket seems to be 4 years old. Is this field now accessible in the current versions (2.1.10)? I can't seem to find it.

It's great that Zotero does support BibTeX export, but I dislike the idea of having to adapt the way I work to use its auto-generated keys, rather than ones which I have already taken the time to build, some manually. The fact that I can not easily control the citation keys in BibTeX files exported from Zotero makes me reluctant to make the switch.

Fortunately, BibTeX is a plain-text format, which makes it rather easy to write scripts to handle a lot of batch and automated processing. Unfortunately, it means another step in a workflow, which only happens "after the fact" of exporting, and is a pain to have to run on every export. I'm sure there are tools that can 'monitor' a folder, and run scripts automatically when such files are modified, but that is yet another tool to have to set up and maintain.

Currently, I can use Zotero with BibTeX, but I'd have to essentially do the following to get a result that is comfortable for me to work with:

Use Zotero to collect & manage citations and pdf files :D
Export data to BibTeX format for use with LaTeX
Use a script to clean-up / manage the exported BibTeX file & citation keys
- Coincidentally, here is one that was written for Papers, but could likely be adapted to work with other reference management software, such as Zotero: http://www.bulheller.com/bibtexformat.html
Use a BibTeX-native program, such as BibDesk or JabRef to manage the BibTeX file and get the citation keys into my LaTeX input, via 'push', copy-paste, etc.

I don't really mind having separate BibTeX files to manage with each writing project, but the big challenge I'm currently facing is finding a suitable place to store citation keys within Zotero itself, which will also get exported to BibTeX. In theory, I figured I could store the citation keys in a 'temporary field', like a Note. As long as I can get that data into the BibTeX export, I can use a script to set the citation keys to the contents of that other field. Can I do this within an export translator instead? Is there currently a suitable field I could use to store this type of 'custom' data?
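A minimal sketch of the post-export script described above, in Python. It assumes the stored citation key arrives in the exported file as a `note` field containing a single token with no spaces; that field name is an assumption for illustration, not something confirmed in this thread, and the function name is hypothetical. Entries without such a field keep their auto-generated key.

```python
import re

# ASSUMPTION: the stored key is exported in a field called "note".
KEY_FIELD_NAME = "note"

# Start of an entry, e.g. "@article{some_key,"
ENTRY_HEAD = re.compile(r"^@(\w+)\s*\{\s*([^,\s]*)\s*,", re.MULTILINE)
# A field whose whole value is one token without spaces or braces.
KEY_FIELD = re.compile(
    r"^\s*" + re.escape(KEY_FIELD_NAME) + r"\s*=\s*\{([^{}\s]+)\}",
    re.MULTILINE | re.IGNORECASE,
)


def apply_stored_keys(bibtex_text):
    """Replace each entry's citation key with the key stored in its note field."""
    heads = list(ENTRY_HEAD.finditer(bibtex_text))
    pieces = []
    last_end = 0
    for i, head in enumerate(heads):
        # An entry's body runs up to the next entry header (or end of file).
        if i + 1 < len(heads):
            body_end = heads[i + 1].start()
        else:
            body_end = len(bibtex_text)
        body = bibtex_text[head.end():body_end]
        stored = KEY_FIELD.search(body)
        # Fall back to the existing key when no stored key is found.
        key = stored.group(1) if stored else head.group(2)
        pieces.append(bibtex_text[last_end:head.start()])
        pieces.append("@%s{%s," % (head.group(1), key))
        pieces.append(body)
        last_end = body_end
    pieces.append(bibtex_text[last_end:])
    return "".join(pieces)
```

Because this is regular-expression matching rather than a real BibTeX parser, a one-word note that is not a citation key would also be picked up, so the output is worth checking by eye.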

All these workarounds can make Zotero 'useable' with BibTeX, but it would be so much easier with the following:

ability to view / edit citation keys (or "natural language identifier") within Zotero, which are also used in BibTeX export
modify the format used for automatically-generated citation keys (without having to edit source code)
- BibDesk, for Mac OS X, has a nice, powerful system for doing this, for example
generate automatic citation keys on-demand, or only when necessary, without replacing manually-edited keys
- OR: scriptable editing to allow users to build their own field contents. This is how I currently manage citation keys in EndNote.

Batch editing

http://forums.zotero.org/discussion/13563/batch-operations/

The lack of batch editing is also a bit of a deal-breaker for me. I use EndNote's batch editing, or AppleScript for really fancy tasks, to ensure consistency across my database for various things, including citation keys, keywords, etc. The 'term lists' of possible values for fields within EndNote are also very useful for this. Other than adding a tag to several records at once, I don't see much support for this kind of thing in Zotero :( What if I add the wrong tag? I can remove a single tag from the entire library, but not several tags at once, or a tag from many but not all related records.

I am guessing that Zotero uses an SQLite database to store all the data, and I think I've seen a schema for it. Am I right in thinking that if I knew SQL, I could theoretically write my own batch-editing scripts? Not the safest or most user-friendly solution, I admit, but how practical would this be?

Some positive feedback

I love the fact that Zotero is open-source, gratis-free, and cross-platform. I love the syncing features and web-interface. I love the Citation Style Language support. I like having multiple rich-text notes, but I'd be concerned about how these get exported other than being concatenated together. The fact that Zotero is a "Firefox plug-in" used to be a turn-off, but now I think of it as "Zotero has a built-in, extensible web-browser" ;-) I haven't tried the standalone versions yet. The ability to snag citations from the web and other documents looks very cool, but I haven't tried it much yet. The social-networking features also sound interesting, but local database management features are honestly more important to me for the moment.

adamsmith · December 21, 2011

I don't work with bibtex, so I'll leave this to someone else, but:

"am I right in thinking that if I knew SQL, I could theoretically write my own batch-editing scripts? Not the safest or most user-friendly solution, I admit, but how practical would this be?"

I agree batch editing should be high on the list of priorities for Zotero devs - to me this is really the only feature where Zotero has a clear disadvantage to the "old-guard" of ref-management software.

But yes, it's possible to batch-edit the sqlite directly - it's unsafe and not supported, so back-up super frequently and incrementally, but it's still possible.
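For anyone who only wants to look at the database structure before deciding, here is a cautious Python sketch that works on a copy and never opens the live file. The function name is hypothetical, and the path you pass in (typically a file named zotero.sqlite in the Zotero data directory) is an assumption about your setup. It lists table names only and makes no claims about the schema itself.

```python
import os
import shutil
import sqlite3
import tempfile


def list_tables(db_path):
    """Copy the database to a temporary file and return its table names."""
    tmp_dir = tempfile.mkdtemp()
    copy_path = os.path.join(tmp_dir, "copy.sqlite")
    shutil.copy2(db_path, copy_path)  # work on a copy, never the original
    conn = sqlite3.connect(copy_path)
    try:
        # sqlite_master is SQLite's built-in catalogue of tables and indexes.
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
        shutil.rmtree(tmp_dir)
    return [name for (name,) in rows]
```

Reading from a copy like this is harmless; writing to the real database is the part that is unsafe and unsupported.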
Alternatively, Zotero has both a local JavaScript API and a read/write API on the server that could be used more safely - there are some already existing applications (Pyzotero most notably). Questions on that should go to the zotero dev list:
http://groups.google.com/group/zotero-dev

jawhiteley · December 21, 2011

Thanks for the tip about the API, I hadn't seen that before. Still more 'low-level' than I'd prefer, but easier and safer than editing the sql directly. Another reason to learn python! I guess I'd need to see a few simple examples to see how easy it would be to adapt to suit my needs, but I'll direct that to the dev list when I'm ready.

Thanks!

ajlyon · December 21, 2011

On the batch editing front, I use Pyzotero (http://pypi.python.org/pypi/Pyzotero/0.9.1) when I need to do some sort of batch edit. Its modifications are applied to the server and then synced locally, but it works very well.

It is in Python, but that should be fairly straightforward to pick up, if you're used to doing things like this in AppleScript.

Don't touch the SQLite database directly. It's much safer to access the data locally using JavaScript in Firefox's JavaScript scratchpad, or on the server using Pyzotero. Those two approaches cannot corrupt your data. Messing with the database directly can.
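As a rough illustration of a server-side batch edit, the Python sketch below replaces one tag with another on every item that carries it. It assumes a client object that behaves like a current Pyzotero `Zotero` instance (`items(tag=...)`, `everything(...)`, `update_item(...)`, with tags stored under `item["data"]["tags"]`); the 0.9.1 release linked above may differ, and `replace_tag` is a hypothetical helper, not part of Pyzotero.

```python
def replace_tag(zot, old_tag, new_tag):
    """Replace old_tag with new_tag on every item tagged old_tag.

    ASSUMPTION: `zot` offers items(tag=...), everything(...) and
    update_item(item), and items look like
    {"data": {"tags": [{"tag": "name"}, ...]}}.
    Returns the number of items that were updated.
    """
    changed = 0
    for item in zot.everything(zot.items(tag=old_tag)):
        tags = item["data"].get("tags", [])
        names = [t["tag"] for t in tags]
        if old_tag not in names:
            continue
        # Drop the old tag and add the new one unless it is already present.
        new_tags = [t for t in tags if t["tag"] != old_tag]
        if new_tag not in names:
            new_tags.append({"tag": new_tag})
        item["data"]["tags"] = new_tags
        zot.update_item(item)
        changed += 1
    return changed
```

With a real client this would be called as something like `replace_tag(zot, "moss", "Bryophyte")`; trying it on a small test library first is a sensible precaution.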

As for EndNote conversion workflows, the problem is likely that EndNote isn't mapping those fields anywhere, or it's mapping them to somewhere non-standard. Can you provide an example entry exported from EndNote that includes such fields?

Rintze · December 21, 2011

I've linked to this before, but an IMHO extremely powerful way to support batch editing would be via Google Refine (http://code.google.com/p/google-refine/). I don't know how complicated it would be to access and update Zotero libraries though (Google Refine seems to be able to handle RDF XML).

jawhiteley · December 21, 2011

@ajlyon,

Regarding EndNote conversion, here's a sample entry:
TY - JOUR
AB - Symbiotic cyanobacteria—bryophyte associations on the forest floor are shown to contribute significantly to stand-level nitrogen budgets (snipped for brevity)...
AU - Lindo, Zoë
AU - Whiteley, Jonathan
C3 - PhD (McGill)
C4 - electronic
CA -
DO - 10.1007/s11104-010-0678-6
KW - Bryophyte
Bryosphere
Epiphytic
Nitrogen fixation
Old-growth forests
Symbiotic cyanobacteria
L1 - internal-pdf://Lindo-Whiteley (2011) Plant Soil-1266541056/Lindo-Whiteley (2011) Plant Soil.pdf
LB - Lindo-Whiteley:2011.PlantSoil
N1 - Discovery News: [Moss: Breakfast of Champions](http://news.discovery.com/earth/moss-breakfast-of-champions-110224.html)
Wood Focus Magazine 2011-09-02: [Mossing over the Issue](http://www.iom3.org/news/mossing-over-issue)
PY - 2011
RN - First collaboration with Zoë!
She did most of the work; I provided methods and materials for N-fixation measurements, and helped design sampling.
SN - 0032-079X
SP - 1-8
ST - Old trees contribute bio-available nitrogen through canopy bryophytes
T2 - Plant and Soil
TI - Old trees contribute bio-available nitrogen through canopy bryophytes
UR - http://dx.doi.org/10.1007/s11104-010-0678-6
ID - 1290
ER -

'C3' & 'C4' are custom fields, holding the 'context' (basically, collection tags) and the 'location' (electronic, paper, etc.), respectively. I can live without the location, but the context is pretty important to me. These fields were added by modifying the RIS export style in EndNote, so that explains why those weren't imported. How can I convert them into tags of some sort in Zotero?

'LB' is the 'label' field in EndNote: these become citation keys on export to BibTeX, and are also used for "temporary citations" in Word (Record numbers are less stable, especially when collaborating across more than one library).

'RN' is 'Research Notes'. 'Notes' are assigned to 'N1', but typically consist of 'extra metadata' from online repositories, or other types of metadata, rather than actual notes to myself about the content.

Ideally, what I'd like is to convert some of my custom fields into tags, and turn 'Research Notes' into a separate child note. I'm not sure what to do with the label, however, but I'd like to put it somewhere sensible. The easiest solution is probably to just modify the EndNote export style to mash potential tags into the 'KW' field (Keywords become Tags in Zotero), and put the rest into 'N1' to be imported as a single Note. Any better ideas? Suggestions for what to do with the 'label'?

Zotero also only imports the file link, 'L1', but seems to ignore the actual URL field, 'UR', even though URLs in Zotero are exported to 'UR'. What's up with that?

Incidentally, what's the best way to deal with multiple keywords? EndNote just puts one field marker, with each entry on a separate line, but Zotero just treats this as one big tag.

Rintze · December 21, 2011

"Incidentally, what's the best way to deal with multiple keywords?"

EndNote should prepend the "KW - " tag before each individual keyword, according to, ahem, their own spec: http://www.refman.com/support/risformat_tags_04.asp

ajlyon · December 21, 2011

The contents of the UR field are imported as a link attachment to the item in Zotero.

I recommend switching around the export tags as such:

TY - JOUR
AB - Symbiotic cyanobacteria—bryophyte associations on the forest floor are shown to contribute significantly to stand-level nitrogen budgets (snipped for brevity)...
AU - Lindo, Zoë
AU - Whiteley, Jonathan
KW - PhD (McGill)
C4 - electronic
CA -
DO - 10.1007/s11104-010-0678-6
KW - Bryophyte
Bryosphere
Epiphytic
Nitrogen fixation
Old-growth forests
Symbiotic cyanobacteria
L1 - internal-pdf://Lindo-Whiteley (2011) Plant Soil-1266541056/Lindo-Whiteley (2011) Plant Soil.pdf
M1 - Lindo-Whiteley:2011.PlantSoil
N1 - Discovery News: [Moss: Breakfast of Champions](http://news.discovery.com/earth/moss-breakfast-of-champions-110224.html)
Wood Focus Magazine 2011-09-02: [Mossing over the Issue](http://www.iom3.org/news/mossing-over-issue)
PY - 2011
N1 - First collaboration with Zoë!
She did most of the work; I provided methods and materials for N-fixation measurements, and helped design sampling.
SN - 0032-079X
SP - 1-8
ST - Old trees contribute bio-available nitrogen through canopy bryophytes
T2 - Plant and Soil
TI - Old trees contribute bio-available nitrogen through canopy bryophytes
UR - http://dx.doi.org/10.1007/s11104-010-0678-6
ID - 1290
ER -

The keywords import fine as tags, and I put the C3 as a tag as well. The label is now in Zotero's "extra" field, and is generally going to be the only thing in that field. I just made RN into N1, so Zotero imports it as a child note as well.
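If editing the EndNote export style is not convenient, the same tag changes (C3 to KW, LB to M1, RN to N1) could be applied to an already exported RIS file with a short script. This is only a sketch in Python; the function and constant names are hypothetical, and it assumes tags sit at the start of a line followed by spaces and a hyphen, as in the record above.

```python
import re

# Remapping taken from the record above: the custom field C3 becomes a
# keyword, the label (LB) moves to M1, and Research Notes (RN) become N1.
TAG_MAP = {"C3": "KW", "LB": "M1", "RN": "N1"}

# A tag line: two characters (letter, then letter or digit), spaces, a hyphen.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])(\s+-.*)$")


def remap_ris_tags(ris_text, tag_map=TAG_MAP):
    """Return ris_text with the tags in tag_map renamed; other lines unchanged."""
    out = []
    for line in ris_text.splitlines():
        m = TAG_LINE.match(line)
        if m and m.group(1) in tag_map:
            line = tag_map[m.group(1)] + m.group(2)
        out.append(line)
    return "\n".join(out)
```

Continuation lines (such as the second line of a note) are left alone because they do not start with a tag.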

And, Rintze, we take into account EndNote's violations of the spec on RIS import, so newline-delimited keywords are split appropriately. Much to the dismay of everyone who wanted tags with newlines inside them.

jawhiteley · December 21, 2011

"And, Rintze, we take into account EndNote's violations of the spec on RIS import, so newline-delimited keywords are split appropriately."

Um, not in my experience. When I import the above reference, I end up with one big tag of all terms, delimited with spaces. I'm using Zotero v 2.1.10 in Firefox Mac 5.0.1 (Mac OS X 10.6.8). Items on separate lines with their own 'KW' in front are imported as separate tags, however. Is there a simple way to ensure separate tags are imported ... separately?
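One possible workaround, sketched in Python: rewrite the RIS file before import so that every keyword gets its own 'KW' tag, which is the form that imports as separate tags. The function name is hypothetical, and the sketch assumes that any non-empty line following a 'KW' line, up to the next tag line, is a further keyword.

```python
import re

# A tag line: two characters (letter, then letter or digit), spaces, a hyphen.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])(\s+)-")


def split_keywords(ris_text):
    """Give each newline-delimited keyword its own KW tag."""
    out = []
    in_keywords = False
    prefix = "KW - "
    for line in ris_text.splitlines():
        m = TAG_LINE.match(line)
        if m:
            in_keywords = m.group(1) == "KW"
            if in_keywords:
                # Reuse the spacing of the file's own KW line.
                prefix = "KW" + m.group(2) + "- "
            out.append(line)
        elif in_keywords and line.strip():
            out.append(prefix + line.strip())
        else:
            out.append(line)
    return "\n".join(out)
```

Multi-line notes are not affected, since only lines that follow a 'KW' tag are rewritten.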

Regarding URLs, I did happen to notice this thread, which points out that Zotero will populate the URL field with 'UR' or 'L1', whichever comes first in the entry. Is this still the case? Not a big deal, if it just means rearranging the field order in the export style, but I'm just curious.

ajlyon · December 21, 2011

The KW handling has been in the code for years, and it's working for me. Can you email me (avram@gimranov.com) a file with some references that don't work? Perhaps there's something funny going on with the line endings.

As for UR, yes, it will be put in the URL field if it is first in the RIS record.

ajlyon · December 21, 2011

Ok. It's not an issue in the file. Perhaps a platform-specific issue? Does this work for anyone else?

adamsmith · December 21, 2011

I get the keywords mangled together as one tag in 2.1.10 - can try in 3.0 later if that might be the issue.

dstillman · December 21, 2011

It does seem to be 2.1 vs. 3.0. Not sure why—this was fixed three years ago on the 1.0 branch, so this must have been a regression at some point. In any case, it's working for 3.0.

jawhiteley · December 22, 2011

Wherever the problem is, it doesn't seem to be in the translator. Apart from minor differences in the metadata header, the two translator files are identical (according to Vim). I just tried importing with the RIS translator in v2.1.10 (Firefox), using the RIS translator from v3.0b3, and keywords on multiple lines still end up as one long space-separated tag. It is encouraging that importing the same file in the 3.0b3 standalone worked as expected, although it's still a little disconcerting that the stable version (2.1.10) has this bug :( Given that 3.0 is still in beta, is there a workaround for this in 2.1.10?

adamsmith · December 22, 2011

the translators get auto-updated, so they're always going to be the same as long as you keep more or less up to date versions of Zotero (i.e. the 2.0 branch doesn't get translator updates anymore).
I don't think there is a workaround - but couldn't you just use ZSA for the import and then export/import back into Z. 2.1.10? (probably in zotero.rdf, which is basically loss-less.)

jawhiteley · December 22, 2011

On a slightly unrelated note, I was able to modify a copy of the BibTeX translator to use the contents of my 'extra' field as the cite key when present, and auto-generate one only when needed. It was surprisingly easy for me to do, so kudos to the developers for coming up with such a powerful system that is also relatively easy to customize!

Nevertheless, there's one little snag I haven't been able to solve, although it's not critical for me yet. Is there a way to extract the citation key on BibTeX import, so I can save that to the 'extra' field? Maybe I missed something obvious, but I just couldn't see if or how the translator parsed the citation key out of a BibTeX entry. Any suggestions?

jawhiteley · December 22, 2011

"I don't think there is a workaround - but couldn't you just use ZSA for the import and then export/import back into Z. 2.1.10? (probably in zotero.rdf, which is basically loss-less.)"

I could, I just didn't know how different the data structure is between 3.0 & 2.1. Are there any significant incompatibilities I should be aware of?

adamsmith · December 22, 2011

no - database structure didn't change at all between those versions. Even if it did - the RDF export is stable between versions, you could even export/import from/to Zotero 1.0.
Note that it's normally not the recommended way of transferring a library (for that see here: http://www.zotero.org/support/kb/transferring_a_library ), but in this case it makes sense to use that.

amandafrench · December 28, 2011

Slight digression: like Rintze, I'm longing for batch editing, and I too think that Google Refine would be a good way to do it. Happy to hear about Pyzotero -- might try that out in preference to importing to EndNote and batch editing there, then exporting back to Zotero, which I currently do when I want to batch edit. I played around with Google Refine a little, and so far MODS XML data format seems to play best with it, but if anyone else has any other comments about Google Refine and Zotero, I'd be interested to hear them.

dstillman · December 28, 2011

Using export/import for an existing library is really not recommended. Batch editing in Zotero will be done via a native interface—it's not going to be done through Google Refine.

amandafrench · May 19, 2013

Update: I'm again messing around with using Google Refine to batch-edit and reconcile Zotero data; if anyone else is interested, the best export format seems to be CSL JSON if you want to get the data into Google Refine.