Importing EndNote files into Zotero
if people didn't explain exactly which platform they were trying
to export to -- I can imagine that Thomson/Reuters is not exactly
falling over themselves to facilitate data transfer to Zotero at
the moment ...
Follow-up questions...
1. Is there a functional limit to how many references a 'collection' can have?
2. Does increasing the number of references increase the load on memory?
3. Is there an ideal format for importing into Zotero? I notice that the export dialog has a Zotero RDF option. What format is this in? Why isn't there a comparable option in the import dialog?
4. EndNote allows you to create an output style. Is there documentation anywhere of the 'ideal' import format (that is, a 1:1 list of data/type:fields/type) so that such a style can be created? Once this style was created, the issue of importing EndNote bibliographies would disappear.
Re. 3:
Zotero does allow you to import RDF. The Zotero RDF uses a vocabulary that is used by few (if any) applications outside of Zotero. It can probably carry more of a Zotero record's information through export and import than the other formats.
I am a big fan of MODS XML right now, as it is a rich standard that others already do use.
There are efforts to make an RDF ontology that consider the needs of programs and users other than Zotero.
The best format that EndNote exports and Zotero imports is probably RIS currently.
Re. 4:
EndNote's data and export templates are fundamentally broken enough not to allow something that is completely satisfactory. A MODS XML export from EndNote would be very useful, but it seems impossible to implement a filter that would make perfect MODS XML.
EndNote does have its own XML schema (actually a few conflicting versions, which differ between import and export and across EndNote releases), and its XML sometimes does not validate. An EndNote XML importer for Zotero is an open item in trac.
I created a book and book section item in an empty library then exported and reimported the two dummy references. The data in the references were just descriptions of the field name. I was quite surprised.
Quickly...
MODS Export -- lost data in fields # of Volumes, Language, Short Title, Repository and Extra, and lost all the attachments (URLs to files or websites).
Refer/BibIX -- Totally corrupted the records, with an extra webpage reference being created from the book reference.
RIS Import -- lost data in fields series number, # of Volumes, edition, language, call number, loc. in archive, repository, rights and extra, and lost all attachments (URLs to files and websites)
Zotero RDF -- author full names were transferred to last names; URLs to local files were lost; URLs to websites were retained.
On inspection of the RDF file created, it is obvious that it is not going to be easy for anyone outside Zotero to create a translator of any type.
I am wondering if a modified RIS format could be accepted as an import option (called Zotero RIS), with the missing tags added to make a complete Zotero record. This information could then be used to make a modified EndNote output style. For example, RE for repository, RI for rights, etc.
Another difference worth noting between EndNote and Zotero is how the data is stored. In EndNote, if you change between reference types, say from a book to an article, fields are just renamed and the data stays intact. Change it back and all the original data returns. In Zotero, if you change to a reference type with fewer fields, you get a warning about the potential loss of data in the fields that are no longer required.
For the record: I am using XP Pro SP3, Firefox 3.0.3 and Zotero 1.0.7.
Many of the issues with MODS XML can be improved--the format is rich & has support for many of those things.
It is unclear what issues you had with REFER. However, this & RIS & BibTeX are rather limited formats.
RIS does not support several of the fields that you had problems with. Expanding RIS seems like a bad idea to me. It isn't an extensible format. If we really need a workaround for one of the flat file formats, we can put formatted information into one of the user-definable fields (e.g. 'U5 - <rights>cc-by-sa-3.0</rights>' or a similar way of using an existing field).
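To make the user-definable-field workaround concrete, here is a minimal Python sketch of how an importer might unpack such a value. It is only an illustration: the function name `unpack_user_field` and the rule that the embedded name becomes the target field are assumptions, not anything Zotero or EndNote implements.

```python
import re
from typing import Dict

# Matches <name>value</name> where the closing name equals the opening one.
EMBEDDED_FIELD = re.compile(r"<(\w+)>(.*?)</\1>")


def unpack_user_field(value: str) -> Dict[str, str]:
    """Extract name/value pairs embedded in a user-definable RIS field.

    Hypothetical helper: the <name>value</name> convention is the
    workaround suggested in this thread, not part of the RIS format.
    """
    return {name: text for name, text in EMBEDDED_FIELD.findall(value)}
```

A value with no embedded markup simply yields an empty dictionary, so ordinary user-field content would pass through untouched.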
EndNote's "hidden" data has actually drawn complaints for a long time. I wouldn't want it in the Zotero interface, and I don't think it matters for data migration.
Regarding the actual number of references that can be imported in one shot: I have successfully imported 1000 without generating an error. So I will be cutting my EndNote library up into smaller bits for importation. Thanks everyone for your input.
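For anyone who prefers to cut up the exported file rather than the EndNote library itself, here is a minimal Python sketch. It assumes the export is RIS and that every record ends with an `ER` tag line; the function name `split_ris` and the default batch size are illustrative choices, not part of Zotero or EndNote.

```python
import re
from typing import Iterator, List

# A record terminator: the tag ER, some whitespace, then a hyphen.
END_OF_RECORD = re.compile(r"^ER\s+-")


def split_ris(lines: List[str], batch_size: int = 1000) -> Iterator[List[str]]:
    """Yield lists of lines, each holding at most batch_size complete records."""
    batch: List[str] = []
    records_in_batch = 0
    for line in lines:
        batch.append(line)
        if END_OF_RECORD.match(line):
            records_in_batch += 1
            if records_in_batch == batch_size:
                yield batch
                batch, records_in_batch = [], 0
    if batch:
        # Remaining records, plus any trailing lines after the last ER tag.
        yield batch


# Hypothetical usage: write each batch to its own file.
# with open("library.ris", encoding="utf-8") as f:
#     for i, chunk in enumerate(split_ris(f.read().splitlines()), start=1):
#         with open(f"library_part{i}.ris", "w", encoding="utf-8") as out:
#             out.write("\n".join(chunk) + "\n")
```

Records are never split across files, because a batch is only closed immediately after an `ER` line.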
I'm afraid that there will be other data input errors, and as I am beginning my thesis, I really can't afford dumb mistakes that make it look like I don't know what I'm doing. Exactly how reliable is this import at the moment? With thousands of citations, I'm really not looking forward to importing by hand. (i.e. that will have to occur after the dissertation if at all)
The next step beyond this would be to open up the exported file in a text editor and make sure it looks OK. If you are still having trouble with the file there is a good chance you can massage it a little bit with a few thoughtful uses of find and replace.
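The eyeball check can be partly automated. The Python sketch below flags non-blank lines that do not begin with a two-character RIS tag. The function name `find_untagged_lines` is made up for this example, and the spacing in the pattern is deliberately loose because exports differ. A flagged line is not necessarily an error, only a place worth looking at before reaching for find and replace.

```python
import re
from typing import List, Tuple

# Two-character tag (a letter, then a letter or digit), whitespace, hyphen.
TAG_LINE = re.compile(r"^[A-Z][A-Z0-9]\s+-")


def find_untagged_lines(lines: List[str]) -> List[Tuple[int, str]]:
    """Return (1-based line number, text) for non-blank lines lacking a tag."""
    suspects = []
    for number, line in enumerate(lines, start=1):
        if line.strip() and not TAG_LINE.match(line):
            suspects.append((number, line))
    return suspects
```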
Lots of Zotero users have transferred over from Endnote, many with thousands of references in their collections. Generally if you are willing to fuss with it a little bit you can get a very clean transfer.
General observations...
1. The generic 'Refman (RIS) Export' file provided with EndNote 5 does not result in a clean import into Zotero (I suspect different versions of the RIS Format Specifications have existed over time, but these versions have not been labelled as such). I searched the EndNote style repository on the internet for the latest version of the file and found that this style is no longer available. I then modified the file I had, based on the latest RIS Format Specifications.
2. After doing this and conducting some trial imports, I found that not all the tags specified in the RIS Format Specifications are recognised by Zotero. I then started generalising until I obtained the maximum data transfer into Zotero fields that made sense to me. Issues included retaining my repository data (e.g. library, reference collection, project file) and accession codes (e.g. W2345, R459, Y236 respectively), and importing my keyword data and abstracts. I found the keyword -> tags mapping unsatisfactory and messy, so I pushed this data, along with the abstracts, into Zotero's Notes section. I used the Zotero Abstract field to store the RepositoryLocation:AccessionCode combination, for lack of an alternative field to which imported RIS data could be sent. As Zotero's functionality improves, I am hoping that a field copy function will be developed that will allow me to transfer this data into the Repository field available in Zotero (to which RIS import data cannot be directed). Personally, I would like to see a Zotero 'text-based' RIS-like import file format where there is a tag available for each field and reference type. People could then send their data to this text-based file using any one of a multitude of programs, e.g. EndNote, databases (e.g. FoxPro, Access), Notepad, Word, etc.
3. I found that my Endnote library was best cut up into blocks of about 1000 references. Even with a reasonably fast computer this import took about 20 minutes.
4. With such a large dataset I find the use of the 'search' field very irritating. This field acts more like a dynamic filter; type 'a' and it filters the data to only show those with 'a', type 'ab' it only shows references with 'ab'. Type a name and the system shuts down for 5-10 minutes until the filter has caught up. The 'advanced search', represented by the magnifying glass is much better. This facility appears to search various indexes and produces good results in seconds and allows you to store the results as a type of collection.
5. Compared to Endnote, Zotero is a breeze at getting reference data available on the internet into its database. Coupled with a NewsReader to let me know when the latest issue of the key journals that I read are released - it has become a quick and easy routine of checking my feeds followed by reviewing the reference lists that appear and clicking on the icon in the navigation field for any reference that I wish to save. It is as easy as that. Of the various places I visit I have only encountered problems importing data this way 5% of the time. In most cases the problem centres around importing collections - if I drill down to the individual reference the import works.
6. EndNote styles appear to provide the user with greater control of the output (a combination of the huge variety of scientific styles available plus an inbuilt style editor). CSL styles are OK but biased towards the humanities and social sciences. Very few scientific styles are available. For the four journals I intend to publish articles in during 2008 and 2009, I am having to create a CSL style sheet in XML for every journal. This adds an unexpected overhead to paper preparation. CSL/XML also represents a steep learning curve for the uninitiated.
7. For people trying to get their Endnote Data into Zotero I have provided the format of my Endnote Style Sheet below for the key reference types recognised by Zotero Bibliographic Style Sheets.
Generic
`TY - `GEN|`
AU - `Author|`
PY - `Year|`
BT - `Secondary Title|`
ED - `Secondary Author|`
CT - `Title|`
CY - `Place Published|`
PB - `Publisher|`
T3 - `Tertiary Title|`
A3 - `Series Editor|`
ET - `Edition|`
SP - `Pages|`
Y2 - `Date|`
SN - `ISBN/ISSN|`
N1 - `Notes|`
N1 - `Abstract|`
N1 - `Keywords|`
N2 - `Accession Number|`
VL - `Volume|`
UR - `URL
Journal Article
`TY - `JOUR|`
AU - `Author|`
PY - `Year|`
TI - `Title|`
SP - `Pages|`
JF - `Journal|`
VL - `Volume|`
IS - `Issue|`
N1 - `Notes|`
N1 - `Abstract|`
N1 - `Keywords|`
N2 - `Location|: Accession Number|`
UR - `URL
Book
`TY - `BOOK|`
AU - `Author|`
PY - `Year|`
BT - `Title|`
CY - `City|`
PB - `Publisher|`
SP - `Number of Pages|`
T3 - `Series Title|`
ED - `Series Editor|`
ET - `Edition|`
VL - `Volume|`
Y2 - `Original Publication|`
SN - `ISBN|`
N1 - `Notes|`
N1 - `Notes|`
N1 - `Keywords|`
N2 - `Location|: Accession Number|`
VL - `Volume|`
UR - `URL
Book Section
`TY - `CHAP|`
AU - `Author|`
PY - `Year|`
BT - `Book Title|`
ED - `Editor|`
CT - `Title|`
CY - `City|`
PB - `Publisher|`
ET - `Edition|`
VL - `Volume|`
T3 - `Series Title|`
SP - `Pages|`
N1 - `Notes|`
N1 - `Notes|`
N1 - `Keywords|`
N2 - `Location|: Accession Number|`
VL - `Volume|`
SN - `ISBN|`
UR - `URL
I'm biased, but I'm not the only one who believes CSL is much better designed than Endnote's style system purely from the styling standpoint. True. It's something I think many of us hope and expect will be addressed in time. Ideally, we get to a point where a user can just do a few clicks to create a new style.
You stated... At present, I am unclear about my opinion of CSL. As you may have noticed, I stated "Endnote Styles appear to provide the user with greater control", with the emphasis on "appear". I am still fiddling with XML, creating a range of new styles, and I reserve judgement until I have fully explored CSL functionality.
One thing I have noticed is that most styles in Zotero only have 3 output styles - BOOK, CHAPTER and DEFAULT (formatted to present as a JOURNAL entry). This is very limited in my view. I have been told in other parts of the forum that this system captures most situations but I am yet to be convinced. I have a wide variety of webpages, reports, conferences, unpublished manuscripts, maps, CD Software, Computer Programs and Legislation that have quite specific data that needs to be presented in a set way in a bibliography. Having all these items presented like a journal is limiting. The problem though does not appear to be in the ability of CSL/XML to render the references only that people don't wish to program for all these reference types.
Simple Styles, as found in the Zotero Style Repository, already appear complicated, especially without any explanatory comments in the code. How is a style with all types of references accounted for going to look and be maintained? Questions I am sure that would spark a debate in other sections of the forum.
Suffice to say I have not finalised my judgement on this part of Zotero until I have fully explored its functionality. I will post my views in the appropriate spot once I have finished and submit my styles to the repository when they are complete.
In any case, you are free to either fix particular styles or cite documentation for a style that indicates that the CSL-file may need more type-specific handling & these issues can be addressed. No worse than the ugly bloatedness of a similar EndNote style...
See some design notes of mine from a while back. It's not that simple. In CSL, all types have one of three fallback types: article, book, and chapter. These base types correspond to the structural characteristics I describe in that link above. So when, say, Zotero encounters a record for which the style has no type-specific CSL logic, it maps it to the corresponding base type. This is why, in many cases, you don't need definitions for much more than those three types.
Aside: with macros, this feature doesn't really need to be here, but has remained for what I'd call legacy reasons. The more code, the longer it takes, and the more buggy it potentially is. So it makes sense to code for the common cases, and address problems as they arise. As I hinted above, in order to understand the design of CSL, you need to stop thinking about citation formatting through the lens of reference types. If you have styles that are a collection of smartly designed macros, with simple citation and bibliographic definitions, it becomes quite easy to maintain those styles. Indeed, that's a big part of the idea behind the macro system.
I have, however, spent several days working through the information available on this website, and spent time downloading, reviewing and adjusting existing styles to better understand what you are trying to achieve. Independently, I have prefaced my investigation with the following assumptions:
- All references have 4 distinct data types: CREATOR, DATE CREATED, TITLE, WHERE IT CAN BE OBTAINED. In reality, this is all we (as users) really want to know. Each reference type has variants on this data.
- Every journal has a list of nit-picky specifications for how it wants the in-text citations and bibliographies formatted.
Following on from your comments, I take it you are suggesting that you use macros to format the CREATOR data, DATE CREATED data, etc., and only use the citation and bibliography sections to control how single objects (BOOK), parts of a single object (CHAPTER) and sections of a part of an object (ARTICLE) are presented. One would presume that if your CSL file is structured properly, you should only ever need to modify the options and formatting in the citation and bibliography sections.
Am I on the right track?
But, I'd go a little further. Take a look at the bibliography section for the APA style. It is only a series of macro and variable calls (though I'm a little confused about why there's both a "container-contributors" and a "secondary-contributors" macro; they ought to be the same thing).
Or, this more complex Chicago style also shows the basic idea. There you have fragments like:
<text macro="contributors"/>
<text macro="title"/>
<text macro="description"/>
<text macro="secondary-contributors"/>
So the code here is all generic, and if there's any type or data-specific logic, it happens in those macros.
Also, on your last question, it depends. In some cases, you'd be more likely to be making tweaks to the macros.
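The division of labour described above can be pictured with a loose Python analogy. This is not CSL and not how any CSL processor is implemented; the function names, the dictionary-based item, and the rule that chapter titles are quoted are all made up for illustration. The point is only that the layout is a type-agnostic list of calls, while any type-specific branching sits inside a macro.

```python
from typing import Callable, Dict, List

Item = Dict[str, str]


def contributors(item: Item) -> str:
    """Stand-in for a 'contributors' macro."""
    return item.get("author", "")


def title(item: Item) -> str:
    """Stand-in for a 'title' macro; type-specific logic lives here."""
    text = item.get("title", "")
    if item.get("type") == "chapter":
        # Invented rule, purely to show where such a branch would go.
        return f'"{text}"'
    return text


# The layout itself never inspects the item type; it only calls macros in order.
LAYOUT: List[Callable[[Item], str]] = [contributors, title]


def render(item: Item) -> str:
    parts = [macro(item) for macro in LAYOUT]
    return ". ".join(part for part in parts if part)
```

Changing how chapters are presented means editing one macro, while the layout list stays as it is.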
TY - JOUR
AU - Martin, Ron
AU - Sunley, Peter
PY - 1998
TI - Slow convergence? The new endogenous growth theory and regional development
SP - 201-227
N1 - Jul
JF - Economic Geography
VL - 74
IS - 3
SN - 00130095
N1 - Slow convergence? The new endogenous growth theory and regional development
N1 - TY - JOUR
KW - Economics
Technology
Geography
Human capital
Economic growth
N2 - In economics, interest has revived in economic growth, especially in long-term convergence in per capita incomes and output between countries. This mainly empirical debate has promoted the development of endogenous growth theory, which seeks to move beyond conventional neoclassical theory by treating as endogenous those factors particularly technological change and human capital relegated as exogenous by neoclassical growth models.
UR - http://proquest.umi.com/pqdweb?did=34405463&Fmt=7&clientId=16241&RQT=309&VName=PQD
ID - 13875
ER -
In order to isolate this particular record as the one causing the problem, I had to "jackknife" the file, trying first to import all 70 records, then the first 35, then the first 15 or so, etc. until I isolated the problematic record(s). With 70 original records, at least twelve caused this problem with the 1.5 preview.
As if this wasn't enough trouble, I really need to use another set of references from a search yielding 181 items. Again, I used a jackknife technique and had a substantial number of records (maybe 30 or so) that imported OK. Then I was interrupted and when I returned I imported these records a second time. I've been unable to delete them. A "Find Duplicates" tool would be very helpful, but if I can't delete the duplicates it won't do me any good.
Finally, the jackknife approach works for all but the error-causing records. Nonetheless, it's incredibly time consuming. With 181 records and a 15% error rate, there will be close to 30 errors, or an average of one every six or seven records. Since each problematic record has to be isolated individually, this will take hours.
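The halving procedure described above can be written down as a short recursive routine. This is a Python sketch under two assumptions: that a batch fails only because at least one record in it fails on its own, and that there is some way to answer "does this batch import cleanly?". The callback `imports_ok` is hypothetical; nothing in this thread describes a scripted import, so in practice it stands for a manual trial import.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def find_bad_records(
    records: Sequence[T],
    imports_ok: Callable[[Sequence[T]], bool],
) -> List[T]:
    """Return the records that fail on their own, found by repeated halving."""
    if not records or imports_ok(records):
        # A clean batch is cleared with a single trial import.
        return []
    if len(records) == 1:
        return [records[0]]
    middle = len(records) // 2
    return (find_bad_records(records[:middle], imports_ok)
            + find_bad_records(records[middle:], imports_ok))
```

Any half that imports cleanly is dismissed with a single trial, so most of the effort goes into the batches that actually contain a failing record.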
Any help would be most appreciated. If you want copies of some of the other problematic records, I can supply some. In the first batch of 70 records, when an individual record would not import, I saved it as a text file.
Thanks.
Sean, I'm not sure what you mean. All I get is the text I quoted in my earlier post. Is there some place to look for more detailed information?
For b, you will have to locate your Zotero library and copy it to the Firefox profile on your other computer. More info here: http://www.zotero.org/support/zotero_data