Data loss on import of bibutils-generated MODS file
Using bibutils to convert EndNote exported XML into MODS, then reading data into Zotero is an interesting alternative, as XML is probably the least broken EndNote export format.
I've tried it, and found that a lot of information got lost.
I have uploaded the sample file.
http://home.arcor.de/web_bill_be58/Zotero-Put2web/Library_MODS.xml
Some of the problems:
- The books by Berliner imported without year, publisher, number of pages
- The book (by Berliner): Title concatenated with Subtitle without a space or other separator
The following citations were imported as "books":
- The Thesis by Sittner
- "Scotch Tape Test" without any given document type (Web Page in EndNote XML, but bibutils issued a warning and discarded the doc type)
- A Conference Paper by Meng - without pages, place, conference name
----
There are probably other issues - I didn't check any further
to produce a BibTeX file and importing that into Zotero produced much better results: the thesis was imported as a thesis, and journal articles were imported with volume, issue, and pages. Only a conference paper (Meng) still got imported as a book, despite being tagged as "@Proceedings{Meng2005" in the source.
1. Export from EndNote as XML
2. Send the file through two bibutils:
endx2xml Your-EndNote.xml > New-MODS-File.xml
xml2ris New-MODS-File.xml > New-RIS-File.ris
(Give the full paths to the files if necessary; piping is presumably possible.)
Then, as proposed earlier:
http://forums.zotero.org/discussion/5311/importing-endnote-libaray-including-pdf-attachments/#Item_10
Copy the folder named "PDF" from the EndNote storage into your root directory (the start disk on my Mac).
Search and replace in the RIS file:
String to search for: "UR - internal-pdf://"
String to replace: "L1 - file:///PDF/"
Read the RIS file into Zotero. At first impression, the import is cleaner than a direct import from EndNote RIS, and the PDFs are all there!
It is certainly possible to combine two conversions and one search-and-replace in a single shell script (and put it onto a web server) - but not today, and probably not from me (my unix skills are very limited).
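The two conversions and the search-and-replace could be chained roughly as follows. This is only a sketch in Python rather than a shell script, and it makes several assumptions: the bibutils programs are installed and on the PATH, their output is UTF-8, and the function names `rewrite_pdf_links` and `endnote_xml_to_ris` are made up for illustration. The name of the first converter is a parameter, since bibutils ships both end2xml and endx2xml (the latter is its reader for EndNote XML). RIS tags are normally followed by two spaces, so the pattern accepts one or more.

```python
import re
import subprocess
import tempfile
from pathlib import Path

# Matches "UR - internal-pdf://" at the start of a line, with any amount of
# spacing around the hyphen, and keeps that spacing in the replacement.
_PDF_LINK = re.compile(r"^UR([ \t]+-[ \t]+)internal-pdf://", re.MULTILINE)


def rewrite_pdf_links(ris_text: str) -> str:
    """Turn EndNote's internal PDF references into L1 file links under /PDF/."""
    return _PDF_LINK.sub(r"L1\g<1>file:///PDF/", ris_text)


def endnote_xml_to_ris(endnote_xml: Path, ris_out: Path,
                       to_mods: str = "endx2xml") -> None:
    """Run both bibutils conversions via a temporary MODS file, then fix the links.

    Assumption: `to_mods` and `xml2ris` are on PATH and write to stdout,
    as in the commands listed above.
    """
    with tempfile.TemporaryDirectory() as tmp:
        mods_path = Path(tmp) / "library-mods.xml"
        with mods_path.open("wb") as out:
            subprocess.run([to_mods, str(endnote_xml)], stdout=out, check=True)
        ris = subprocess.run(["xml2ris", str(mods_path)],
                             capture_output=True, check=True).stdout
    # Assumption: the RIS output is UTF-8; undecodable bytes are replaced.
    text = ris.decode("utf-8", errors="replace")
    ris_out.write_text(rewrite_pdf_links(text), encoding="utf-8")
```

Calling `endnote_xml_to_ris(Path("Your-EndNote.xml"), Path("New-RIS-File.ris"))` would then produce the RIS file to read into Zotero; the "PDF" folder still has to be copied by hand as described above.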
Note: the link to the Mac/Intel binary on the bibutils home page http://www.scripps.edu/~cdputnam/software/bibutils/ is broken :(
<relatedItem type="series">
instead of <relatedItem type="host">
but I don't know what your EndNote XML data looked like or if bibutils can get this right. Zotero does the right thing when these are labeled as being part of a series.
However, I think the MODS XML translator could be improved to use the originInfo of the record in preference to that of relatedItems (particularly when originInfo is absent from those relatedItems).
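To illustrate the originInfo preference suggested here, a small Python sketch follows. It is not the Zotero translator's code: the function names and the sample record are invented for the example, and it only assumes the standard MODS v3 namespace.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://www.loc.gov/mods/v3"}

# Invented sample record: originInfo sits on the record itself,
# and the host relatedItem carries none.
SAMPLE = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Example book</title></titleInfo>
  <originInfo>
    <publisher>Example Press</publisher>
    <dateIssued>1976</dateIssued>
  </originInfo>
  <relatedItem type="host">
    <titleInfo><title>Example series</title></titleInfo>
  </relatedItem>
</mods>"""


def pick_origin_info(mods):
    """Prefer the record's own originInfo; fall back to one inside a relatedItem."""
    own = mods.find("m:originInfo", NS)
    if own is not None:
        return own
    return mods.find("m:relatedItem/m:originInfo", NS)


def publisher_and_year(mods):
    """Return (publisher, dateIssued) from the chosen originInfo, or (None, None)."""
    origin = pick_origin_info(mods)
    if origin is None:
        return None, None
    return (origin.findtext("m:publisher", default=None, namespaces=NS),
            origin.findtext("m:dateIssued", default=None, namespaces=NS))


print(publisher_and_year(ET.fromstring(SAMPLE)))  # ('Example Press', '1976')
```

With this order of lookup, a book whose host or series entry lacks originInfo would still pick up its year and publisher from the record itself.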
Note that NO references import with the total number of pages; this is a relatively recent field in Zotero. Also note that the total number of pages isn't enumerated by "Spin Labeling. Theory and Applications."
Ticket created. I'd need to test this more, but the translator could use something like:
// title
for each(var titleInfo in mods.m::titleInfo) {
    // dropping other title types so they don't overwrite the main title
    // we have same behaviour in the MARC translator
    if(!titleInfo.@type.toString()) {
        if (titleInfo.m::title.length()) {
            newItem.title = titleInfo.m::title.text().toString();
            if (titleInfo.m::subTitle.length()) {
                newItem.title = newItem.title + ": " + titleInfo.m::subTitle.text().toString();
            }
        } else {
            newItem.title = titleInfo.*.text(); // including text from sub elements
        }
    }
}
Ticket created. This reflects a bug in Zotero (for both import and export). Zotero uses the genre "theses", but the proper MARC genre is "thesis".
Garbage in leads to garbage out. There is no way for Zotero to know how to type this entry. Perhaps bibutils could be improved here, though.
The Zotero MODS translator does not currently handle the "conference publication" genre. There are many genres that should be added in addition to this one.
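The genre remarks above amount to a lookup from MODS genre strings to item types. A rough Python sketch of such a table is given here; it is hypothetical and not taken from the Zotero translator. The keys are only the genre strings mentioned in this thread, the values are Zotero item type names, and the "book" default mirrors the fallback behaviour observed in the imports.

```python
# Hypothetical lookup table, limited to genres mentioned in this thread.
GENRE_TO_ITEM_TYPE = {
    "thesis": "thesis",
    "theses": "thesis",  # spelling Zotero is reported to use
    "conference publication": "conferencePaper",
    "web page": "webpage",
}


def item_type_for(genres, default="book"):
    """Return the item type for the first recognised genre, else the default."""
    for genre in genres:
        item_type = GENRE_TO_ITEM_TYPE.get(genre.strip().lower())
        if item_type:
            return item_type
    return default


print(item_type_for(["conference publication"]))  # conferencePaper
print(item_type_for([]))                          # book
```

A record without any genre still ends up as a book, which matches the "garbage in, garbage out" case: nothing in the data says otherwise.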
http://home.arcor.de/web_bill_be58/Zotero-Put2web/EndNOte-and-RIS.zip
Note: for Journal Articles, Volume, Issue, Pages, and Date are not retrieved on import from MODS, but they are read from the RIS produced from this MODS file.
Is bibutils still in active development? I have sent an email to Chris Putnam, the author of the software, concerning a broken link (see above) and have had no response yet.
If it is still being developed, noksagt, could you perhaps contact Chris on the above point?
Otherwise, what is your opinion - is any workaround possible, either from the Zotero side or by the user? (I mean realistic possibilities only, not that some third person should take the bibutils source and implement your suggestions.)
In general, bibutils is really good. I personally would like to see it moved to an open SCM repo (say GitHub) with an issue tracker, easier contribution, etc. I'd also like to see it combined into a single binary (right now there are what, 20?), and to see bindings developed for common scripting languages (he did already move the core to a library, so this is easier; there's one for Haskell, for example).
All of which is to also underline the point that bibutils seems to me a good basis for any conversion web service.
Among other fixes and improvements, endx2xml (MODS) now recognizes the "web page" genre. Zotero, however, doesn't recognize this genre.
I have put the files produced by the new bibutils at
http://home.arcor.de/web_bill_be58/Zotero-Put2web/EndNote_via_bibutils_4.4_imp.zip
The bibutils-produced MODS file "endx2MODS hand edited bbedit-reflow.xml" contains, among others, the item "i-PULSE", which is a web page and gets imported as a "book" by Zotero.