EndNote RIS _with_ attached PDFs, code suggestion
Hi
I have used Zotero for only a few days, but I have used EndNote for years, since my university has a site license. I was delighted when I first tried Zotero, but soon became very disappointed when I discovered that none of the PDFs had been imported from the EndNote RIS file.
I then found the wiki page with a workaround, which is to do a search and replace on the RIS file before importing it:
http://www.zotero.org/support/kb/exporting_from_endnote_with_pdfs
It worked. However, when I recommended Zotero to a colleague who was having trouble with EndNote, she gave up as soon as she saw that the wiki page contained some strange-looking code...
So tonight I took a look at the code in the translator. Getting it to recognize EndNote's 'internal-pdf://' links was simple, of course, but that still left me with only the relative path to the PDF that EndNote supplies, and the Zotero import function needs the full path.
By default, EndNote saves the exported RIS file in the same folder as the EndNote library, with the same name but a different extension (at least on my system: OS X, EndNote X5).
If I could only get the path of the RIS file, I could work out the path to the EndNote PDF folder (assuming, of course, that the RIS file had not been moved or renamed relative to that folder).
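The folder derivation described above can be sketched as a small standalone function. This is a minimal sketch, not code from Zotero or EndNote: it assumes, as above, that the RIS file shares its base name with the .enl library and sits in the same folder, and that EndNote keeps PDFs under `<library>.Data/PDF/`. The name `endnotePdfFolder` is hypothetical.

```javascript
// Hypothetical helper (not part of Zotero): guess EndNote's PDF folder from
// the full path of an exported RIS file, assuming the RIS file has the same
// base name as the .enl library and lives in the same folder.
function endnotePdfFolder(risPath) {
  const lastSlash = risPath.lastIndexOf("/");
  const dot = risPath.lastIndexOf(".");
  // Strip the extension only if the last dot belongs to the file name itself;
  // otherwise keep the whole path as the base name.
  const base = dot > lastSlash ? risPath.slice(0, dot) : risPath;
  return base + ".Data/PDF/";
}

// Example (illustrative path):
console.log(endnotePdfFolder("/Users/me/Documents/MyLibrary.txt"));
// prints /Users/me/Documents/MyLibrary.Data/PDF/
```

Unlike slicing at the last dot anywhere in the path, this only treats a dot after the last "/" as the extension separator, so a folder such as `my.papers/` is not mistaken for an extension. It handles "/" separators only, i.e. OS X-style paths.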
I found no way to retrieve the path from within the translator, so unfortunately I had to change the Zotero client core as well as the translator. Since Zotero.Translate.IO.Read.prototype in translate_firefox.js hosts the read function, I used it to also expose nsIFile.path.
On my system it now reads:
Zotero.Translate.IO.Read.prototype = {
    "__exposedProps__":{
        "_getXML":"r",
        "RDF":"r",
        "read":"r",
        "setCharacterSet":"r",
        "filePath":"r"   // added: expose filePath to translators
    },
    ...code omitted for clarity...
    // added: full path of the file being imported (nsIFile.path)
    "filePath":function() {
        return this.file.path;
    }
}
If there is a way I have not found to get this path from within the translator, no core change would be needed.
In the translator, I changed how the L1 tag is handled in processTag:
...
else if(tag == "UR" || tag == "L1" || tag == "L2" || tag == "L4") {
    // EndNote's relative links look like internal-pdf://<folder>/<file>.pdf
    var isInternalPdf = (tag == "L1" && value.indexOf('internal-pdf://') === 0);
    // URL (an internal-pdf:// link is not a usable item URL)
    if(!item.url && !isInternalPdf) {
        item.url = value;
    }
    if(tag == "UR") {
        item.attachments.push({url:value});
    } else if(tag == "L1") {
        // Use the file name as the attachment title
        var filename = value.slice(value.lastIndexOf("/") + 1);
        if(isInternalPdf) {
            // Resolve against "<library>.Data/PDF/", assuming the RIS file sits
            // next to the .enl file and shares its base name.
            // Zotero.filePath() is only needed (and only called) here.
            var importFile = Zotero.filePath();
            var endnotePdfPath = importFile.slice(0, importFile.lastIndexOf(".") + 1) + 'Data/PDF/';
            Zotero.debug('debug: endnotePdfPath= ' + endnotePdfPath);
            value = value.replace('internal-pdf://', 'file://' + endnotePdfPath);
            Zotero.debug('debug: attachment url value= ' + value);
        }
        item.attachments.push({url:value, mimeType:"application/pdf",
            title:filename, downloadable:true});
    } else if(tag == "L2") {
        item.attachments.push({url:value, mimeType:"text/html",
            title:"Full Text (HTML)", downloadable:true});
    } else if(tag == "L4") {
        item.attachments.push({url:value,
            title:"Image", downloadable:true});
    }
...
I can now import _with_ pdfs.
The downside, of course, is that the RIS file must be named _exactly_ like the library but with a different extension (not .enl), and that it must be placed in the same folder as the .enl file; or, if it is moved, the '[libraryname].Data/PDF/' folder must be moved along with it. In most cases this should not be a problem, since this naming and location seem to be EndNote's default behavior. In any other case the translator should behave just as it does now and ignore the attachments. In any event, explaining the importance of the RIS file's name and location should be less intimidating to most users than the script code currently on the wiki page mentioned above.
As you can see, I also made a small, unrelated change: the attachment title is no longer the generic 'Full Text (PDF)' but the actual file name. When a record has more than one attached file, I found identical, non-descriptive titles very annoying.
Also note that I have only tested this on OS X in Firefox!
Honestly, though, that also seems rather shaky. It is probably an improvement over the status quo, but those are pretty strict conditions. Sure, they're easier to follow than the current ones, but if we're going to try to solve this, wouldn't a quick online tool that runs the sed script (so that the user only has to enter the file path) be better? That way we could fix the keyword bug in EndNote's RIS output, too (obviously that means someone needs to write this and put it up).
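The core of such a tool would be a single search and replace over the RIS text. A minimal sketch, assuming the user supplies the absolute path of the EndNote PDF folder (ending in "/") and that attachment lines have the usual RIS form `L1  - internal-pdf://...`; `rewriteInternalPdfLinks` is a hypothetical name, and the keyword fix is not attempted here.

```javascript
// Hypothetical sketch of the search and replace such a tool could run:
// rewrite EndNote's relative internal-pdf:// links in L1 tags as file:// URLs.
// pdfFolder: absolute POSIX path ending in "/",
// e.g. "/Users/me/Documents/MyLibrary.Data/PDF/".
// Spaces and other special characters are not percent-encoded here.
function rewriteInternalPdfLinks(risText, pdfFolder) {
  // "m" makes ^ match at the start of every line; "g" replaces all matches.
  // A replacement function is used so "$" in the folder path is not treated
  // as a special replacement pattern.
  return risText.replace(/^(L1  - )internal-pdf:\/\//gm,
    (match, tagPrefix) => tagPrefix + "file://" + pdfFolder);
}

const ris = "TY  - JOUR\nL1  - internal-pdf://1234/paper.pdf\nER  - \n";
console.log(rewriteInternalPdfLinks(ris, "/Users/me/Documents/MyLibrary.Data/PDF/"));
```

Only lines tagged L1 are touched, so any other tag that happens to contain internal-pdf:// is left alone.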
Also, presumably your version would need another hack to distinguish between the Windows and Mac versions (\Data\PDF versus /Data/PDF), and we would have to make sure the folder names are the same in different language versions of EndNote: what if the German folder is called Daten/PDF? All of that strikes me as too fragile for a solution whose whole point is to just work.
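On the separator point alone, one way to avoid separate Windows and Mac code paths is to normalise the path before building the URL. A minimal sketch, assuming only drive-letter Windows paths (`C:\...`) and POSIX paths (`/...`); UNC paths, percent-encoding, and the localized folder name question raised above are not handled, and `toFileUrl` is a hypothetical name.

```javascript
// Hypothetical helper: convert a native absolute path to a file:// URL on
// either Windows or OS X, so "\Data\PDF" and "/Data/PDF" need no separate code.
function toFileUrl(nativePath) {
  // URLs always use "/", so normalise Windows backslashes first.
  const path = nativePath.replace(/\\/g, "/");
  // "C:/..." needs three slashes (file:///C:/...); a POSIX path already
  // starts with "/", so "file://" + path also yields three.
  return (/^[A-Za-z]:\//.test(path) ? "file:///" : "file://") + path;
}

console.log(toFileUrl("C:\\Users\\me\\Documents\\MyLibrary.Data\\PDF\\1234\\paper.pdf"));
// prints file:///C:/Users/me/Documents/MyLibrary.Data/PDF/1234/paper.pdf
console.log(toFileUrl("/Users/me/Documents/MyLibrary.Data/PDF/1234/paper.pdf"));
// prints file:///Users/me/Documents/MyLibrary.Data/PDF/1234/paper.pdf
```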
Regarding language problems: I ran into exactly that with the search-and-replace method suggested on the wiki, and it frustrated me until I realized what was keeping the instructions from working.
My OS X is in Swedish. In Finder my 'Users' folder is shown as 'Användare' and the 'Documents' folder as 'Dokument'. However, to get Zotero to accept the path I had to use the English folder names, Users and Documents. It took me a while to figure that out.
My copy of EndNote is in English, so I do not know whether different language versions use different folder names.
So yes, my suggested solution is far from perfect. Still, I think it could have its place if the Windows/Mac path delimiter problem can be solved, since I think it would work out of the box in many or even most cases, as opposed to the current situation, where it does not work at all without editing the RIS file.
A bulletproof solution would of course be much preferable!
I have both EndNote X4 and X5, and I downloaded the latest RefMan (RIS) Export output style from EndNote's website. However, I don't see any references to attached files in the exported RIS file. Maybe I'm not attaching files to the EndNote records correctly, although they do show up under attachments when I look at the record details.
I feel like I'm missing something simple. Any quick tips?
Sorry for the late answer; I was out of the office, so I could not test.
I cannot remember doing anything special to get the attachment file information into the export file.
I have my PDFs as attachments (if I click on a reference in EndNote X5, the file is listed under the heading 'File Attachments' in the pop-up window).
In the export dialog I select Text Only and RefMan (RIS) Export. I then get internal-pdf info in the L1 tag in the exported file.
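To check quickly whether an export actually contains the attachment information, one can look for L1 lines with internal-pdf:// links. A minimal sketch, assuming the exported RIS text has already been read into a string (in Node, for instance, with `fs.readFileSync(path, "utf8")`); `findInternalPdfLinks` is a hypothetical name.

```javascript
// Hypothetical diagnostic: list the internal-pdf:// attachment links found in
// the L1 tags of an exported RIS file. An empty result suggests the export
// did not include file attachments.
function findInternalPdfLinks(risText) {
  const links = [];
  for (const rawLine of risText.split("\n")) {
    // Strip a trailing "\r" in case the file has Windows line endings.
    const line = rawLine.replace(/\r$/, "");
    const m = /^L1  - (internal-pdf:\/\/.*)$/.exec(line);
    if (m) links.push(m[1]);
  }
  return links;
}
```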