(Almost) convert citations from Endnote to Zotero in Word .docx
I have some documents written in Microsoft Word (.docx format), which use Endnote citations. I have imported my Endnote library into Zotero, and would now like to continue editing these documents. To do this, I need to convert my Endnote citations in each document into Zotero citations.
The consensus seems to be that the best way to do this is to find each Endnote citation, manually insert an equivalent Zotero citation, and delete the Endnote one. I would prefer to have an automated way to do this, so I developed the following series of steps. Unfortunately, possibly due to a Zotero bug, the steps below don't really work, so I would recommend continuing with the manual way for now. But I wanted to mention the steps here in case anyone can push this further.
Note that this process requires that you have Microsoft Word and LibreOffice installed, and that Endnote is installed and working with MS Word, and that Zotero is installed and working with both Word and LibreOffice. The document will make a round-trip between Word (.docx or .doc format), LibreOffice (.odt format), then back to .doc and finally, back to .docx if desired. This will interfere with a lot of formatting. I'm also not sure whether this process will preserve page numbers or notes in your citations. But if you have a lot of text with simple citations, it may be worthwhile. (This would be easier if Zotero offered a "docx scan" feature or if the Word plugin could convert temporary/scannable citations into real Zotero citations.)
Here are the steps I tried:
1. In Endnote, create a custom output style that looks like the scannable citation markers used by Zotero's ODT scanner ( http://zoteromusings.wordpress.com/2013/05/06/announcing-rtfodf-scan-for-zotero/ ). Save it as something like "Zotero ODT Scan". When creating this output style, the important elements are under Citations -> Templates. The Citation should look like this:
{|`|` Author, Title, Year `|` `|` `|`|}
If you are also offered a "Citation - Author (Year)" option (I think this appeared sometime between Endnote X2 and Endnote X6), it could look like this:
{|`|` -Author, Title, Year `|` `|` `|`|}
Also set the Multiple Citation Separator to this (two curly braces):
}{
2. In Microsoft Word, reformat your citations using the Endnote plugin with the "Zotero ODT Scan" style defined above. This should create citations that are still field codes (gray when you click on them) but look a little like temporary Endnote citations (surrounded with curly brackets), with some extra vertical bars as well. (For the later steps, it may be worthwhile to convert the field codes to plain text at this point, but I haven't found that this makes a difference.) Save your document as a .doc or .docx file, but with a new name.
3. Open the document in LibreOffice and save it as an "ODF Text Document (.odt)".
4. In Zotero, choose RTF/ODF scan from the gear popup button. Choose "ODF (to citations)" as the File type. Choose your .odt file as the Input File and choose an appropriate location for the Output File.
5. Open the output file from the previous step in LibreOffice. Note that the citations should now appear as Reference Marks (gray text). These use an unusual citation format (author, title and date, with no enclosing parentheses), but otherwise are fairly normal Zotero citations.
6. Click the "Set Document Preferences" button on the Zotero toolbar in LibreOffice. Make sure Format Using: ReferenceMarks is selected, then click OK. At this point, every citation in your document will be converted to {Citation}, and you will be prompted to select a substitute item for the first one. Once you choose a replacement for the first one, Zotero then replaces every citation in your document with this reference. Alternatively, if you start this process by clicking on the "Refresh" button on the Zotero toolbar, the citations don't get converted to {Citation}, but the rest is the same (you are prompted for a substitute, and that is applied to every citation in the document).
There appear to be several bugs here:
(a) Zotero doesn't recognize the citations as citations at all if you choose the "Bookmarks" option in the "Refresh" or "Document Preferences" dialog box. If you choose ReferenceMarks then Bookmarks in various orders, you can eventually get the document scanning process to begin. But sometimes this causes LibreOffice to crash instead (and then, at least on a Mac, you also have to force Zotero to quit in order to get the plugin working again).
(b) Once Zotero notices there are citations in the document, it still doesn't find them in the library, even though they are there. I am assuming that the ODT scanner is able to look up references in the library based on the author, title and year, like the RTF scanner, but I may be wrong about this. Maybe the ODT scanner is just leaving placeholder citations in the document and then failing to match them to the library later? I would expect Zotero to link most of the scannable citations to library entries either during the ODT scan or when it first looks through the post-scanned document interactively via the LibreOffice plugin. I would also expect a dialog box to pop up at one of these stages, similar to the RTF scan dialog box for unmatched references. (That dialog shows the information in the scannable citation and then lets you manually search for the reference in your library.)
(c) When Zotero replaces the first unrecognized citation, it also replaces all the other citations with the same reference.
7. Once you have valid citations, click on "Set Document Preferences" and choose Format Using: Bookmarks.
8. Save the document as a Microsoft Word 97/2000/XP/2003 (.doc) file. It would be preferable to save the file as a "Microsoft Word 2007/10 XML (.docx)" file, but two bugs (probably in LibreOffice) make that infeasible:
(a) The custom document properties that Zotero uses to cross-reference each citation back to your Zotero library don't get saved when LibreOffice creates a .docx. (These do get saved when you create a .doc file from LibreOffice, which can be verified by choosing File -> Properties -> Custom in MS Word. Strangely, these properties don't show up in the equivalent dialog box in LibreOffice itself, so I'm not sure where LibreOffice keeps them, which makes it hard to file a bug report.)
(b) If a bookmarked citation appears at the end of a paragraph, LibreOffice instead creates an empty bookmark at the start of the paragraph when converting to .docx (I think this is LibreOffice bug 65955). This doesn't occur when LibreOffice writes a .doc file.
9. Open the .doc file in Microsoft Word. Click the "Zotero Set Doc Prefs" button on the Zotero toolbar. Then choose Format Using: Fields.
10. Save the file in .doc or .docx format. Correct any formatting errors introduced by all the file conversions. This may require copying tables and pictures back from the original Word document. I've also noticed that my "Body Text" style in the original Word document somehow becomes "Text Body" in the final one.
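For a .docx file, the check described in step 8(a) can also be scripted instead of done through the Word dialog. The Python sketch below lists custom document properties by reading docProps/custom.xml from the .docx package. The function name custom_property_names is made up for illustration, and the "ZOTERO_" name prefix is an assumption about how the plugin names its properties; adjust it to whatever you see under File -> Properties -> Custom in Word. It does not work on binary .doc files.

```python
import zipfile
import xml.etree.ElementTree as ET

# Package part that holds custom document properties in a .docx file.
CUSTOM_PART = "docProps/custom.xml"
NS = "{http://schemas.openxmlformats.org/officeDocument/2006/custom-properties}"


def custom_property_names(docx, prefix="ZOTERO_"):
    """Return names of custom properties in a .docx that start with `prefix`.

    `docx` may be a path or a binary file-like object. Returns [] if the
    package has no custom-properties part at all.
    The default prefix is an assumption, not a documented Zotero constant.
    """
    with zipfile.ZipFile(docx) as package:
        if CUSTOM_PART not in package.namelist():
            return []
        root = ET.fromstring(package.read(CUSTOM_PART))
    names = [p.get("name", "") for p in root.iter(NS + "property")]
    return [n for n in names if n.startswith(prefix)]
```

Calling custom_property_names on the .docx written by LibreOffice (for example a hypothetical "converted.docx") and getting an empty list would be consistent with the properties having been dropped during the conversion.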
I hope this will point out a few fixable bugs and maybe lead to a useful/usable conversion process.
a) Since ODF Scan inserts reference marks, you need to initially tell Zotero to use reference marks. As you note, you should be able to switch that to bookmarks once you have selected a citation style (and, in your workflow, substituted the citations), though due to LibreOffice limitations that style cannot be a footnote/endnote-based style.
b) You assume incorrectly. Our goal in making ODF Scan was to create a 100% reliable process for inserting and converting citation markers, so the only information the scan ever looks at is the identifier at the end of the curly brackets. The scan will not attempt to recognize citations. It doesn't actually do anything to match citations to a Zotero database; it just creates Zotero-readable citations. Zotero, since it assumes citations were inserted with the LO plugin, only looks for the internal identifier that ODF Scan would normally include in the citations. I'm sure it would be possible to do some extent of citation matching, but I don't have the time and interest to develop it and I don't believe Frank has either. If someone wants to, though, I (and again I assume Frank) would be happy to chat and accept patches.
c) Not sure I understand you here; wouldn't that be desirable?
Oh, never mind, I understand now. That's due to b): since you don't enter an identifier, they all have the same one.
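To make point b) concrete, here is a minimal Python sketch of splitting a scannable citation marker into its five bar-separated fields. It is not the ODF scanner's code: the function name parse_marker, the field names, and the sample identifier value are assumptions for illustration only. A marker produced by the Endnote output style from step 1 ends with an empty last field, and that last field is the only one the scan uses.

```python
from typing import NamedTuple, Optional


class Marker(NamedTuple):
    # Field names are illustrative labels, not taken from the scanner's source.
    prefix: str
    cite: str
    locator: str
    suffix: str
    identifier: str


def parse_marker(text: str) -> Optional[Marker]:
    """Split '{ a | b | c | d | e }' into five stripped fields, or return None."""
    text = text.strip()
    if not (text.startswith("{") and text.endswith("}")):
        return None
    fields = [f.strip() for f in text[1:-1].split("|")]
    if len(fields) != 5:
        return None
    return Marker(*fields)


# Marker as the Endnote "Zotero ODT Scan" style would emit it: no identifier.
endnote_marker = parse_marker("{| Walters, Title A, 2010 | | |}")

# Same marker with a made-up placeholder in the identifier field.
full_marker = parse_marker("{| Walters, Title A, 2010 | | |item-ABCD1234}")
```

Here endnote_marker.identifier is an empty string for every citation in the document, so nothing distinguishes one citation from another once the readable text is ignored.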
However, this does give me an idea for another step in the workflow - I could use the RTF scanner to create ODF-scannable citations, including the Zotero identifier. Then I could convert the rtf document into odf and use Zotero's ODF scanner to convert those into real reference marks.
On point a): when I open the post-scan ODF document, it already contains Zotero-inserted reference marks. So I would expect that choosing "Format Using: Bookmarks" from the Refresh or Document preferences dialog would immediately initiate a refresh from the database and convert these to bookmarks format. Instead, nothing happens. I must first use one of those dialog boxes with the "Format Using: ReferenceMarks" option to get Zotero to recognize the reference marks. (This is not an obvious step if your goal is just to convert the existing reference marks into bookmarks.)
On point c): My original assumption was that Zotero was trying to match a library record based on the author/title/year information. So I was surprised when I gave a substitution for the first reference (Walters, Title A, 2010) and then Zotero used that same substitution for all other references in the document (e.g., Simpson, Title B, 2011 and Beauregard, Title C, 2003). However, in light of point b), I'm guessing that Zotero is doing something more like, "I can't find a reference for identifier <blank>, please locate it." Then when I choose a reference, it applies that to all other citations with the same identifier (i.e., <blank>), which means all the citations in the document, even if the text shows a different author/title/year.
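The guess above can be illustrated with a small Python simulation. It only mimics the behavior I observed and is not Zotero's code; apply_substitution and the dictionary layout are hypothetical.

```python
def apply_substitution(citations, chosen_item):
    """Attach `chosen_item` to every citation sharing the first one's identifier.

    `citations` is a list of dicts with 'text' and 'identifier' keys.
    Returns a new list; the input list and its dicts are not modified.
    """
    if not citations:
        return []
    target_id = citations[0]["identifier"]
    return [
        {**c, "item": chosen_item} if c["identifier"] == target_id else dict(c)
        for c in citations
    ]


citations = [
    {"text": "Walters, Title A, 2010", "identifier": ""},
    {"text": "Simpson, Title B, 2011", "identifier": ""},
    {"text": "Beauregard, Title C, 2003", "identifier": ""},
]

# The user picks a library item for the first citation only.
result = apply_substitution(citations, "Walters 2010")
```

Because all three identifiers are blank, and therefore equal, every entry in result ends up pointing at the item chosen for the first citation, regardless of its visible text.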
I think the way Zotero currently handles scanned citations with blank identifiers should be improved, since it inevitably leads to converting all these citations to refer to the same reference. I'm sure this is rare, but it can be confusing for people like me poking around and trying to develop new workflows. To avoid this confusion, I would suggest four options for text citations that don't include Zotero identifiers:
(a) leave them as plain text during the scan and possibly warn the user that invalid citations were observed (maybe even note that an identifier is a mandatory part of the scannable citation);
(b) convert them into reference marks, but display some sort of text like "no Zotero identifier provided";
(c) convert them into reference marks, but give them dummy identifiers that vary for each citation (ideally the same identifier would be used for each instance of the same author/title/year text). Then when the user is prompted to choose a substitute, it will only be applied to the relevant reference marks.
(d) convert them into reference marks with blank identifiers (as now). Later, when the user is prompted to choose a substitute for a reference with a blank identifier, apply that substitute only to the currently-highlighted reference (not to all the other ones with blank identifiers).
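Option (c) could be sketched roughly as follows in Python. This assumes nothing about the scanner's internals; dummy_identifier and the "dummy:" prefix are made up for illustration. Hashing the normalized citation text gives the same placeholder for repeated citations of one work and different placeholders for different works.

```python
import hashlib


def dummy_identifier(cite_text: str) -> str:
    """Derive a stable placeholder identifier from the citation text."""
    # Collapse whitespace and ignore case so trivial variations still match.
    normalized = " ".join(cite_text.lower().split())
    digest = hashlib.sha1(normalized.encode("utf-8")).hexdigest()
    return "dummy:" + digest[:8]


ids = [dummy_identifier(t) for t in [
    "Walters, Title A, 2010",
    "Simpson, Title B, 2011",
    "walters,  Title A, 2010",  # same work, different spacing and case
]]
```

With identifiers like these, a substitution chosen for one reference mark would only spread to other marks carrying the same author/title/year text.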
Of course my preferred option would be to look these up by author/title/year, either during the ODF scan or during the LibreOffice refresh. But that seems unlikely at this point.
The rest I don't think will work for us. You'll have to realize, though, that the only way someone would get blank identifiers is by trying to hack things. This would never occur in regular use, neither with ODF Scan, where we have the Scannable Cite translator, nor with the word processor plugins. And while hacking things is of course fine (that's why we do open source), it does come with less of a guarantee of seeing user-friendly/intuitive results.
On the other hand, I think my larger goal of converting Endnote citations into Zotero is probably fairly common. It's already easy to migrate the library, so being able to convert citations in existing documents would make the migration completely painless. I doubt I'm the only one who recycles parts of old documents (especially references) and/or is always in the middle of a project with some citations already entered via my old software. But I'll leave that for another day!
1. Improve RTF Scan so that it can convert scanned citations to live Zotero citations, or
2. Import Endnote reference numbers (the #678 you can have it insert) along with the library and allow matching on them. I believe that's what Bookends does.
Thanks!
As always, of course, patches are very welcome, and to the extent that I can, I'd be happy to advise people interested in working on this over on zotero-dev, as I'm sure would other, more competent people than me. Generally speaking, improving RTF Scan may be more straightforward (and will also have the advantage of being more broadly useful, e.g. for people who want to use other word processors).