Import Zotero-Bibliography from a Word document into Zotero - solved!

apfelstrudel · September 21, 2016

Hello, I have a Word document with a bibliography made by Zotero, but the books/references were stored in the Zotero library on a different computer which I can't access. Can I somehow import these references into my Zotero?

I tried EndNote, but it will only import a bibliography made in Word, not one made by Zotero, so this does not work. I also tried copying the citations and pasting them into Zotero, but it says that this file format cannot be read.

I'd be glad for any helpful comments!
Thank you!

bwiernik · September 22, 2016

The easiest way to do this is to install Juris-M, a Zotero variant with some additional features. Juris-M can import items from a Zotero Word bibliography. You can switch back to Zotero after importing the items if you like.

(@adamsmith This is a pretty common request. Currently the easiest method is to install Juris-M and use the feature there. Until document collections are added to Zotero, would it be possible to make a CSL style that displays CSL JSON, so that items could be easily copied into a library?)

adamsmith · September 22, 2016

CSL really isn't great for writing JSON, so I'm a bit worried about this. I think I might look at writing a VBA script that does this. Parsing fields can't be so hard.

denlinkd · September 22, 2016

I've used anystyle.io with lots of success before. It's a highly manual process, but it's lots faster than manually adding items to your library.

bwiernik · September 22, 2016

I was thinking specifically for Zotero-created live bibliographies which have all of the embedded Zotero data. If not CSL JSON, then some other import format?

adamsmith · September 22, 2016

There is already BibTeX as a citation style. It never occurred to me, but that might work OK at least.
My concern with any export format is that CSL is really poorly suited to writing code (because it applies formatting rules designed for text), and I worry about offering a hack as a solution.

fbennett · September 24, 2016

Musing about this ...

It should be possible (for someone) to build something that (a) just extracts references from a document for import, without preserving links in the document, and (b) doesn't require installation of software.

Both *.docx and *.odt documents can be exploded to XML. By walking the appropriate component files, field codes with embedded CSL JSON strings could be extracted. The CSL JSON could then be converted to arbitrary formats. This could all be done in one go by a single Python, Ruby, or Node script with a few command-line options.

It would take some work, but once you had a working extractor, it could be offered as a service on the Web. Seems like a neat idea and a broadly useful gadget, if someone wanted to take it on as a side-project.
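
A minimal Python sketch of the walk described above, for .docx files only. It assumes that Zotero citation fields contain the text ZOTERO_ITEM followed by a JSON object, and that each entry under "citationItems" carries its CSL JSON under "itemData"; that layout is an assumption and is not shown in this thread. The function names are made up for illustration. Nested fields, citations stored as bookmarks, and .odt files are not handled.

```python
import json
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the XML parts inside a .docx archive.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def field_codes(xml_bytes):
    """Yield the instruction text of each complex field (no nesting support)."""
    parts = None
    for el in ET.fromstring(xml_bytes).iter():
        if el.tag == W + "fldChar":
            kind = el.get(W + "fldCharType")
            if kind == "begin":
                parts = []
            elif kind in ("separate", "end") and parts is not None:
                # A long field code may be split over several instrText runs.
                yield "".join(parts)
                parts = None
        elif el.tag == W + "instrText" and parts is not None:
            parts.append(el.text or "")


def items_from_code(code):
    """Return embedded CSL JSON items from one field code (assumed layout)."""
    if "ZOTERO_ITEM" not in code:
        return []
    start = code.find("{")
    if start == -1:
        return []
    try:
        citation, _ = json.JSONDecoder().raw_decode(code, start)
    except json.JSONDecodeError:
        return []
    return [ci["itemData"] for ci in citation.get("citationItems", [])
            if isinstance(ci, dict) and "itemData" in ci]


def extract_items(docx):
    """Collect unique CSL JSON items from a .docx path or file-like object."""
    unique = {}
    with zipfile.ZipFile(docx) as z:
        names = set(z.namelist())
        for part in ("word/document.xml", "word/footnotes.xml", "word/endnotes.xml"):
            if part not in names:
                continue
            for code in field_codes(z.read(part)):
                for item in items_from_code(code):
                    # Drop exact duplicates; near-duplicates are kept.
                    unique.setdefault(json.dumps(item, sort_keys=True), item)
    return list(unique.values())
```

Writing the returned list to a file with json.dump should produce a CSL JSON file; converting it to other formats would be a separate step.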

adamsmith · September 24, 2016

Yes, agree; apart from Zotero building in a solution (which also wouldn't seem to be that hard), an online extractor tool (or at least a command line tool to get the ball rolling) would be great.

Rintze · October 1, 2016

I've been playing around with https://github.com/mwilliamson/mammoth.js a little, and it looks relatively easy to create a website where users can upload a docx file, and have it extract the contents of all Zotero fields (data from bookmarks seems harder to retrieve, and it wouldn't work for other file formats).

Even if Zotero (or Mendeley) adds a feature to extract bibliographic info from existing Word documents, such a web tool might still be handy, so I might try putting it together.

What would be the best way to present CSL JSON to Zotero once extracted? Apart from a download button, how would I make the metadata accessible to Zotero via a translator? Dedicated translator?

adamsmith · October 1, 2016

Cool! Personally I'd just go for a download button and not do a web translator: it's kind of weird behavior to get many, many items imported on a single click, but to get the folder icon you'd have to parse the JSON to display search results, which seems tedious.
If you do want to go the translator route, I think a dedicated web translator is likely the way, though you could likely make something work with unAPI.

fbennett · October 1, 2016

Good thought. I don't have a clear idea of the details, but if you could present the extracted data under a folder icon, that would be really nice.

fbennett · October 1, 2016

I wonder if you could get a folder icon with full item metadata to work by embedding the CSL JSON (or an RIS conversion of it?) in data URLs with an Ajax call.

Rintze · October 13, 2016

So, I quickly threw together a website to extract embedded Zotero references from MS Word .docx files and export them as CSL JSON: http://rintze.zelle.me/ref-extractor/ (code at https://github.com/rmzelle/ref-extractor)

The exported CSL JSON file can be imported into Zotero via the gear menu (using either "Import..." and selecting the file, or, if you have opened the file and copied its contents, via "Import from Clipboard").
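
As a rough sanity check before importing, the exported file can be loaded and listed with a few lines of Python. This is only a sketch: it relies on the standard CSL JSON fields "title", "author" (with "family" or "literal") and "issued" ("date-parts"), and the function name and the file name in the usage note are hypothetical.

```python
import json


def summarize(items):
    """Return one 'Author (year): title' line per CSL JSON item."""
    lines = []
    for item in items:
        authors = item.get("author") or []
        first = authors[0] if authors else {}
        name = first.get("family") or first.get("literal") or "?"
        date_parts = (item.get("issued") or {}).get("date-parts") or [[]]
        year = date_parts[0][0] if date_parts[0] else "n.d."
        lines.append(f"{name} ({year}): {item.get('title', '[no title]')}")
    return lines
```

For example, with the export saved as references.json (a made-up name), `summarize(json.load(open("references.json", encoding="utf-8")))` gives a quick list to compare against the bibliography in the Word document.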

It's still a bit rough, but feedback is welcome. It works for me with small test documents. I plan to add a "Copy to Clipboard" button, and I need to figure out the licensing (I'm using a customized version of the above-mentioned mammoth.js, which is "BSD-2-Clause"-licensed, but my own code should be MIT licensed). It should also be pretty easy to extract embedded Mendeley references.

bwiernik · October 13, 2016

Cool!

apfelstrudel · October 17, 2016

@Rintze your website worked perfectly well!!! Thank you so so so much!!!
I did not understand the part you were discussing in between, but with your website I could do it within 1 minute!!!
My mother-in-law had that problem and she is so happy right now, she says it's like Christmas! And she says I'm her hero, but actually you are! :)

Rintze · October 17, 2016

@apfelstrudel, glad you got it to work and found it useful! You might be the first real user. It's important to keep mothers-in-law happy! Let me know if anything was unclear.

Just as a small warning: I'm pretty sure the tool doesn't yet extract any uncited references (Zotero items that have been added directly to the bibliography via Zotero's "Edit Bibliography" button in Word), but many people never do that.

mark · October 21, 2016

Neat! Worked for me with a test document, will recommend it to folks who have asked me about a feature like this.

Rintze · October 22, 2016

@"Dan Stillman", I noticed that Zotero 4.0.29.10 doesn't actually seem to embed item metadata for "uncited" items that are directly added to the bibliography in Word documents via the Edit Bibliography button. The bibliography field just has the item IDs:

<w:instrText xml:space="preserve">ADDIN ZOTERO_BIBL {"uncited":[["http://zotero.org/users/1031436/items/HU4NC489"],["http://zotero.org/users/1031436/items/WZFNPG9D"]],"custom":[]} CSL_BIBLIOGRAPHY</w:instrText>

Assuming this never worked and isn't a bug, can I put in a feature request for it? It would be more logical if the scope of the "Store references in Document" option extended to such uncited items.
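
To illustrate what can be recovered from such a field, here is a small Python sketch that pulls the item URIs out of the bibliography field code quoted above. The function name is made up, and the parsing assumes the JSON object directly follows the ZOTERO_BIBL marker, as in that example. Only the URIs are available; there is no item metadata to extract.

```python
import json


def uncited_uris(instr_text):
    """Return the URIs listed under "uncited" in a ZOTERO_BIBL field code."""
    pos = instr_text.find("ZOTERO_BIBL")
    if pos == -1:
        return []
    start = instr_text.find("{", pos)
    if start == -1:
        return []
    # raw_decode stops at the end of the JSON object and ignores the
    # trailing CSL_BIBLIOGRAPHY marker.
    data, _ = json.JSONDecoder().raw_decode(instr_text, start)
    return [uri for group in data.get("uncited", []) for uri in group]


code = ('ADDIN ZOTERO_BIBL {"uncited":[["http://zotero.org/users/1031436/items/HU4NC489"],'
        '["http://zotero.org/users/1031436/items/WZFNPG9D"]],"custom":[]} CSL_BIBLIOGRAPHY')
print(uncited_uris(code))
# ['http://zotero.org/users/1031436/items/HU4NC489', 'http://zotero.org/users/1031436/items/WZFNPG9D']
```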

adamsmith · October 22, 2016

(@Rintze -- if you actually want to @ Dan use Dan%20Stillman )

Rintze · October 23, 2016

(@adamsmith, it's what I get when I tab-to-autocomplete after typing "@Dan Stil" [without quotes], though)

Rintze · October 23, 2016

I updated Reference Extractor to also extract Mendeley items.

Earlier I asked: "What would be the best way to present CSL JSON to Zotero once extracted? Apart from a download button, how would I make the metadata accessible to Zotero via a translator? Dedicated translator?"

Since it's pretty simple to import CSL JSON into Zotero (either via "Import..." or "Import from Clipboard"), I probably won't bother converting the CSL JSON to any other format. Pull requests are of course welcome, though. (Also, Mendeley doesn't seem to be able to import CSL JSON.)