Full-Text Indexing of Word Documents?

I'm running Microsoft Word 2004 for Mac (v11.5) with the latest Zotero for MS Word toolbar and have installed Zotero 1.0.10 in today's June 30th release of Firefox 3.5.

I have a number of literature reviews in Word documents (.doc files, some of them converted from .docx) that I would like to import into My Library with search functionality. Is there any way to index their full text?
  • No. But there is a ticket for it:
    https://www.zotero.org/trac/ticket/1102
    and other discussion on the topic in the forums.
  • What I often do, also for manuscripts in .doc format sent to me by colleagues, is make a PDF and attach it to my Zotero item. Not just to make it indexable by Zotero, but also because old formats may go awry or fonts may become obsolete. PDFs will stick around longer.
  • Sorry to resurrect this thread, but I was curious if there had been any developments (or future plans) on this topic. What is the current perspective on indexing text from non-PDF sources, particularly Microsoft Word files? It sounds like an interesting challenge!
  • Nothing new, sorry. Mostly because I don't believe there is a tool comparable to pdftotext that can convert .doc files to text across platforms.
  • That makes sense, and thanks for the fast reply. Theoretically, if someone were to port catdoc (http://ftp.wagner.pp.ru/~vitus/software/catdoc/) or UnRTF (http://www.gnu.org/software/unrtf/unrtf.html) to Windows, then I suppose that would open up possibilities? I seem to recall that one of the earlier bottlenecks was Firefox's lack of IPC support. I'm not sure whether that was ever implemented in Firefox.
  • I have found that opening the file in Word and re-saving it as a .mht single-file web page allows Zotero to index the document.

    Saving as .mht also lets the attached file open in Word on my computer when I click it later, and it preserves the collapsible outline in Word's Outline view, which is important to me.

    I know it is an extra step to open and re-save Word documents, but it makes Zotero a good solution for my needs.
  • There are now several tools to convert various file types to text, but they remain less easy to implement across platforms than pdftotext, so I'm not sure what the prospects of adding them to Zotero are.
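A plausible reason the .mht workaround helps is that an MHT ("MIME HTML") file is just a MIME message whose main part is ordinary HTML, and HTML is easy to reduce to plain text on any platform. The Python sketch below illustrates that idea using only the standard library. It is not how Zotero itself indexes attachments; `mht_to_text` is a hypothetical helper, and the sketch assumes the document body is the first `text/html` part of the file.

```python
import email
import email.policy
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text from HTML, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-blank text that is not inside <script>/<style>.
        if not self._skip_depth and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def mht_to_text(raw: bytes) -> str:
    """Hypothetical helper: plain text of the first text/html part of an MHT file."""
    msg = email.message_from_bytes(raw, policy=email.policy.default)
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            # get_content() undoes quoted-printable/base64 and decodes the charset.
            extractor = _TextExtractor()
            extractor.feed(part.get_content())
            extractor.close()
            return extractor.text()
    return ""  # no HTML part found
```

Comments and conditional `<!--[if ...]-->` blocks that Word writes into its HTML are ignored by `HTMLParser` by default, so they do not end up in the extracted text.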
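For the .docx files mentioned in the question (before they were converted to .doc), cross-platform text extraction is much simpler than for the legacy binary .doc format, because a .docx file is a ZIP archive whose main text lives in `word/document.xml`. The following is a minimal Python sketch using only the standard library. `docx_to_text` is a hypothetical name, it does not handle binary .doc files (the case catdoc targets), and it ignores tabs, line breaks, footnotes, and text boxes.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used in word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(data: bytes) -> str:
    """Hypothetical helper: extract paragraph text from a .docx file given as bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W_NS + "p"):
        # Concatenate the text runs (<w:t>) inside this paragraph.
        text = "".join(t.text or "" for t in p.iter(W_NS + "t"))
        if text:  # skip empty paragraphs
            paragraphs.append(text)
    return "\n".join(paragraphs)
```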