Use external software in fulltext search

Would it be possible to use an external application (e.g. Spotlight) to do full text search instead of building a full text index in Zotero?

Mikko
  • Zotero is platform-independent; most desktop search programs are not. You could, of course, have Spotlight (or others) index your storage directory & use their native search interface.
  • I have Spotlight set up to index my Zotero data directory. What I would like, however, is to search the full-text contents of MS Office files from within Zotero, since not all of my files are PDFs. Using an external tool for this would be useful.

    Implementing this would not need to be platform-specific. Windows, OS X, and Linux all have desktop search programs that can be run from the command line and print a list of matching files to stdout. Basically, I would like an option to replace Zotero's built-in full-text indexing with a user-defined command-line call. That call would query whatever desktop search program the user prefers and return a list of matching files. Zotero could then parse the item keys from this output and add the corresponding items to the search results.
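
    Below is a minimal sketch of such a hook in Python. It is not an existing Zotero feature, and every name in it is hypothetical. It makes three assumptions. First, the user configures a command template containing a `{query}` placeholder. Second, the search tool prints one matching path per line. Third, attachments follow the usual `storage/<KEY>/<file>` layout, where `<KEY>` is an 8-character key.

```python
import re
import shlex
import subprocess
from pathlib import PurePosixPath

# Assumption: attachment files are stored as .../storage/<KEY>/<filename>,
# where KEY is an 8-character uppercase alphanumeric Zotero key.
KEY_RE = re.compile(r"^[A-Z0-9]{8}$")


def keys_from_paths(lines):
    """Extract unique keys (in order of first appearance) from file paths."""
    keys = []
    for line in lines:
        # Normalise Windows separators so one parser handles every platform.
        parts = PurePosixPath(line.strip().replace("\\", "/")).parts
        # Look for "storage", then a key directory, then at least a file name.
        for i in range(len(parts) - 2):
            if parts[i] == "storage" and KEY_RE.match(parts[i + 1]):
                if parts[i + 1] not in keys:
                    keys.append(parts[i + 1])
                break
    return keys


def run_desktop_search(command_template, query):
    """Run a user-defined search command and return the keys it matched.

    command_template is a hypothetical user setting, split with POSIX-style
    quoting rules; every "{query}" inside it is replaced by the search terms.
    """
    args = [arg.replace("{query}", query) for arg in shlex.split(command_template)]
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return keys_from_paths(result.stdout.splitlines())
```

    On OS X the template might be `mdfind -onlyin <storage directory> {query}`, and other platforms would use their own command-line search tools. The storage folders are most likely named after attachment keys, so Zotero would still need to look up each attachment's parent item.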

    Mikko
  • It's not a bad idea, but we're likely switching to an all-SQL search (using SQLite's full-text indexing support) and moving away from non-SQL-based searches. Zotero already uses non-SQL file scanning for phrase searching. Integrating that into a boolean search system that supports chained searches is extremely complicated and inelegant, and it is the main reason some advanced searches are currently buggy.
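
    The following sketch shows why an all-SQL design composes more easily. It uses Python's `sqlite3` module with SQLite's FTS5 extension, which assumes a SQLite build with FTS5 enabled. The schema is hypothetical and is not Zotero's. The full-text condition and an ordinary metadata condition are evaluated in one SQL statement, so they combine like any other clauses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: item metadata plus an FTS5 index of extracted text.
conn.executescript("""
    CREATE TABLE items (itemID INTEGER PRIMARY KEY, title TEXT, year INTEGER);
    CREATE VIRTUAL TABLE fulltext USING fts5(content);
""")
conn.executemany("INSERT INTO items VALUES (?, ?, ?)", [
    (1, "Desktop search survey", 2007),
    (2, "Citation managers", 2005),
    (3, "Index structures", 2008),
])
# The FTS5 rowid is set to the itemID so the two tables can be joined.
conn.executemany("INSERT INTO fulltext(rowid, content) VALUES (?, ?)", [
    (1, "spotlight and google desktop index local files"),
    (2, "zotero stores references in sqlite"),
    (3, "inverted index for full text search in sqlite"),
])


def search(fts_query, min_year):
    """Full-text condition AND a metadata condition in a single statement."""
    rows = conn.execute(
        """SELECT items.itemID FROM items
           JOIN fulltext ON fulltext.rowid = items.itemID
           WHERE fulltext MATCH ? AND items.year >= ?
           ORDER BY items.itemID""",
        (fts_query, min_year),
    )
    return [row[0] for row in rows]
```

    FTS5 query strings accept `AND`, `OR` and `NOT`, so boolean full-text logic stays inside SQL. Nothing needs to be scanned outside the database and merged back in.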

    A better solution would be just to implement parsing of Word documents using a lightweight doc-to-text converter. noksagt at one point suggested some lightweight doc parsers. I haven't compared how well those work.
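
    The sketch below suggests how small such a converter can be, using only the Python standard library. It covers the newer `.docx` format only. A `.docx` file is a ZIP archive whose body text sits in `w:t` elements of `word/document.xml`. The legacy binary `.doc` format is entirely different and would need a dedicated parser. The sketch also ignores headers, footnotes and layout.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(path):
    """Return the body text of a .docx file (path or file object), one line per paragraph."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    # Each w:p is a paragraph; its visible text is split across w:t runs.
    for para in root.iter(W_NS + "p"):
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        if text:
            paragraphs.append(text)
    return "\n".join(paragraphs)
```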
  • External desktop search has the advantage of covering many more file types than Zotero can. For example, Google Desktop can do OCR for PDFs.

    One way to do this would be to create a temporary database table from the desktop search results and then use it as part of the SQL query. I don't know how well this would perform, though.
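
    Here is a minimal sketch of that idea using Python's `sqlite3` module. The keys returned by the external tool are loaded into a `TEMP` table, and ordinary SQL conditions then test against it with `IN`. The `items` schema is hypothetical and far simpler than Zotero's. Performance on a large library is untested here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (key TEXT PRIMARY KEY, title TEXT, itemType TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?, ?)", [
    ("ABCD1234", "Slides on desktop search", "presentation"),
    ("WXYZ5678", "Report on indexing", "report"),
    ("QRST2345", "Notes on OCR", "document"),
])


def search_with_external_hits(hit_keys, item_type):
    """Combine external desktop-search hits with an ordinary SQL condition."""
    # A TEMP table is private to this connection and dropped when it closes.
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS desktop_hits (key TEXT PRIMARY KEY)")
    conn.execute("DELETE FROM desktop_hits")  # clear hits from a previous search
    conn.executemany("INSERT OR IGNORE INTO desktop_hits VALUES (?)",
                     [(k,) for k in hit_keys])
    rows = conn.execute(
        """SELECT key FROM items
           WHERE itemType = ? AND key IN (SELECT key FROM desktop_hits)
           ORDER BY key""",
        (item_type,),
    )
    return [row[0] for row in rows]
```

    The temporary table makes the external results one more SQL condition, so they can be chained with other search criteria. This avoids filtering in application code after the query.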

    In addition to a Word document parser, a PowerPoint parser would be nice.
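
    Below is a rough standard-library sketch for `.pptx` files; the legacy binary `.ppt` format is not covered. Slide text sits in DrawingML `a:t` runs inside `ppt/slides/slideN.xml`. Ordering slides by the number in the file name is a simplifying assumption, because the authoritative order is defined in `ppt/presentation.xml`.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace holding the a:p / a:t text elements.
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
SLIDE_RE = re.compile(r"^ppt/slides/slide(\d+)\.xml$")


def pptx_to_text(path):
    """Return one string per slide, ordered by the number in the slide file name."""
    with zipfile.ZipFile(path) as archive:
        slides = []
        for name in archive.namelist():
            match = SLIDE_RE.match(name)
            if match:
                slides.append((int(match.group(1)), name))
        slides.sort()  # numeric sort, so slide10 comes after slide2
        texts = []
        for _, name in slides:
            root = ET.fromstring(archive.read(name))
            # Join the runs of each text paragraph, then the paragraphs by newlines.
            paras = ("".join(t.text or "" for t in p.iter(A_NS + "t"))
                     for p in root.iter(A_NS + "p"))
            texts.append("\n".join(text for text in paras if text))
    return texts
```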

    Mikko