Develop Zotero into a Corpus tool?

I'm currently doing a PhD in applied linguistics and TESOL. My supervisor is an expert in corpus linguistics and recently received a grant to develop a tool that helps EFL students write their academic papers using corpora.

Since most Zotero users have a collection of searchable PDFs, I wonder whether it is possible to develop some kind of add-on within Zotero that treats this collection of PDFs as a corpus for language learning. For example, if I am not sure how to use the word "identity" in my own writing, a Zotero add-on could generate concordance lines for that keyword by extracting the sentences that contain it from the PDF collection for my reference. At present, a search for the keyword only gives me a list of items that contain it, not the sentences or concordance lines.
  • These are really two different questions.
    The first one is about showing search context for full-text searches. That has applicability far beyond corpus analysis, of course (that's why it's included in every Google search), and it is both generally planned for Zotero and possible given the general structure of Zotero's search/indexing. The main obstacles currently are in the implementation: generating the context lines and, probably more difficult, finding a good GUI solution for displaying search results. If you're interested in working on this and know how to write JavaScript, post to the Zotero development list with some ideas, and Dan and/or others would likely be happy to give you some pointers.
    https://groups.google.com/forum/#!forum/zotero-dev

    The other question is about using Zotero for actual corpus analysis, which is indeed something that should happen in an add-on. One such add-on already exists, so you may want to build on that:
    https://github.com/chrisjr/papermachines
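
    To make the "context lines" idea concrete, here is a minimal keyword-in-context (concordance) sketch in Python. It does not use Zotero's index or any Zotero API; it assumes the PDF text has already been extracted into plain strings. The `documents` dict, the `concordance` function, and the `width` parameter are all hypothetical names chosen for illustration.

```python
import re


def concordance(documents, keyword, width=40):
    """Return keyword-in-context lines for every whole-word match.

    `documents` is a hypothetical mapping of item title -> extracted text.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for title, text in documents.items():
        # Collapse whitespace so PDF line breaks do not split the context.
        flat = " ".join(text.split())
        for match in pattern.finditer(flat):
            left = flat[max(0, match.start() - width):match.start()]
            right = flat[match.end():match.end() + width]
            # Right-align the left context so the keywords line up in a column.
            lines.append(
                f"{left:>{width}} [{match.group(0)}] {right:<{width}}  ({title})"
            )
    return lines


# Made-up sample text standing in for extracted PDF content.
documents = {
    "Sample A": "Learners negotiate identity through\nwriting. Identity is not fixed.",
    "Sample B": "No relevant keyword here.",
}

for line in concordance(documents, "identity"):
    print(line)
```

    Each output line shows a fixed-width window of text on either side of the match plus the source item, which is the kind of display a plain item list does not give you. A real add-on would still need to read the text from Zotero's full-text data and solve the GUI question mentioned above.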
  • Many thanks, Adam! I'll look into Paper Machines.