PDF attachment content and handy Regex usage


I need to explore PDF attachment content in Advanced Search. So far I have made very little use of Regex, but I expect it to be quite powerful.

a) I would appreciate any tips or suggestions on handy Regex operators, syntax, or usage for mining data during a literature review with Zotero.

b) To search exact expressions using Regex, are the "quotation marks" necessary?

Many thanks,
  • Handy tip (completely unrelated): To query two words within N words of each other, try a pattern like the following.
    E.g., to query for "secondary data" or "secondary analysis of data" (up to two words in between), use secondary(?:\s+\w+){0,2}\s+data
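
One way to sanity-check the pattern above before using it in Zotero is to run it through Python's `re` module. This is only a sketch: the sample sentences are invented for illustration, and Zotero's regex engine may differ from Python's in some details.

```python
import re

# Pattern from the tip above: "secondary", then up to two
# intervening words, then "data".
PROXIMITY = re.compile(r"secondary(?:\s+\w+){0,2}\s+data")

# Hypothetical sample sentences, made up for this example.
samples = [
    "We relied on secondary data.",                     # no words in between
    "This is a secondary analysis of data from 2010.",  # two words in between
    "A secondary review of all the available data.",    # five words in between
]

# True where the pattern is found somewhere in the sentence.
matches = [bool(PROXIMITY.search(s)) for s in samples]
print(matches)
```

Raising the upper bound in `{0,2}` widens the window, so `{0,5}` would also accept the third sentence.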

  • Oh, and for (b) the answer is no. Adding quotation marks will just make the search look for the quotation marks as well.
  • Also, for the basics,
    this cheat sheet is good: http://www.cheatography.com/davechild/cheat-sheets/regular-expressions/

    b) In addition to what aurimas says: regexes are literal by default (ordinary characters match themselves), i.e.
    /find this sentence/ (you don't need the slashes in Zotero) will only match when the document contains exactly "find this sentence".
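
The two answers to (b) can be illustrated with a small Python sketch. The sample text is hypothetical, and Python's `re` module stands in here for whatever engine Zotero uses, so treat it as an approximation of the behavior rather than a statement about Zotero itself.

```python
import re

# Hypothetical document text, invented for this example.
text = "The authors say: find this sentence in the appendix."

# Ordinary characters match themselves, so no quoting is needed.
literal_hit = re.search(r"find this sentence", text) is not None

# Quotation marks in the pattern are matched literally too, so this
# only succeeds if the text itself contains the quotes.
quoted_hit = re.search(r'"find this sentence"', text) is not None

# The words must appear in exactly this order.
reordered_hit = re.search(r"find sentence this", text) is not None

print(literal_hit, quoted_hit, reordered_hit)
```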
  • I remember seeing this at the time, but not paying much attention to it. But can somebody please clarify WHERE you do these regex searches? Can we use those in the Zotero Advanced search dialog box?
  • This page has highly useful information; thanks! Does anyone happen to know if Regex works differently when CJK (Chinese) characters are in the database? I tried some searches (including aurimas' very handy shortcut above) and they don't seem to find instances in my database. I'm wondering if my Chinese language content is the reason.