How to regex-search PDF content while excluding specific words from the results?

Hi there,

a) I need to search the content of my PDFs to find those that contain both "Environmental movement" and "Greenpeace" anywhere in the text. PDFs that don't contain both should be excluded from the results.

b) As an additional step on top of (a): I want "Environmental movement" AND "Greenpeace", but NOT any PDF that also contains the phrase "World Wide Fund for Nature".

In short, (a) + (b) together would be:

=> PDFs that have "Environmental movement" and "Greenpeace" anywhere in the content, but NOT "World Wide Fund for Nature".

Many thanks for any advice!
Cadu
  • You don't even need regex:

    Match all:

    attachment contains: "Environmental movement"
    attachment contains: "Greenpeace"
    attachment does not contain: "World Wide Fund for Nature"
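
For anyone who wants to see the logic of those three conditions spelled out, here is a minimal Python sketch that applies it to text already extracted from a PDF. It is only an illustration, not how Zotero implements the search: the function names `matches` and `matches_regex` are made up for this example, and case-insensitive matching is an assumption. The second function shows what a single-regex version would look like, using lookaheads for the two required phrases and a negative lookahead for the excluded one.

```python
import re

# Phrases taken from the question above.
REQUIRED = ["Environmental movement", "Greenpeace"]
EXCLUDED = ["World Wide Fund for Nature"]


def matches(text: str) -> bool:
    """Return True if text has every required phrase and no excluded phrase.

    Case-insensitive comparison is an assumption made for this sketch.
    """
    lowered = text.lower()
    has_all_required = all(p.lower() in lowered for p in REQUIRED)
    has_no_excluded = not any(p.lower() in lowered for p in EXCLUDED)
    return has_all_required and has_no_excluded


# The same logic as one regular expression: each (?=...) lookahead requires
# a phrase somewhere in the text, and (?!...) rejects the excluded phrase.
# re.DOTALL lets ".*" cross line breaks.
PATTERN = re.compile(
    r"\A(?=.*Environmental movement)"
    r"(?=.*Greenpeace)"
    r"(?!.*World Wide Fund for Nature)",
    re.IGNORECASE | re.DOTALL,
)


def matches_regex(text: str) -> bool:
    """Single-regex equivalent of matches()."""
    return PATTERN.search(text) is not None
```

The plain substring version is easier to read and to extend, which is much the same reason the separate search conditions are the simpler choice in Zotero.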
  • Sure. You are right!

    Just for my own technical understanding of Zotero's performance when searching PDF content in a large database (5,000 PDFs): which approach tends to perform better overall?

    a) Zotero's normal PDF content search
    Match any:
    attachment contains: word1
    attachment contains: word2
    attachment contains: wordn (plus 20 words being searched)

    b) Zotero's Regex PDF content search
    attachment contains: word1|word2|wordn (plus 20 words using | )

    Many thanks,
    Cadu
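
In terms of which documents get selected, options (a) and (b) should come out the same. A small Python sketch, assuming plain case-insensitive substring matching (the function names are hypothetical and this is not Zotero code), shows that "match any" over separate conditions and a single `|` alternation pick the same texts:

```python
import re


def match_any_conditions(text: str, words: list[str]) -> bool:
    """Option (a): one 'contains' condition per word, combined with 'match any'."""
    lowered = text.lower()
    return any(w.lower() in lowered for w in words)


def match_any_regex(text: str, words: list[str]) -> bool:
    """Option (b): a single regex that joins all the words with '|'."""
    if not words:
        # An empty alternation would match every text, so handle it explicitly.
        return False
    # re.escape keeps characters such as '.' or '(' in a word literal.
    pattern = re.compile("|".join(re.escape(w) for w in words), re.IGNORECASE)
    return pattern.search(text) is not None
```

So the choice between the two is about speed and convenience rather than about results.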
  • I'm pretty sure separate search conditions without regex are going to perform better.
  • Yes, most likely. The former just needs to use the database. The latter has to scan the full-text content of all attachment files.
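
To make that difference concrete, here is a toy Python model. It is not Zotero's actual implementation, and every name in it is invented for illustration. The indexed search builds a word-to-documents table once and then answers each query with dictionary lookups, while the regex search has to read the full text of every document on every query.

```python
import re
from collections import defaultdict


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Build a word -> document-ids table once, ahead of any search."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return index


def search_indexed(index: dict[str, set[int]], words: list[str]) -> set[int]:
    """'Match any' search: one lookup per word, no document text is read."""
    hits: set[int] = set()
    for word in words:
        hits |= index.get(word.lower(), set())
    return hits


def search_regex(docs: dict[int, str], pattern: str) -> set[int]:
    """Regex search: every document's full text is scanned on each query."""
    rx = re.compile(pattern, re.IGNORECASE)
    return {doc_id for doc_id, text in docs.items() if rx.search(text)}
```

In this model the cost of the indexed search grows with the number of words searched, while the cost of the regex search grows with the total amount of text, which is why the gap widens as the library gets larger.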