Search multiple PDFs

I have a library of 500+ research articles in PDF (about 700-800 MB on my hard drive). They are all searchable PDFs (I have OCRed the ones that were scanned pages with no text). The articles vary in length, but average about 50 pages each.

I've put all these PDFs together in one folder and I am now looking for a search engine (for Windows 7) that can perform full-text searches across this whole library of PDF files. I've tried several pieces of software, but none has given me a satisfactory experience. Here are the ones I've used:

Windows Search: fast and indexes files, but not very straightforward to limit the searches to a specific folder.

Google Desktop: fast and indexes files, but I have not found a way to limit the searches to PDFs inside a specific folder (I don't want it to search the thousands of PDFs stored on my hard drive). Plus, it has been discontinued by Google.

Copernic: fast, indexes files and I can limit the searches to a specific folder. However, it is not able to properly render the text inside the PDFs.

Mendeley: it creates a database of PDFs, indexes and searches those PDFs included in the database. However, it has crashed due to the large number of PDF files I've added. In addition, it cannot display all the instances of a specific word I search for.

Zotero: I couldn't even try it; it crashed when I tried to add my PDFs to its database.

Adobe Reader: it searches all PDF files inside a specific folder. However, the search is very slow (it does not index files). It shows every instance of a word found in each PDF file, renders PDFs very well, and lets me read and annotate the PDFs right after the search. But it is sooooo slow.

PDF X-Change Viewer: pretty much the same as Adobe Reader.

Foxit Reader: the best so far. Just like Adobe Reader and PDF X-Change Viewer, but the searches are a bit faster. In addition, I liked the interface better.

The ideal solution for me would be if Foxit Reader could index all PDFs inside a specific folder, so searches would be much faster. Is it possible? Is there a solution which I have not yet tried?
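
What is being asked for here (index one folder once, then list every match quickly) can be illustrated with a small inverted index. This is only a sketch under stated assumptions, not how Foxit Reader or any other product mentioned in this thread works: the page texts are assumed to have been extracted from the PDFs beforehand by some separate tool, and the names `build_index`, `search`, and `library` are hypothetical.

```python
import re
from collections import defaultdict

WORD = re.compile(r"\w+")


def build_index(library):
    """Map each lowercased word to a list of (filename, page, offset) hits.

    `library` is {filename: [page_text, ...]}. The page texts are assumed
    to have been extracted from the PDFs already (an assumption; no PDF
    parsing happens here).
    """
    index = defaultdict(list)
    for filename, pages in library.items():
        for page_number, text in enumerate(pages, start=1):
            for match in WORD.finditer(text):
                hit = (filename, page_number, match.start())
                index[match.group().lower()].append(hit)
    return index


def search(index, word, only_prefix=""):
    """Return every hit for `word`, optionally limited to one folder prefix."""
    hits = index.get(word.lower(), [])
    return [hit for hit in hits if hit[0].startswith(only_prefix)]


# Made-up sample data standing in for extracted PDF text.
library = {
    "articles/smith2010.pdf": ["Soil erosion in river basins.",
                               "Erosion rates were measured."],
    "articles/lee2011.pdf": ["Sediment transport and erosion."],
    "other/manual.pdf": ["Erosion control manual."],
}

index = build_index(library)
hits = search(index, "erosion", only_prefix="articles/")
for filename, page, offset in hits:
    print(f"{filename}: page {page}, offset {offset}")
```

The index is built once, so each later query is a dictionary lookup instead of a scan through every file, which is the difference between the indexed tools and the slow folder searches described above. The search is also limited to a single folder and still reports every occurrence with its page.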
  • Zotero will construct a full-text index of the first N pages of each document. N is maybe around 10. Someone who knows better than I do can give you a more accurate answer.

    If crashing is a problem for you, you should import the PDFs in smaller batches. Maybe 50 at a time.

    If you are willing to spend some time and at the same time help Zotero get better, you can troubleshoot the crash here at the forums with the developers. Just try to explain the situation that causes the crash in as much detail as possible and you will likely get help here.

    That being said, for finding a specific PDF on my computer, I use Spotlight on Mac. If I want to restrict the search to a particular folder, I use the Zotero full-text search. (My long-term wish would be to combine both into a single function.) Both are fast, and I have not experienced crashes with either, with around 7000 PDFs stored in my database.
  • As for the "N" mronkko mentions: you can set the number of characters and pages indexed per document in the Search tab of the Zotero preferences. I believe the defaults are 50,000 characters and 100 pages.
  • I've managed to add all my PDFs to the database in Zotero. And I've also managed to index all PDF files so I can search the content of PDFs easily. However, I could not find a way for Zotero to actually show the results of the search. I would like to have all the search results highlighted just like in Foxit Reader, and I couldn't find a way for Zotero (or Mendeley) to do that. Without this function, the PDF search is nearly useless for me...
  • Zotero does not show the actual PDF text that matches. I wish that it did, but this would be very difficult to implement in a way that would work on Mac, Linux, and Windows.
  • Isn't Qiqqa designed to do something like this? It's Windows only, which would be a no-go for me even if I did use Windows, but if you don't care about that it may well be worth a look.
  • Qiqqa does that. The search is fast, but the software is very slow and sluggish. The PDFs are very slow to open. I've also found a way of doing it with Mendeley, but it is also very slow.

    As I'm using a laptop equipped with a Core i7-2720M with 8 GB of RAM, I don't think it will get much faster than that.

    The only software which I have found so far that has satisfactorily done this kind of search is dtSearch, but it is a buggy piece of software and it costs US$ 199...
  • OK - the point is Zotero isn't the solution to this particular issue and will most likely not be the best solution for at least a couple of years given the state of open-source PDF tools. So you should likely take the question somewhere else.

    If PDF handling is that important to you, Windows is likely the wrong OS for you - the PDF handling on Macs is about 5 years ahead of anything on Windows (or Linux).
  • adamsmith, I guess you are right. My previous computer was a Mac (a white polycarbonate MacBook), and I remember that there were some good pieces of software able to manage PDFs. PDF handling was not much of an issue back then, but then it became more and more important as I realized I was just taking too much time searching for specific text inside a huge amount of PDF files. Then, I OCRed all the PDFs I had and I'm trying to find a good piece of software to handle them. Unfortunately, I can't find good options for Windows, despite it being far more used than Mac OS. Which ones for Mac would do this task? Sente? Papers? DevonThink?
  • I think Papers for Mac is the state of the art for PDF management, especially for academic purposes, but I'm not sure.
    I thought Spotlight also did quite a good job in combination with Preview.

    I actually really don't like Macs, so my knowledge about this consists mainly of looking at other people's screens and being jealous.
  • On Mac OS X 10.6, Spotlight is a really good way to find PDFs, but it will just show you which files match. Then when you open them in Preview, the matches are highlighted. But you still need to open one file at a time.

    I have not tested Papers much.
  • I've tested Papers and it seems to be fine. However, although it shows the highlighted matches, it does not have a feature (such as Foxit, PDF X-Change or Adobe Reader have) that lets you navigate through all the matches in each document.

    I've found DevonThink for Mac to be a better choice in this regard. It is fast to search, it shows the results by relevance and it is also possible to navigate through the highlighted results (I've done it with Skim).

    Still, I would rather have something workable for Windows.
  • "Windows Search: fast and indexes files, but not very straightforward to limit the searches to a specific folder."

    If you open a folder in Explorer, any search you do in the upper-right-hand box has results limited to that folder. You can see in the search box there is a greyed-out "Search <foldername>".
  • The built-in search in Windows 7 is very powerful. Make sure you have the iFilter plugin installed in Windows. iFilter is provided by Adobe and, from Acrobat 10 (or 11) onwards, is installed automatically on 32-bit Windows. On 64-bit Windows, however, you may have to install it manually. And as quickfold11 said, use Explorer to reach the folder and then search using the upper-right box. Further, from Windows 7 onwards the built-in search uses Advanced Query Syntax. You can find many online help pages for this function.

    http://blog.techhit.com/55696-indexing-and-searching-pdf-content-using-windows-search

    Use the latest iFilter versions from Adobe's website.
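
For the earlier suggestion in this thread to import the PDFs in smaller batches of about 50, a minimal sketch of collecting the PDFs from a single folder and splitting them into batches could look like the following. The function names `pdfs_in_folder` and `batches` are hypothetical, and the sketch does not call Zotero or any other program; it only prepares the lists of files to add by hand.

```python
from pathlib import Path


def pdfs_in_folder(folder):
    """Return the PDF files under `folder` (recursively), in sorted order."""
    return sorted(
        p for p in Path(folder).rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def batches(items, size=50):
    """Yield consecutive slices of `items` with at most `size` elements each.

    The default of 50 follows the batch size suggested in the thread.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    # "." is a placeholder; point it at the folder holding the library.
    for number, batch in enumerate(batches(pdfs_in_folder(".")), start=1):
        print(f"Batch {number}: {len(batch)} files")
```

Working through one batch at a time also makes it easier to tell which file was being added if a crash does occur.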