How to phrase-search in PDF attachments?
Hello,
Most of my articles are PDF files (indexed). Simple search works fine if I look for a single word or several words, but if I use quotation marks (" "), Zotero finds only a few of the roughly 50 items I expect. Apparently it finds only those that contain the phrase in the title, plus saved HTML attachments.
If I use advanced search on attachment content (phrase mode, with or without binary files; it makes no difference), it returns only saved web pages and no PDF articles. So how can I make it search PDFs for a phrase?
v.
Could you please answer my question? I really do not know whether there is no answer because the solution is so simple, or because it is not possible to phrase-search in PDFs.
A search for "hot dog" (with the quotes) will perform a phrase search through PDFs as well, but it may not always succeed because of the limitations of PDF-to-text conversion that I mentioned.
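To illustrate the kind of PDF-to-text limitation meant here, below is a minimal Python sketch. It is not Zotero's code, and the sample string is made up. Text extracted from a PDF often splits a phrase across a line break or hyphenates a word at the end of a line, so a literal phrase test on the raw text misses it; normalizing whitespace and rejoining hyphenated line ends recovers some, but not all, of these cases.

```python
import re

def literal_match(text: str, phrase: str) -> bool:
    """Case-insensitive substring test on the raw extracted text."""
    return phrase.lower() in text.lower()

def normalized_match(text: str, phrase: str) -> bool:
    """Same test after rejoining end-of-line hyphenation and collapsing
    every run of whitespace (including newlines) to a single space."""
    def norm(s: str) -> str:
        s = re.sub(r"-\n\s*", "", s)  # "saus-\nage" -> "sausage"
        return re.sub(r"\s+", " ", s).strip().lower()
    return norm(phrase) in norm(text)

# Hypothetical extracted text: "hot dog" straddles a line break,
# and "sausage" is hyphenated across two lines.
extracted = "He ordered a hot\ndog with extra saus-\nage."

print(literal_match(extracted, "hot dog"))     # False
print(normalized_match(extracted, "hot dog"))  # True
print(normalized_match(extracted, "sausage"))  # True
```

The hyphen rule is a trade-off: it also fuses genuine compounds broken at a line end ("well-\nknown" becomes "wellknown"), and nothing here helps when the converter outputs multi-column text in the wrong order.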
This will confirm that such searches are possible. But, again,
Vinthund - which version of Zotero are you using?
If upgrading fixes your problem, great; if not, you will at least have a host of other bug fixes, which should help prevent other problems from cropping up.
Be aware that the note and attachment tabs have disappeared from the latest version (the notes tab will return at some point, but not the attachments tab), so if you are particularly attached to the notes tab, you may want to consider delaying the upgrade. There are, of course, alternative ways (a couple of buttons in the toolbar) to deal with notes and attachments in the latest version.
Another thing to check is that your PDFs are properly indexed. Does "Indexed: Yes" appear in the info pane for your PDFs? If not, check in the search pane of the Zotero preferences that the "pdftotext" and "pdfinfo" tools are installed, and install them if they are not. You can then index your PDFs by clicking "rebuild index" in the same pane. If your PDFs are not indexed, that would explain why you are only finding words that appear in titles rather than inside the PDFs.
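If you want to look at what the converter actually produces for a given PDF, a small Python sketch like the one below may help. It assumes the pdftotext and pdfinfo command-line tools (from Xpdf or Poppler) are on your system PATH. Zotero may use its own copies of these tools, so this only checks system-wide installations, not Zotero's preferences. pdftotext writes the text to standard output when the output file name is `-`.

```python
import shutil
import subprocess
import sys

TOOLS = ("pdftotext", "pdfinfo")

def check_tools() -> dict[str, bool]:
    """Report whether each tool is found on the system PATH."""
    return {tool: shutil.which(tool) is not None for tool in TOOLS}

def extract_text(pdf_path: str) -> str:
    """Run pdftotext on one file and return the text it writes to stdout."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],  # "-" = send text to stdout
        capture_output=True, text=True, check=True,
    )
    return result.stdout

if __name__ == "__main__":
    status = check_tools()
    for tool, found in status.items():
        print(f"{tool}: {'found' if found else 'MISSING'}")
    # Optional: pass a PDF path to see the first 500 extracted characters.
    if len(sys.argv) > 1 and status["pdftotext"]:
        print(extract_text(sys.argv[1])[:500])
```

Searching that output for your phrase shows whether it survives conversion intact (for example, not split by a line break or a hyphen) before you blame the search itself.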
But that would be too simple an explanation ;)
Yes, they are indexed, and I hope they are properly indexed. At least "Indexed: Yes" does appear, and I am able to look for single words in attached files. It is useful anyway, but if I could look for phrases, that would save me some effort.
I wanted to recommend Zotero to some of my colleagues, but I will wait until I can explain to them how phrase search works. Or could this problem be platform-dependent, so maybe it would work fine on their Windows machines? I run Ubuntu and Firefox 3.5.3, if that matters.
http://finaid.georgetown.edu/sample.pdf
Using the quick search bar, the following should all lead to that PDF being found:
- Test
- Test PDF
- Test your
- "Test"
- "Test PDF"
And this should not lead to it being found:
- "Test your"
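The quick-search rules listed above (unquoted words are each required somewhere in the text; a quoted string must appear as a contiguous phrase) can be modelled in a few lines of Python. This is a simplified sketch of the described behaviour, not Zotero's implementation: it uses substring rather than whole-word matching, and the sample text is hypothetical (the real contents of sample.pdf are not quoted here), chosen so that it contains "Test PDF" and "your" but never "Test your".

```python
import re

# A quoted string is one phrase term; otherwise each non-space run is a word.
TOKEN = re.compile(r'"([^"]*)"|(\S+)')

def parse_query(query: str) -> list[str]:
    """Split a quick-search query into lowercase terms."""
    terms = []
    for phrase, word in TOKEN.findall(query):
        term = phrase if phrase else word
        if term:  # skip empty quotes ""
            terms.append(term.lower())
    return terms

def quick_search_matches(text: str, query: str) -> bool:
    """True if every term occurs in the text. Whitespace is collapsed
    so that a phrase may span a line break in the extracted text."""
    haystack = " ".join(text.lower().split())
    return all(" ".join(t.split()) in haystack for t in parse_query(query))

# Hypothetical stand-in for the text extracted from the sample PDF.
sample = "This is a Test PDF for your review."
for q in ['Test', 'Test PDF', 'Test your', '"Test"', '"Test PDF"', '"Test your"']:
    print(f"{q:14} {quick_search_matches(sample, q)}")
# Only '"Test your"' prints False: both words occur, but not side by side.
```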
Using advanced search, all searches are phrase searches, and you should not use quotation marks. So, an advanced search whose single criterion is that the attachment content contains either:
- Test
- Test PDF
will lead to a match. Searches for:
- Test your
- "Test"
- "Test PDF"
- "Test your"
will not lead to matches.

After that I rebuilt the index, and now it seems to work much better: for instance, I get 100 hits where previously I got 9 using exactly the same phrase :) and I hope they are accurate.
Thanks everyone.