Can I index linked PDFs?
Hi,
I've built my whole library on linked PDFs (not stored attachments). From what I understand, Zotero can search the full text of PDFs, whereas right now I can only search fields and tags.
Is there any way (e.g., via a plugin) for Zotero to index and search linked PDFs?
Thanks for your help!
My library contains around 1,447 entries, and all but 206 are linked to a PDF.
Using Windows Explorer's search in the root folder of my library, a search for one very specific term returns 17 files. I can open each of them in Adobe Reader to confirm that the term is indeed there.
However, in Zotero (the Firefox plugin), if I select My Library and enter the same search term, nothing comes up.
I know no search can be exhaustive, since many (older) PDFs contain poorly rendered characters that can't be recognized as text. But every file Windows Explorer finds clearly contains searchable text, so this isn't an OCR problem.
Zotero's search preferences tell me that both pdftotext and pdfinfo are up to date (v3.02a), and the characters/pages limits are set to their defaults (for reference, none of the 17 PDFs found by Windows Explorer is over 100 pages). The statistics show 841 files indexed, 1 partially indexed and 723 not indexed. Oddly, that adds up to 1,565, well above the 1,241 (1,447 − 206) linked PDFs I expected.
EDIT: I should add that this test gave the same results before and after I tried rebuilding the index (using the "Index non-indexed files" option).
Indeed, if I select "Everywhere", I now get 15 files. So only 2 files show up in the Windows Explorer search, as they should, but not in Zotero.
After checking those 2 files, I can't find any reason why they wouldn't show up in Zotero: the PDFs look fine in Adobe Reader, and I can select and copy text without any problem.
Is there any reason for this discrepancy? I know I'll miss occurrences in older files, but I want to make sure I find as many as I possibly can.
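One way to narrow this down is to run pdftotext, the extraction tool listed in Zotero's search preferences, directly on the two missing files and see whether it recovers the term. The following is a minimal Python sketch, assuming a pdftotext binary is on the PATH; the script name and file names in the usage line are hypothetical. It only checks whether pdftotext can extract the term; it does not reproduce how Zotero builds or queries its own index.

```python
import shutil
import subprocess
import sys


def extract_text(pdf_path: str) -> str:
    """Run pdftotext on one PDF and return the extracted text.

    Assumes pdftotext is on the PATH; "-" as the output file means stdout.
    """
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        check=True,
    )
    # Output encoding depends on the pdftotext build, so decode leniently.
    return result.stdout.decode("utf-8", errors="replace")


def contains_term(text: str, term: str) -> bool:
    """Case-insensitive substring check, roughly like a simple file search."""
    return term.casefold() in text.casefold()


def main(term: str, pdf_paths: list[str]) -> None:
    if shutil.which("pdftotext") is None:
        sys.exit("pdftotext not found on PATH")
    for path in pdf_paths:
        try:
            text = extract_text(path)
        except subprocess.CalledProcessError as err:
            # A non-zero exit (e.g. a protected or damaged file) would also block indexing.
            print(f"{path}: pdftotext failed (exit code {err.returncode})")
            continue
        status = "term found" if contains_term(text, term) else "term NOT found"
        print(f"{path}: {status} ({len(text)} characters extracted)")


if __name__ == "__main__":
    if len(sys.argv) < 3:
        print('usage: python check_pdfs.py "search term" FILE.pdf [FILE.pdf ...]')
    else:
        main(sys.argv[1], sys.argv[2:])
```

If pdftotext fails or extracts little or no text from a file that reads fine in Adobe Reader, the problem is more likely in extraction than in Zotero's search.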
If not, are you able to index them manually (by clicking the round arrow next to "No")?
Is there anything else unusual about the files, such as read-only protection? That sometimes trips up the tools Zotero uses for indexing.
Why were they not indexed earlier? Should I try to rebuild the index from scratch?
Maybe Dan has an idea?
Does that mean that number reflects the image-based PDFs Zotero can't OCR, plus PDFs with badly rendered text?
However, while building the library, I had a number of items linked both to a PDF and to a "PubMed entry" (I think that's what gets created when you add a reference via the "magic wand" with a PMID, which I did when I started using Zotero). After deleting these links, the number of unindexed files is down to 12, and I can account for half of them (.doc files). I don't care enough about the other half to track them down, so it's all good for me now.
Thanks for your patience Adam!