identifying nonindexed pdfs

xanthogenate · December 28, 2015

Hello,

In the past I used different Firefox profiles, which produced separate, independent Zotero profiles. Working in a heterogeneous IT environment (Microsoft Windows, Debian Linux, Mac) is part of the reason this happened. I have since learned how to curate these profiles and merge them into a single one, but in doing so I noticed that some of the references (well, about 300) contain PDF files that are not indexed, or cannot be indexed, by Zotero. A recurring observation is that many patents downloaded from esp@cenet / the European Patent Office do not contain a text layer, even though I or my colleagues have already created a "parent item" for them.

Of course, as with the journal articles, I would like to use Zotero both to track the bibliographic data and to make the body of these texts accessible to the indexer. Is there an easily applicable filter to bring these patents (spread across several sub-collections) back to the surface so that I can run OCR on them?

adamsmith · December 29, 2015

Advanced search:

Attachment Content -- does not contain -- a
Attachment File Type -- is -- PDF

-->Create saved search.

For the first condition, it should work to use % as a wildcard instead of "a" (which I've just put in as a random common letter; you can use anything else), or to set the condition to use regex and use "." as the pattern, but I couldn't get either to work.
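
As a cross-check outside the Zotero interface, the same idea can be sketched as a small script that walks the Zotero storage folder. This is only a sketch under an assumption that is not stated in this thread: that Zotero keeps the extracted text of each indexed attachment in a file named `.zotero-ft-cache` next to the PDF, so that a missing or empty cache file suggests the PDF has no usable text layer. The function name `find_unindexed_pdfs` is made up for illustration.

```python
from pathlib import Path

# Assumed name of Zotero's per-attachment full-text cache file.
CACHE_NAME = ".zotero-ft-cache"


def find_unindexed_pdfs(storage_dir):
    """Return PDFs whose attachment folder has no non-empty full-text cache."""
    unindexed = []
    for folder in sorted(Path(storage_dir).iterdir()):
        if not folder.is_dir():
            continue
        # Collect the PDF files stored in this attachment folder.
        pdfs = sorted(
            p for p in folder.iterdir()
            if p.is_file() and p.suffix.lower() == ".pdf"
        )
        if not pdfs:
            continue
        cache = folder / CACHE_NAME
        # A cache holding only whitespace is treated like a missing one.
        has_text = (
            cache.is_file()
            and cache.read_text(encoding="utf-8", errors="ignore").strip() != ""
        )
        if not has_text:
            unindexed.extend(pdfs)
    return unindexed
```

Calling `find_unindexed_pdfs` with the path of the `storage` folder inside the Zotero data directory returns a list of candidate files for OCR. The script only reads the folder and does not modify the library; the saved search above remains the way to collect the matching items inside Zotero itself.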
