Identifying *non*-indexed PDFs
Hello,
In the past, I used different Firefox profiles, which led to several independent Zotero profiles. Working in a heterogeneous IT environment (Microsoft Windows, Debian Linux, Mac) is part of the reason this happened. I have since learned how to curate these profiles and merge them into a single one, but I also noticed that some of the references (well, about 300) contain PDF files that are not indexed, or perhaps cannot be indexed, by Zotero. A recurring observation is that many patents downloaded from esp@cenet / the European Patent Office do not contain a text layer, even though either I or colleagues of mine had already created a "parent item" for them.
Of course, as with the journal articles, I would like to use Zotero both to track the bibliographic data and to make the body of these texts accessible to the indexer. Is there an easy-to-apply filter function that brings these patents (spread over several sub-collections) back to the surface, so that I can run them through OCR?
Attachment Content -- does not contain -- a
Attachment File Type -- is -- PDF
-->Create saved search.
For the first condition, it should in principle work to use % as a wildcard instead of "a" (which I just put in as a random common letter; you can use anything else), or to switch the condition to regex mode and use ".", but I couldn't get either to work.
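If the saved search does not cooperate, a rough alternative is to query a copy of the Zotero database directly. The sketch below lists PDF attachments that have no indexed words at all. It is only an illustration: the table and column names (itemAttachments, fulltextItemWords, contentType, path) are assumptions about the local zotero.sqlite layout and may differ between Zotero versions, and the function name find_unindexed_pdfs is made up for this example. Run it only against a copy of the database, never the live file.

```python
import sqlite3

# ASSUMPTION: these table and column names reflect the local Zotero
# database layout; check them against your own zotero.sqlite copy.
QUERY = """
SELECT ia.itemID, ia.path
FROM itemAttachments AS ia
WHERE ia.contentType = 'application/pdf'
  AND NOT EXISTS (
      SELECT 1 FROM fulltextItemWords AS w WHERE w.itemID = ia.itemID
  )
ORDER BY ia.itemID
"""


def find_unindexed_pdfs(conn):
    """Return (itemID, path) rows for PDF attachments without any indexed word."""
    return conn.execute(QUERY).fetchall()


def report(db_copy_path):
    """Print the candidates found in a COPY of zotero.sqlite (hypothetical helper)."""
    conn = sqlite3.connect(db_copy_path)
    try:
        for item_id, path in find_unindexed_pdfs(conn):
            print(item_id, path)
    finally:
        conn.close()
```

Calling report() with the path to the copied database prints one line per candidate attachment. These are the files worth checking for a missing text layer before sending them to OCR.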