How to type PDF files

Is there some way, with Zotero or some other tool, to automagically create a "type" list for PDF files?

I've been collecting PDF files for a while. Many were just images in a PDF wrapper. Now some of those have become "full text" articles where you can search text, highlight, and so on. I would like to replace some of those old files if possible.

My first thought was just to brute-force it by manually opening each file and tagging it with a type. (I have about 1500 so far.) But I'm wondering if there is a more clever solution to this problem.
  • you mean something like this?
    http://www.zotero.org/support/retrieve_pdf_metadata
  • Thanks for answering, but I knew about that nifty part of Zotero.

    Look at the PDF file you can download from this webpage:
    http://projecteuclid.org/euclid.aoms/1177731283
    JSTOR: links.jstor.org

    The pages inside the PDF file are just images. With Acrobat Reader you can't swipe text to cut and paste. You can't highlight parts of the article.

    I'm looking for a way to tell if the article inside the pdf file is "full text" or just "images."

    ----

    As I said earlier, I know that I can spin through all the PDF files and tag them manually. But I'd like to find the 10% that are just images without opening them all.
  • not as easy as it probably should be, but you can look at this post:
    https://forums.zotero.org/discussion/6019/finding-unindexed-items/#Item_4
  • Thanks again for a good tip. I thought maybe I could just spin through the list of PDFs and look to see if the file was indexed. That works but only to a small extent.

    !@#$%^ JSTOR put a copy page on a lot of the "old" PDF files that I have. Since "something" was indexed, the PDF file shows as indexed, even though the "real article" within the PDF file is just images.

    I probably only have about 150 papers that I really care about so I'll spin through those manually.

    I did notice that when retrieving PDF metadata, Zotero does detect files with no OCR text at all. It would be nice if this were flagged on the item record somehow. This is probably the road of good intentions which leads to total chaos. Some PDF files are password locked. So I'm not sure how many categories of PDF files you'd need, nor how a program would detect them all.
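
One possible way to do the sorting described above outside Zotero is to count the extractable text on each page and ignore the front page, so that a text page stuck on the front of a scanned article does not make the whole file look like "full text." The following is only a rough sketch under several assumptions: it uses the third-party pypdf library (not something mentioned in this thread), the function names are made up for illustration, and the thresholds (200 characters per page, one skipped page, half the remaining pages) are guesses that would need tuning on a real collection.

```python
from pathlib import Path

MIN_CHARS = 200  # assumed cutoff: a page with fewer extracted characters counts as image-only


def classify(page_char_counts, skip_first=1, min_chars=MIN_CHARS, min_text_fraction=0.5):
    """Label a PDF as 'full text', 'images' or 'empty' from per-page character counts.

    The first `skip_first` pages are ignored (assumed to be an added front page),
    unless that would leave no pages to look at.
    """
    body = page_char_counts[skip_first:] or page_char_counts
    if not body:
        return "empty"
    text_pages = sum(1 for n in body if n >= min_chars)
    return "full text" if text_pages / len(body) >= min_text_fraction else "images"


def count_page_chars(pdf_path):
    """Return the number of extractable characters on each page, or None if encrypted."""
    from pypdf import PdfReader  # third-party: pip install pypdf

    reader = PdfReader(str(pdf_path))
    if reader.is_encrypted:
        # Reported as "locked" below; this may include files that open with an empty password.
        return None
    return [len((page.extract_text() or "").strip()) for page in reader.pages]


def scan(folder):
    """Yield (path, label) for every .pdf file under `folder`."""
    for pdf in sorted(Path(folder).rglob("*.pdf")):
        try:
            counts = count_page_chars(pdf)
        except Exception as exc:  # damaged or unusual files should not stop the scan
            yield pdf, f"unreadable ({exc.__class__.__name__})"
            continue
        yield pdf, "locked" if counts is None else classify(counts)


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        for path, label in scan(sys.argv[1]):
            print(f"{label}\t{path}")
```

Run it with the folder that holds the PDFs as the only argument; it prints one label per file, which could then be used to pick out the "images" files for replacement. Scans that have had an OCR layer added will show up as "full text," which matches the distinction asked about here, and the "locked" and "unreadable" labels cover the password-protected files mentioned above only in the crudest way.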
