How to type PDF files

max1836 · May 15, 2013

Is there some way, with Zotero or some other tool, to automagically create a "type" list for PDF files?

I've been collecting PDF files for a while. Many were just images in a PDF wrapper. Now some of those are available as "full text" articles where you can search the text, highlight, and so on. I would like to replace some of those old files if possible.

My first thought was to brute-force it by manually opening each file and tagging it with a type (I have about 1500 so far). But I'm wondering if there is a more clever solution to this problem.

adamsmith · May 15, 2013

you mean something like this?
http://www.zotero.org/support/retrieve_pdf_metadata

max1836 · May 15, 2013

Thanks for answering, but I knew about that nifty part of Zotero.

Look at the PDF file you can download from this webpage:
http://projecteuclid.org/euclid.aoms/1177731283
JSTOR: links.jstor.org

The pages inside the PDF file are just images. With Acrobat Reader you can't select text to cut and paste, and you can't highlight parts of the article.

I'm looking for a way to tell if the article inside the pdf file is "full text" or just "images."

----

As I said earlier, I know that I can spin through all the PDF files and tag them manually. But I'd like to find the 10% that are just images without opening them all.

adamsmith · May 15, 2013

not as easy as it probably should be, but you can look at this post:
https://forums.zotero.org/discussion/6019/finding-unindexed-items/#Item_4
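Outside Zotero, the "full text" vs. "images" check asked about above could also be scripted. What follows is only a rough sketch, not anything Zotero does itself: it assumes the third-party pypdf package is installed, the function names (`classify`, `extract_page_texts`, `type_folder`) and the 50-character threshold are made up for illustration, and the folder you pass in is whatever directory holds your PDFs.

```python
from pathlib import Path


def classify(page_texts, min_chars=50):
    """Label a PDF from the text extracted from each of its pages."""
    # Count non-whitespace characters across all pages.
    total = sum(len("".join(text.split())) for text in page_texts)
    # min_chars is an arbitrary cutoff (assumption); tune it for your files.
    return "full text" if total >= min_chars else "images"


def extract_page_texts(pdf_path):
    """Return the extracted text of each page (assumes pypdf is installed)."""
    from pypdf import PdfReader  # third-party, imported only when needed

    reader = PdfReader(str(pdf_path))
    # Pages without a text layer yield an empty string.
    return [page.extract_text() or "" for page in reader.pages]


def type_folder(folder):
    """Map every PDF under `folder` to "full text", "images", or "unreadable"."""
    result = {}
    for path in sorted(Path(folder).rglob("*.pdf")):
        try:
            result[str(path)] = classify(extract_page_texts(path))
        except Exception:
            # e.g. password-locked or damaged files
            result[str(path)] = "unreadable"
    return result
```

Calling `type_folder("/path/to/your/pdfs")` would give a dictionary you can sort or filter for the "images" entries, so only those need to be opened by hand.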

max1836 · May 16, 2013

Thanks again for a good tip. I thought maybe I could just spin through the list of PDFs and look to see if the file was indexed. That works, but only to a small extent.

!@#$%^ JSTOR put a copy page on a lot of the "old" PDF files that I have. Since "something" was indexed, the PDF file shows as indexed, even though the "real article" within the PDF file is just images.
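One way around the extra page would be to look at the text page by page instead of per file, and ignore the first page when deciding. A minimal sketch of that idea, assuming you already have a character count for each page from some PDF text extractor; the function name `classify_by_page`, the labels, and the 50-character cutoff are all hypothetical:

```python
def classify_by_page(char_counts, min_chars=50, cover_pages=1):
    """Label a PDF from the number of text characters found on each page.

    char_counts: one integer per page, in page order.
    cover_pages: how many leading pages to treat as an added cover (assumption).
    """
    # A file no longer than the cover itself cannot have a separate cover.
    if len(char_counts) <= cover_pages:
        cover_pages = 0

    cover = char_counts[:cover_pages]
    body = char_counts[cover_pages:]

    if any(n >= min_chars for n in body):
        return "full text"
    if any(n >= min_chars for n in cover):
        # Only the leading page(s) carry text; the article itself is scanned.
        return "cover page only"
    return "images"
```

With this, a file whose counts look like `[900, 0, 0, 0]` comes out as "cover page only" rather than being mistaken for a full-text article.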

I probably only have about 150 papers that I really care about so I'll spin through those manually.

I did notice that when retrieving PDF metadata, Zotero does detect files with no OCR text at all. It would be nice if this was flagged on the item record somehow. This is probably the road of good intentions which leads to total chaos. Some PDF files are password locked. So I'm not sure how many categories of PDF files you'd need, nor how a program would detect them all.
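For the password-locked category at least, a crude check seems possible without any PDF library: encrypted PDFs carry an `/Encrypt` entry in their trailer, so searching the raw bytes for it is a usable first pass. This is only a heuristic sketch (the name `looks_locked` is made up): it can give a false positive if those bytes happen to occur elsewhere in the file, and it also flags files that open fine but merely restrict copying or printing.

```python
def looks_locked(pdf_path):
    """Heuristic: True if the file's raw bytes contain an /Encrypt entry."""
    with open(pdf_path, "rb") as f:
        data = f.read()
    # Encrypted PDFs reference an encryption dictionary via /Encrypt.
    return b"/Encrypt" in data
```

Files flagged this way could go into their own "locked" pile before any text-vs-image sorting is attempted.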