PDF questions from newbie

Hi,

I've just started using Zotero. I'm feeling my way around and have some questions, this time about copying and indexing PDFs.

Apologies for having to ask about PDFs; I searched the forum for answers but couldn't find any.

[SUGGESTION FOR FORUM SEARCHING (apologies if I'm not seeing it): add an advanced search function. My searches brought up a lot of results that were of little relevance.]

1) Does Zotero (I have the latest version) include pdftotext? I don't see it listed among the plugins at https://www.zotero.org/support/plugins. Do I need to download it from https://www.xpdfreader.com/about.html? I assume pdftotext is part of xpdfreader. How would I link pdftotext to Zotero?

Xpdfreader describes pdftotext as a 'command line' tool. What does this mean? Do I have to run pdftotext from the DOS command line? This sounds antiquated, not to mention cumbersome, but am I right? How would I run/activate pdftotext in Zotero? Are there any guides?

Does anyone use other PDF conversion software instead of pdftotext? I currently use Perfect PDF Editor 9 to process, edit, etc. my PDF files.

2) For image PDFs there are various Google OCR apps of varying usefulness and accuracy, as well as Windows text grabbers. I'm still experimenting with these. At best they are probably only useful for snippets or paragraphs.

For PDF image files I have OmniPage and ABBYY FineReader. I've not used them very much, but they seem to work well, especially if the PDF lacks images. The strategy would be to convert the PDF image file to a regular (text-searchable) PDF file and then import it into Zotero. Will this work? I assume it should.

Finally, are notes and other text included in Zotero (e.g. Microsoft Word documents, newspaper articles) automatically indexed? Or do you have to rely on tags? If they are not indexed, do any users find it useful to convert them to PDF files before adding them to Zotero? This way they would be indexed. Your thoughts?

In particular, newspapers often prevent you from copying their content except as an image. You could get around this by using OCR software to convert the image to a PDF file. Does anyone do this? Does it sound feasible and useful? Would it be too time-consuming?

My thanks in advance for your comments and suggestions.
  • 1) pdftotext is included and you don't have to do anything for it to run
    2) Yes, running OCR before adding a file to Zotero will work
    3) HTML and text files are indexed, Word files are not. If you have a lot of Word files, yes, it may be worth converting them to PDF before adding them. Zotero stores snapshots of webpages you save and that will include the full text of most (though not all) newspapers.
    Where that doesn't work, whether you want to save them as image files and then OCR them depends on how badly you want the full-text search, I suppose. I certainly would find it way too cumbersome.
  • Thanks for your comments. I very much appreciate them.
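
For anyone curious about what a 'command line' tool like pdftotext does, here is a minimal Python sketch of calling it by hand. Zotero runs its bundled copy automatically, so none of this is needed for normal use. The sketch assumes a standalone pdftotext is installed and on the PATH; the function names and the 20-character threshold for guessing that a PDF is image-only are illustrative assumptions, not anything Zotero itself uses.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, txt_path):
    """Return the argument list for: pdftotext -enc UTF-8 input.pdf output.txt"""
    return ["pdftotext", "-enc", "UTF-8", str(pdf_path), str(txt_path)]


def extract_text(pdf_path):
    """Run pdftotext on one PDF and return the extracted text.

    Returns None if no standalone pdftotext is found on the PATH.
    """
    if shutil.which("pdftotext") is None:
        return None
    txt_path = Path(pdf_path).with_suffix(".txt")
    subprocess.run(build_pdftotext_command(pdf_path, txt_path), check=True)
    return txt_path.read_text(encoding="utf-8", errors="replace")


def looks_like_image_pdf(text, min_chars=20):
    """Heuristic (assumed threshold): almost no extractable text suggests
    a scanned, image-only PDF that would need OCR before it can be indexed."""
    return len(text.strip()) < min_chars
```

Calling `extract_text("article.pdf")` (a hypothetical file name) writes `article.txt` next to the PDF and returns its contents. If `looks_like_image_pdf` returns True for that text, the PDF probably has no text layer and would need OCR first.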