PDF indexing error (Zotero Standalone and Zotero add-on)

Hello, I searched the forum and found some posts mentioning a problem when indexing PDF files, but these posts don't give a real solution, so I am posting again in case there is a new solution.

I installed the pdftotext and pdfinfo plugins (version 3.02) via the Zotero preferences, but I still get the error "parameter is incorrect" when trying to index PDF files newly added to the library. After the error the get-metadata feature hangs, although Zotero itself does not become unresponsive.

I hope someone can help.

previous post: http://forums.zotero.org/discussion/11164/
  • So you tried what Dan suggested in that thread and deleted and reinstalled pdfinfo and pdftotext?
  • Yes, I did, but it was not really necessary, as I had installed them only today. I had in fact installed these plugins some time ago, but always got the same problem, so I had deleted them.

    Now I need to index a collection of PDF documents, so I decided to delete my previous Zotero installations and start from scratch with a brand-new installation, but again these PDF tools do not work. Maybe it is because I'm in a protected environment (i.e., I don't have administrative rights on this machine, although that didn't prevent me from installing Zotero, nor did I get any message that something went wrong during the installation).
  • What version of Windows?
  • Windows XP Professional, Version 2002, Service Pack 3.
    At home I have a Mac, and PDF indexing seems to run smoothly there.
  • Maybe it is because I'm in a protected environment (i.e., I don't have administrative rights on this machine, although that didn't prevent me from installing Zotero, nor did I get any message that something went wrong during the installation)
    Can you run arbitrary executables? Zotero isn't an executable. The PDF tools are.
  • Yes, I can, at least some. For example, I can't install programs that use a Windows installer or similar and try to install shared DLLs, etc., but small utilities and programs are not forbidden, or at least our security cannot intercept them. Anyhow, Zotero Standalone is an executable. That's why I'm even more puzzled, as I get the same problems with PDFs even with the standalone version, so it does not depend on the browser.
  • It's possible that executables with the flag that makes them hidden (which is the difference between the original PDF tools and Zotero's versions) aren't allowed on your system.

    If you don't mind a console window popping up, you can grab the original versions, rename them to match what's in the Zotero data directory, and replace Zotero's versions.
  • Thanks Dan, I downloaded the original versions, put them in the Zotero data directory, and now it works. I suppose there was some problem downloading them from the Zotero preferences, as I can now see that Zotero's versions are far smaller than the originals. So maybe our security didn't let Zotero download them properly, while it did allow the download from the original site (which is even worse from a security point of view...). Anyhow, it works now, so thanks again.
  • OK, great. Just note that with the original versions you'll (I believe) briefly see a console window pop up every time Zotero indexes a PDF.
  • I saw, thanks. Maybe you could consider making Zotero's versions available for download separately, I mean not through the preferences pane.

    Best regards
  • http://www.zotero.org/download/xpdf/pdfinfo-Win32.exe-3.02
    http://www.zotero.org/download/xpdf/pdftotext-Win32.exe-3.02

    (Also "MacIntel", "MacPPC", "Linux-i686", "Linux-x86_64")

    You'll need to rename appropriately.
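A minimal Python sketch of the renaming step. It assumes (this is not confirmed in the thread) that the local file name is simply the download name without the trailing version suffix; the helper name `local_name` is hypothetical.

```python
def local_name(download_name: str, version: str = "3.02") -> str:
    """Strip a trailing '-<version>' from a downloaded file name.

    ASSUMPTION: the name expected in the Zotero data directory is the
    download name minus the version suffix. Compare with the files that
    are already in the data directory before relying on this.
    """
    suffix = "-" + version
    if download_name.endswith(suffix):
        return download_name[: -len(suffix)]
    # No version suffix found: leave the name unchanged.
    return download_name


if __name__ == "__main__":
    for name in ("pdfinfo-Win32.exe-3.02", "pdftotext-Win32.exe-3.02"):
        print(name, "->", local_name(name))
```

The actual rename can then be done by hand or with `os.rename`, once the target name has been checked against what is already in the data directory.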
  • Sorry to reopen this issue, but I realize that although the Zotero plugins are installed and functioning, Zotero does not seem to index or reindex any PDF content. To avoid any misunderstanding: we are talking about normal text PDFs (not images or the like, which of course could not be indexed). Is there anything I missed? I mean, if I click the indexing button, I expect Zotero to index the content of the PDF file. Am I right, or do I have false expectations?

    Thanks for any comment you can provide.
  • Yes, that should work; it should actually do it automatically.
  • Well, unfortunately it doesn't, and as far as I can see it is not a problem of this particular installation, as I also tried it on my Mac at home and it doesn't work there either. I tried different kinds of PDF content (simple PDFs I created myself from a Word document, articles in PDF from the internet or from bibliographic databases...).
  • So, when you click on the green arrows next to "Indexed: No", what happens? Do this, submit an error report immediately afterwards, and post the ID here.
    http://www.zotero.org/support/reporting_bugs
  • Nothing happens when I click on the green arrows; the status remains not indexed.
    I submitted the report; its ID is 456857738.
  • There's an error that indexing isn't working, but it doesn't tell us why. You can try running the PDF tools manually from the command line to see if they work—Zotero's debug output shows the commands it runs—but that's not really something we can help you with.

    I'd recommend deleting the pdf* files, restarting Firefox, and trying to reinstall again from the Zotero preferences. If it's not working after that, it's a problem with your system, and you'll have to talk to your IT folks.
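Running the tools by hand, as suggested above, could be sketched in Python roughly as follows. This assumes `pdfinfo` and `pdftotext` are on the PATH (otherwise pass the full paths to the files in the Zotero data directory); the helper names are hypothetical, and the exact commands Zotero runs should be taken from its debug output rather than from this sketch.

```python
import subprocess
import sys


def build_commands(pdf_path, pdftotext="pdftotext", pdfinfo="pdfinfo"):
    """Return the two command lines to try, as argument lists."""
    return [
        [pdfinfo, pdf_path],         # prints document metadata
        [pdftotext, pdf_path, "-"],  # "-" writes the extracted text to stdout
    ]


def run(cmd):
    """Run cmd and return (exit_code, stdout, stderr).

    exit_code is None if the program could not be started at all,
    for example because it is missing or blocked by the system.
    """
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, errors="replace"
        )
    except OSError as exc:
        return None, "", str(exc)
    return proc.returncode, proc.stdout, proc.stderr


if __name__ == "__main__" and len(sys.argv) > 1:
    for cmd in build_commands(sys.argv[1]):
        code, out, err = run(cmd)
        print(" ".join(cmd))
        print("  exit code:", code)
        print("  stdout chars:", len(out))
        print("  stderr:", err.strip())
```

A nonzero exit code or a message on stderr is usually the quickest hint as to whether the tool itself is failing or cannot be launched.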
  • I am at home now, so there are no IT folks; I'm the administrator of my Mac. Furthermore, I realize that even for the PDF files that Zotero says are indexed, I can't search their content. Isn't it a bit weird that I have the same problem with Zotero installed on two different machines with two different operating systems, and with both the add-on and the standalone version?

    As to the PDF tools, they are the ones I downloaded yesterday from the location you gave me.

    Thanks anyhow.
  • I wasn't recommending that you download them manually, and it's certainly not a supported configuration.
  • edited November 10, 2011
    If you're having trouble on the Mac at home, reinstall the PDF tools there using the official method and provide a Report ID and a Debug ID for an indexing attempt that doesn't work (and say exactly how it isn't working).
  • I will delete everything, reinstall from scratch, and let you know.
    Best regards
  • I think mibigo is referring to this from you above:
    If you don't mind a console window popping up, you can grab the original versions, rename them to match what's in the Zotero data directory, and replace Zotero's versions.
  • Yes, that was when I had discovered that the security at work didn't allow me to install the PDF tools from Zotero. But eventually Dan provided a location from which to download Zotero's versions, so I did that and replaced the originals with Zotero's. Anyhow, this is irrelevant now, as the problem I'm experiencing at home is with a normal installation where the PDF tools were installed smoothly from the preferences.

    Could it be that certain PDF files do not allow their content to be indexed? I would be surprised, as they are normal text PDFs for which text search within the file works.
  • Could it be that certain PDF files do not allow their content to be indexed?
    Yes, some PDFs don't allow that. But you'd have to provide examples with download links for us to be able to test.
  • It's certainly possible that you're just trying protected PDFs.
    This one works, so use that for testing:
    http://www.bls.gov/news.release/pdf/cpi.pdf
    If that doesn't work, see Dan's requests for Debug and Report ID above.
  • For your information, I reinstalled everything from scratch. Now some PDF files seem to be indexed, but I still have problems with some others.
    I submitted another bug report, with ID 4202915, as the error message seems to be different. The PDF file that doesn't index can be downloaded from here:
    http://www.localenergy.org/pdfs/Document Library/Exxon Future of Oil and Gas.pdf
  • Yes, that's a protected document. You can tell by selecting some text and trying to copy it: you'll notice that you can't. (Or you can check Document Properties --> Security.)
    I don't know for sure which options pdftotext needs enabled in order to work, but usually the options are either all enabled or all disabled.
  • Oh, I see. I'm trying different files now and getting different results, so maybe that really is the reason, although I'm quite surprised to have stumbled on so many protected files. I will check the ones I have in the library. Thanks, and sorry for not having thought of that.

    Thanks again
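As a rough complement to the manual checks above, the permission flags can also be read from the output of `pdfinfo`. The sketch below assumes the output contains a line of the form `Encrypted: yes (print:yes copy:no ...)`, which is the usual xpdf/poppler format but may vary by version; the function name and the sample strings are illustrative and not taken from the thread.

```python
import re


def copy_allowed(pdfinfo_output: str):
    """Return True/False for the copy permission, or None if unknown.

    ASSUMPTION: pdfinfo reports protection on a line starting with
    "Encrypted:", either "no" or "yes (print:... copy:... ...)".
    """
    for line in pdfinfo_output.splitlines():
        if not line.startswith("Encrypted:"):
            continue
        value = line.split(":", 1)[1].strip()
        if value.startswith("no"):
            return True  # not encrypted, so no copy restriction
        match = re.search(r"copy:(yes|no)", value)
        if match:
            return match.group(1) == "yes"
        return None  # encrypted, but the flags could not be parsed
    return None  # no "Encrypted:" line in the output at all


if __name__ == "__main__":
    sample = "Encrypted:      yes (print:yes copy:no change:no addNotes:no)"
    print(copy_allowed(sample))
```

The input would be whatever `pdfinfo file.pdf` prints for the document in question.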
  • The older version of pdftotext that Zotero installs fails to extract text from that PDF, with the message "Copying of text from this document is not allowed."

    The version of pdftotext in the latest version of poppler seems to ignore that restriction, though, and symlinking in poppler's pdftotext installed with Homebrew does in fact work. You're on your own for that, though (if you don't know how to do that, installing Homebrew isn't a good idea).

    We'll look into bundling a newer version of pdftotext. And hopefully we'll just be switching to pdf.js in the not-too-distant future.
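For readers who do want to try the unsupported symlink approach mentioned above, a minimal Python sketch follows. The function name is hypothetical, and both paths are placeholders that must be filled in by hand: where the newer pdftotext was installed, and what the tool is called inside the Zotero data directory, are not stated in the thread.

```python
import os


def link_tool(newer_tool: str, zotero_tool: str) -> None:
    """Point zotero_tool at newer_tool via a symlink, keeping a backup.

    Any existing file at zotero_tool is moved to zotero_tool + ".bak"
    (overwriting an earlier backup) before the symlink is created.
    """
    if not os.path.isfile(newer_tool):
        raise FileNotFoundError(newer_tool)
    if os.path.lexists(zotero_tool):
        os.replace(zotero_tool, zotero_tool + ".bak")
    os.symlink(newer_tool, zotero_tool)


# Hypothetical usage; both paths are placeholders, not known locations:
# link_tool("/path/to/poppler/pdftotext",
#           "/path/to/zotero-data-dir/<name of Zotero's pdftotext file>")
```

Restoring the `.bak` file undoes the change.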