Problem with indexing pdfs

Hello,

I cannot seem to index all my text PDFs. 1578997951
Is there a way to find out what the problem is?
  • Furthermore, after importing a few hundred PDFs and entries from EndNote, I started to get this error:
    1352818398

    The search engine stops working and I need to restart Firefox to make Zotero work.
    Thanks.
  • One more note: the same files that would not index automatically can be indexed manually, but as you know it's time consuming.
  • Are you using the Retrieve Metadata function?
    Because it uses Google Scholar, Google locks you out after a while because you look like a robot ;-).
  • No I haven't, this is what I did:
    - I imported my database with NO docs attached from Endnote;
    - I used OCR on my 400 pdf files to then be able to index them with the Zotero function;
    - I attached each file to an entry through a *link* using ZotFile: this was the only way, to my knowledge, to rename my files in a standardized fashion and move them to a folder of my choice as opposed to Zotero's numbered directories. I did this also to avoid conflicts with Dropbox - it took me some time but I was very pleased with the result.
    - I then tried to maximize the number of indexed files in a variety of ways: I tried to rebuild the index from zero; I tried to "index unindexed items"; I tried to clear the index and do it again... No matter what I do, Zotero does not seem capable of indexing more than 150 files. Even allowing that a portion of my 400 files may not have converted to text (10-20% tops), I still don't understand why the indexing process stops at around 150 items instead of continuing until at least 300 files have been indexed.

    Thanks.
  • hmm - Zotero certainly isn't limited to indexing 150 files - I have almost 1000 indexed.
  • Hello,
    I have a similar problem, but from the beginning, not after an import.
    Probably no PDFs are indexed - I fear the 82 indexed entries are all websites.
    I installed the pdftotext files in the search options dialog, and I tried to index single files containing text.
    Suspecting an XP problem, I copied my data to a Linux machine and tried it there, with the same result :-(((
    It doesn't matter whether I created the PDFs myself or downloaded them from somewhere.
    Using pdftotext on Linux I can extract the text, so it seems my PDFs are OK.
  • Some of mine are indexed, but just a few. On top of this problem, I noticed that when I try to manually index those pdfs that are not indexed, my zotero crashes. Here is the error:
    425236630

    This is becoming quite frustrating actually, I hope someone will help.
  • salvadore - what OS are you on?
    If you're on Linux or Mac, could you try to run pdftotext on one of the files that crashes Zotero? (I have no idea how to do that, or whether it's even possible, on Windows, but if it is, do the same.)

    @reh - what happens if you manually try to index one of the files in question (select the file and click on the green arrow-circle next to "Indexed: No")?
    See if you get an error message that you could post.
    In the Search tab of your preferences, do you have both pdftotext and pdfinfo shown as installed?

    Also, for both of you: which Zotero version are you using?
  • > salvadore - what OS are you on?
    Same problem under both Vista and XP.

    > if you're on Linux or Mac, could you try to run pdftotext on one of the files that crashes zotero? (I have no idea how to do that - or if it's even possible - on Windows, but if it is - the same).
    Sorry, I have never used a Mac or Linux in my life. By the way, I am running the latest version of Zotero and I have both PDF tools installed.

    > Also, for both of you which Zotero version are you using?
  • edited January 26, 2010
    salvadore: Too many errors in there, most unrelated to Zotero. Restart Firefox and provide a Report ID and Debug ID for just the indexing attempt. Also, what do you mean by "crash"?
  • edited January 26, 2010
    First of all, thank you for your concern!
    By crash I mean that my Zotero database stops responding and all the entries disappear.
    I tried to manually index a PDF linked to one of my entries and this is what happened:
    - the two green "recycle"-like arrows disappeared and the "Indexed" field still showed "No".
    - I tried to go to another entry to see what would happen, and the central window displayed the message "An error has occurred. Please restart Firefox....."
    This time the "Report Errors" option was actually greyed out and I could not report the error. I closed Zotero without closing Firefox, and when I tried to reopen Zotero I received an error message.

    [removed non-Zotero error — D.S.]
  • If Report Errors is grayed out then there's no error, but we still need a Debug ID.
  • For some reason the error language was cut from the posting - sorry! Here is a new error for you, following the same actions as described above:
    [JavaScript Error: "uncaught exception: [Exception... "Component returned failure code: 0x80520012 (NS_ERROR_FILE_NOT_FOUND) [nsIFile.moveTo]" nsresult: "0x80520012 (NS_ERROR_FILE_NOT_FOUND)" location: "JS frame :: chrome://zotero/content/xpcom/attachments.js :: _moveOrphanedDirectory :: line 1230" data: no]"]

    648032580
  • Again, we need a Debug ID. Please follow the link.
  • Sorry, here it is: The Debug ID is D286758233
  • Here is one more case:
    The Debug ID is D904080953.
  • > @reh - what happens if you manually try to index one of the files in question
    The arrow disappears briefly; nothing else happens.

    > do you have both pdftotext and pdfinfo shown as installed?
    Yes, on both OSes.

    > which Zotero version are you using?
    The latest beta; today I installed 2.0rc2, same problem.
    Firefox is 3.0.3 on XP and 3.6 on Linux.

    > see if you get an error message that you could post.
    (XP) It seems this is the problem: Cache file doesn't exist!
    The Debug ID is D1719467896.

    I'm not sure how the permissions were set on the virtual Ubuntu machine, but here on XP there are no write restrictions.
  • You both are receiving the "Cache file doesn't exist" message, and you both are trying to index files on drives other than C:, which I'm guessing is the issue. Can you provide any details on those drives?

    This obviously shouldn't take down Zotero in any case, though, so we'll take a look.
  • salvadore: Your Zotero crash was due to a bug that I've just fixed in the latest dev build. Your indexing failure, at least with the file you provided debug output for, was due to a Firefox limitation that prevents Zotero from indexing files with filenames containing extended characters. I've added some error logging for this to the latest dev build. The good news is that it looks like this will soon be fixed in the Mozilla codebase, though the fix probably won't be available until Firefox 3.7.

    reh: I'm not sure why you're getting the indexing failure, assuming that PDF does indeed have embedded text, but the non-C: drive would be my best guess. You might be able to learn more by running pdftotext from the command line using the same arguments that are shown in the debug output.
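For anyone who wants to check whether the filename limitation described above applies to their own library, here is a minimal Python sketch that lists PDFs whose names contain non-ASCII characters. It assumes that "extended characters" means anything outside plain ASCII; the function name `non_ascii_pdfs` and the folder argument are made up for illustration and are not part of Zotero.

```python
from pathlib import Path


def non_ascii_pdfs(folder):
    """Return PDF paths under `folder` whose file name has non-ASCII characters.

    Assumption: "extended characters" is read here as "anything outside ASCII".
    """
    hits = []
    for path in Path(folder).rglob("*"):
        # Only look at regular files ending in .pdf (case-insensitive)
        if path.is_file() and path.suffix.lower() == ".pdf":
            if not path.name.isascii():
                hits.append(path)
    return sorted(hits)
```

Renaming the files it reports to plain ASCII names and re-indexing one of them would be a quick way to confirm or rule out this cause.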
  • It's an NTFS partition on my second hard drive.
    I set the storage preferences to the default and installed the PDF tools:
    Error running pdftotext
    The Debug ID is D69316096.

    On Linux my home directory is a mounted NFS device (XFS).
    I set the storage preferences to the default and the zotero directory permissions to 777.
    Same effect as yesterday: no cache file.
  • Well, like I said, you'd have to run pdftotext from the command line to have any hope of figuring out what the actual error is.
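If typing the command by hand is awkward, the same check can be scripted. The following is only a sketch: `run_extractor` is a hypothetical helper name, and `extra_args` is a placeholder for whatever arguments appear in your own debug output. It runs the binary as `<binary> [args] <pdf> <output file>` and reports the exit code and anything printed to stderr, or the operating system error if the binary cannot be started at all.

```python
import subprocess


def run_extractor(binary, pdf_path, out_path, extra_args=()):
    """Run a pdftotext-style binary and return (exit_code, error_text).

    exit_code is None when the program could not be started at all
    (missing file, no execute permission, or not a valid executable).
    """
    cmd = [str(binary), *extra_args, str(pdf_path), str(out_path)]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True)
    except OSError as exc:
        # The OS refused to launch the file; report why
        return None, f"{type(exc).__name__}: {exc}"
    return result.returncode, result.stderr
```

A return value of `None` for the exit code points at the binary itself rather than at the PDF, which narrows down where to look.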
  • I didn't read your post until after posting mine.

    I changed (cd) into the zotero directory and tried to run pdftotext-Linux-i686 [string from log] on the command line (hopefully correctly): command not found.
    The same command with plain pdftotext works and creates the cache file.

    Running pdftotext-Win32.exe [string from log]: program could not be executed (the message is in German).
  • > I made a cd to the zotero dir and tried to use pdftotext-Linux-i686 [string from log] on a commandline (hopefully correct): command not found.
    Try "./pdftotext-Linux-i686" instead.
  • bash: file or directory not found (translated from German)
  • reh
    edited January 27, 2010
    If I try to execute it without arguments I get: cannot execute binary file (the execute bit is set correctly). If I open it in MC, it is displayed as if it had been called with more.

    On XP I get: program too big to fit in memory.
  • Well, you're pretty much on your own for these. They're just executable files, and they work on both Linux and XP. If you're having trouble, you can try erasing the binaries, restarting Firefox, and reinstalling them. If they don't work, there's some other problem on your system.
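Before reinstalling, one quick sanity check is to look at the first bytes of the downloaded file. A Linux executable normally starts with the ELF magic bytes `\x7fELF` and a Windows .exe with `MZ`; anything else (for example a saved error page or a truncated download) will not run. This is a rough sketch with a hypothetical helper name, `binary_kind`, and it only inspects the header, so a file that passes can still be damaged further in.

```python
def binary_kind(path):
    """Classify a file by its leading magic bytes.

    Returns "ELF" (Linux executable), "PE" (Windows executable,
    identified by the DOS "MZ" header) or "unknown".
    """
    with open(path, "rb") as fh:
        head = fh.read(4)
    if head == b"\x7fELF":
        return "ELF"
    if head[:2] == b"MZ":
        return "PE"
    return "unknown"
```

If pdftotext-Linux-i686 does not report "ELF", or pdftotext-Win32.exe does not report "PE", the download is the likely culprit.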
  • Someone on a university network installed Zotero and gave me the PDF binaries from that system (they are a very different size from mine). With these, indexing works on Linux.

    But normally I work with XP.
    Unfortunately I can't find these PDF tools for manual download.
    (But I found another thread with the same problem: http://forums.zotero.org/discussion/7681/pdfinfo-pdftotext-crash-program-too-big-to-fit-in-memory/)
    In another post I found a link to http://www.zotero.org/download/xpdf/pdfinfo-Linux-i686-3.02
    but access to http://www.zotero.org/download/xpdf/ is not allowed.

    The auto-install is a good thing, but I think it should be possible to download the proper version manually if needed. Please link to it somewhere.

    For the developer:
    Our setup here is a PC with 64-bit Linux running several virtual 32-bit machines (VMware),
    and a separate PC (Athlon dual core 4850e) with 32-bit XP.
  • Please, can someone tell me where I can manually download pdftotext-Win32.exe?

    I have already installed it several times, also with the cache disabled, but it doesn't work - it seems that the problem is not on my PC, but with the automatic download.
    Maybe Zotero should have some form of check (a checksum?) to verify that the download was correct.
  • http://www.zotero.org/download/xpdf/pdftotext-Win32.exe-3.02

    Checksumming is planned, but, of course, that would only indicate a failure in your case, not fix it. The auto-download works for most people, so it's likely an issue either with your computer or a network glitch (or Firefox still had the corrupted version cached).
  • If you're downloading manually, you need to remove the "-3.02" suffix, and you should create a pdftotext-Win32.exe.version file that contains "3.02". (The same applies to the Linux version.)
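The two manual steps above (strip the "-3.02" suffix, write a .version file containing "3.02") can be scripted roughly as follows. This is a sketch, not an official installer: `install_downloaded_tool` is a made-up name, it works in whatever folder the downloaded file is already in, and you still need to place the results in the directory where Zotero keeps these binaries.

```python
from pathlib import Path


def install_downloaded_tool(downloaded, version="3.02"):
    """Rename e.g. pdftotext-Win32.exe-3.02 to pdftotext-Win32.exe and
    write pdftotext-Win32.exe.version containing the version string.

    Returns the path of the renamed binary.
    """
    src = Path(downloaded)
    suffix = "-" + version
    if not src.name.endswith(suffix):
        raise ValueError(f"{src.name} does not end with {suffix}")
    # Drop the version suffix from the file name
    target = src.with_name(src.name[: -len(suffix)])
    src.replace(target)  # overwrites an existing (possibly corrupt) copy
    # Companion file next to the binary, holding just the version
    target.with_name(target.name + ".version").write_text(version)
    return target
```

For example, `install_downloaded_tool("pdftotext-Win32.exe-3.02")` leaves pdftotext-Win32.exe and pdftotext-Win32.exe.version side by side.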