Rebuilding PDF index

I am trying to rebuild my index, but of approximately 500 PDFs, Zotero will not index the remaining 60, and another 35 are only partially indexed. I have tried increasing the maximum character count and maximum pages settings without success.

I have received a chrome script error and followed the advice here:

http://www.zotero.org/support/kb/unresponsive_script_warning

and discussion here:

http://forums.zotero.org/discussion/11783/another-script-error-and-instability-of-firefox-with-zotero-202/

... without success.

My Report ID is 1866499065.

I am running the latest Zotero and Firefox updates on a fast Win XP machine.
  • From the error report:
    PDFs with filenames containing extended characters cannot currently be indexed due to a Firefox limitation
  • edited May 4, 2010
    Hi Dan,

    Will you please suggest the easiest way to find and replace these file names?

    Also, will you please offer some examples of 'extended characters'?
  • Look for the filenames in the error report (Report Errors).

    Or wait. The underlying bug will be fixed in Firefox 3.7, I believe.
  • edited May 4, 2010
    Is it worth trying to install the Firefox 3.7 alpha?

    I generated another Report ID 1933833608 and had a look at the files. Only one shows the extended characters error (the Danish "ø").

    All of them, however, show this: {file: "chrome://zotero/content/xpcom/fulltext.js" line: 476}.

    Another curiosity is that 60 files remain unindexed, but only 5 appear in the error report?

    Here are the five files showing up in the error report:

    [JavaScript Error: "Green 2008 Capturing User Requirements (thesis).pdf was not indexed" {file: "chrome://zotero/content/xpcom/fulltext.js" line: 476}]

    [JavaScript Error: "Høybye, Johansen, T-Thomsen 2005 Online Interaction.pdf was not indexed -- PDFs with filenames containing extended characters cannot currently be indexed due to a Firefox limitation" {file: "chrome://zotero/content/xpcom/fulltext.js" line: 476}]

    [JavaScript Error: "Tegeland 2007 Information om kundval.pdf was not indexed" {file: "chrome://zotero/content/xpcom/fulltext.js" line: 476}]

    [JavaScript Error: "99024537.pdf was not indexed" {file: "chrome://zotero/content/xpcom/fulltext.js" line: 476}]

    [JavaScript Error: "Johansson 2008 Older people's home modification process (thesis).pdf was not indexed" {file: "chrome://zotero/content/xpcom/fulltext.js" line: 476}]
  • Do those files have embedded text?
  • edited May 4, 2010
    I'm not entirely clear what you mean by "embedded text", but I checked two of the five PDFs and they appear to be secured PDFs (the text does not copy when selected, which in effect makes them like non-OCR / image PDFs). This would explain why they couldn't be indexed. (I've tried to find ways to "unsecure" the PDFs, but short of printing and rescanning I have not found a good solution.) It is possible that the other 55 are also secured and/or scanned image PDFs.

    Question: Any ideas on how to locate the other 55 PDFs that are not yet indexed, since they don't show up in the error report?
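
Two questions in this thread never got a concrete answer: how to find the filenames with extended characters, and how to locate the unindexed PDFs that do not appear in the error report. The Python script below is a minimal sketch that can help with both. It assumes you can point it at the folder holding the PDFs. For files stored by Zotero, that is the `storage` folder in the Zotero data directory, whose location depends on your setup. The script lists PDFs whose filenames contain any non-ASCII character (such as "ø"), and PDFs that look secured. The secured check is a crude heuristic of this sketch, not Zotero's own logic. Encrypted PDFs, including copy-restricted ones, carry an `/Encrypt` entry in their trailer, so the script simply searches the raw bytes for that key. It will not catch scanned image-only PDFs, which lack a text layer but are not encrypted. The function names are made up for illustration.

```python
import os


def has_extended_chars(name):
    """Return True if the filename contains any non-ASCII character (e.g. 'ø')."""
    return any(ord(ch) > 127 for ch in name)


def looks_encrypted(path):
    """Crude heuristic (an assumption, not Zotero's check): secured PDFs have an
    /Encrypt entry in their trailer dictionary, so search the raw bytes for it."""
    with open(path, "rb") as f:
        data = f.read()
    return b"/Encrypt" in data


def scan(root):
    """Walk `root` recursively and return (extended_names, encrypted) path lists,
    considering only files whose names end in .pdf (case-insensitive)."""
    extended, encrypted = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(".pdf"):
                continue
            path = os.path.join(dirpath, name)
            if has_extended_chars(name):
                extended.append(path)
            if looks_encrypted(path):
                encrypted.append(path)
    return extended, encrypted


def report(root):
    """Print both lists for the folder `root`."""
    extended, encrypted = scan(root)
    print("PDF filenames with extended (non-ASCII) characters:")
    for path in extended:
        print("  " + path)
    print("PDFs that look secured (contain an /Encrypt entry):")
    for path in encrypted:
        print("  " + path)
```

For example, call `report(r"C:\path\to\storage")` with the placeholder path replaced by your actual folder. The script only reports and renames nothing. Renaming stored files outside Zotero may break the links Zotero keeps to them, so any renaming is probably better done from within Zotero.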
