Rebuilding PDF index

Peter100 · May 4, 2010

I am trying to rebuild my index, but out of approximately 500 PDFs, Zotero fails to index 60 and only partially indexes another 35. I have tried increasing the maximum character count and the maximum page count, without success.

I have received a chrome script error and followed the advice here:

http://www.zotero.org/support/kb/unresponsive_script_warning

and discussion here:

http://forums.zotero.org/discussion/11783/another-script-error-and-instability-of-firefox-with-zotero-202/

... without success.

I have Report ID 1866499065

I am running the latest Zotero and Firefox updates on a fast Win XP machine.

dstillman · May 4, 2010

From the error report:

PDFs with filenames containing extended characters cannot currently be indexed due to a Firefox limitation

Peter100 · May 4, 2010

Hi Dan,

Will you please suggest the easiest way to find and replace these file names?

Also, will you please offer some examples of 'extended characters'?

dstillman · May 4, 2010

Look at the error report in Report Errors for the filenames.

Or wait. The underlying bug will be fixed in Firefox 3.7, I believe.
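
If waiting is not an option, the affected files can also be found by scanning the attachment folder directly. The following is a minimal sketch, assuming that "extended characters" means anything outside plain 7-bit ASCII (the thread does not define the term). The function names and the folder argument are hypothetical and not part of Zotero.

```python
from pathlib import Path
from typing import List


def has_extended_chars(name: str) -> bool:
    """Return True if the name contains any character outside 7-bit ASCII.

    ASSUMPTION: "extended" is taken to mean non-ASCII, e.g. the Danish "ø".
    """
    return any(ord(ch) > 127 for ch in name)


def find_extended_pdfs(root: str) -> List[Path]:
    """Recursively collect PDFs under root whose filename is not pure ASCII."""
    return sorted(
        p
        for p in Path(root).rglob("*")
        if p.is_file()
        and p.suffix.lower() == ".pdf"
        and has_extended_chars(p.name)
    )
```

Calling `find_extended_pdfs` on the folder that holds the attachments returns the paths to rename; it only reads the directory tree and does not modify any file.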

Peter100 · May 4, 2010

Would it be a good idea to try installing the Firefox 3.7 alpha?

I generated another report (Report ID 1933833608) and had a look at the files. Only one shows the extended-characters error (the Danish "ø").

Meanwhile all show this: {file: "chrome://zotero/content/xpcom/fulltext.js" line: 476}.

Another curiosity: 60 files remain unindexed, but only 5 appear in the error report.

Here are the five files showing up in the error report:

[JavaScript Error: "Green 2008 Capturing User Requirements (thesis).pdf was not indexed" {file: "chrome://zotero/content/xpcom/fulltext.js" line: 476}]

[JavaScript Error: "Høybye, Johansen, T-Thomsen 2005 Online Interaction.pdf was not indexed -- PDFs with filenames containing extended characters cannot currently be indexed due to a Firefox limitation" {file: "chrome://zotero/content/xpcom/fulltext.js" line: 476}]

[JavaScript Error: "Tegeland 2007 Information om kundval.pdf was not indexed" {file: "chrome://zotero/content/xpcom/fulltext.js" line: 476}]

[JavaScript Error: "99024537.pdf was not indexed" {file: "chrome://zotero/content/xpcom/fulltext.js" line: 476}]

[JavaScript Error: "Johansson 2008 Older people's home modification process (thesis).pdf was not indexed" {file: "chrome://zotero/content/xpcom/fulltext.js" line: 476}]
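
For a longer report, the filenames and reasons can be pulled out mechanically rather than read by eye. This is a rough sketch: the pattern is modeled only on the five lines quoted above, so other report formats are not covered, and `parse_report` is a hypothetical helper name.

```python
import re
from typing import List, Optional, Tuple

# Pattern modeled on the report lines quoted in this thread:
#   [JavaScript Error: "<name>.pdf was not indexed[ -- <reason>]" {file: ...}]
ERROR_LINE = re.compile(
    r'\[JavaScript Error: "(?P<name>.+?\.pdf) was not indexed'
    r'(?: -- (?P<reason>.+?))?" \{file:'
)


def parse_report(text: str) -> List[Tuple[str, Optional[str]]]:
    """Return (filename, reason) pairs; reason is None when the line gives none."""
    return [
        (m.group("name"), m.group("reason"))
        for m in ERROR_LINE.finditer(text)
    ]
```

Entries whose reason is `None` are the ones that failed without the extended-characters message, which is four of the five files here.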

dstillman · May 4, 2010

Do those files have embedded text?

Peter100 · May 4, 2010

I'm not entirely clear on what you mean by "embedded text", but I checked two of the five PDFs and they appear to be in secured PDF format (text cannot be copied when selected, which in effect makes them behave like non-OCR image PDFs). This would explain why they couldn't be indexed. (I've tried to find ways to "unsecure" the PDFs, but short of printing and rescanning I have not found a good solution.) It is possible that the other 55 are also secured and/or scanned image PDFs.

Question: Any ideas for how to locate the other 55 PDFs that are not yet indexed, since they don't show up in the error report?
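
One possible approach, sketched under an unverified assumption: if each indexed attachment leaves a full-text cache file next to its PDF, then PDFs in folders without that file are candidates for "not indexed". The marker name `.zotero-ft-cache` is an assumption that is not confirmed anywhere in this thread, and `pdfs_without_marker` is a hypothetical helper; check the folder of a PDF known to be indexed before relying on the result.

```python
from pathlib import Path
from typing import List

# ASSUMPTION: name of the per-attachment full-text cache file. Verify it
# against the folder of a PDF you know is indexed before trusting the output.
CACHE_MARKER = ".zotero-ft-cache"


def pdfs_without_marker(storage_root: str, marker: str = CACHE_MARKER) -> List[Path]:
    """List PDFs whose containing folder has no file named `marker`."""
    missing = []
    for pdf in Path(storage_root).rglob("*"):
        if pdf.is_file() and pdf.suffix.lower() == ".pdf":
            # A PDF counts as a candidate when no marker sits beside it.
            if not (pdf.parent / marker).exists():
                missing.append(pdf)
    return sorted(missing)
```

The marker name is a parameter so that a different filename can be tried if the assumed one turns out to be wrong for a given installation.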