Indexing
I'm running the latest version of Zotero on a Mac, with pdftotext and pdfinfo 3.02 both installed.
I can't get it to index my PDFs correctly. If I drag a PDF into Zotero and ask it to "Retrieve metadata from PDF", it always gives the error "Could not read data from PDF." I have 56 files indexed and 1156 unindexed, so most files are unindexed.
I searched the forums and I'm unclear where to even start. I originally had all my Zotero files in my .mac iDisk directory; I moved them to the desktop and there was no improvement.
Clearly this limits the utility of Zotero for me and I'd love to have this functionality. Any ideas?
The metadata retrieval for PDFs does not work regardless of the type of article. I tried about ten different ones to make sure the problem isn't caused by a specific kind of PDF (from a specific journal, for instance).
Then, try rebuilding your index once more and see if you can report an error afterwards; maybe the process gets stuck at one specific place and just stops there. Post the error ID here, along with any error text that seems relevant. That may even help you make sense of the problem yourself.
Here is the Debug File for when I try to retrieve PDF metadata: D118947533.
Here is the Debug File for my attempt at indexing all my files: D2000249714.
Help appreciated!
So far I have identified four different reasons why Zotero doesn't index a PDF even after repeated attempts (via the re-index option in the prefs or manually via the re-index entry in the context menu). The obvious one first: a) the PDF has no text layer; but also b) the PDF is password-encrypted; c) the PDF's filename contains spaces on import; and, tricky for anything non-English, d) accents and umlauts in filenames (I haven't figured out the complete list of no-nos, but it definitely doesn't take to any specifically French, German or Latinised Indic characters in the filename). After I remove the offending spaces and non-English characters, Zotero indexes the file on a reimport pretty much OK. (I have one or two files that remain adamantly unindexable despite the above precautions.)
Obviously, it would be great if non-English characters could be handled by Zotero. My current workaround is to trim anything potentially alien from the filename before importing, scrape the data for the main item (or fill in the info panel by hand), and then use the 'rename file from metadata' option on the now-indexed PDF. The attachment's name is then reinstated in its previously unacceptable glory, all bristling with spaces and accents.
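In case it helps anyone doing the same trimming on a large library, here is a minimal Python sketch of the filename check and clean-up described above. It is only an illustration of the workaround, not anything Zotero does itself: the function names `filename_issues` and `ascii_safe_name` are made up for this post, and treating every non-ASCII character as a problem is an assumption, since the exact list of characters that break indexing is not known.

```python
import unicodedata


def filename_issues(name: str) -> list[str]:
    """List the filename problems from the post (spaces, non-ASCII characters)."""
    issues = []
    if " " in name:
        issues.append("contains spaces")
    # Assumption: any character outside plain ASCII may be a problem.
    if any(ord(ch) > 127 for ch in name):
        issues.append("contains non-ASCII characters")
    return issues


def ascii_safe_name(name: str) -> str:
    """Return a copy of the name with accents stripped and spaces replaced."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    # Dropping non-ASCII bytes removes the combining marks. Characters with
    # no ASCII base letter (for example the German sharp s) are dropped too.
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return ascii_only.replace(" ", "_")


if __name__ == "__main__":
    example = "Étude über Śiva.pdf"
    print(filename_issues(example))
    print(ascii_safe_name(example))
```

You would run something like this over the files before importing them; renaming from metadata afterwards restores the readable name as described above.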
kithairon
https://www.zotero.org/trac/ticket/957
Did the debug logs provide any useful information?
Also, since developers are quite busy, try following my initial suggestion and, instead of creating debug output, see if you can create an error report - error reports are pretty easy to read and you might be able to figure out what's going on yourself.
You could also have a look at the debug output - if Zotero crashes while indexing a specific file that should be pretty easy to discern.
Btw, for PDFs that aren't indexed: do you have green arrows next to the "Indexed: No" in the item information? What happens when you press the green arrows?