Indexing PDFs

louisiana2010 · March 13, 2012

Hey Gang!

As a long-time user of the Zotero add-on for Firefox, I recently installed the standalone version on a Mac running Lion.
I had problems indexing around 1,000 PDFs that I had gathered previously with the Firefox version.
I would drag some (10) PDFs to my Library and the wheel would turn endlessly, causing Firefox to crash.
The same is happening with the standalone version, which also crashes.

I'd like to know if it is possible at all to simply index those files.
In theory all I have to do is click and drag. Well, no sir, it doesn't work.

Also, is there any way to import those PDFs only to index them and then get rid of them in Zotero, as this is not where they should be stored?
What would be awesome would be to be able to select documents in the Finder, right-click on them, and retrieve metadata / index them in Zotero
without having to drag them at all.
But in the meantime I'd just love to be able to index those PDFs.
Any help would be appreciated.
Thanks

adamsmith · March 13, 2012

Also, is there any way to import those PDFs only to index them and then get rid of them in Zotero, as this is not where they should be stored?

Add them as a link. Use "Link to File" from the green plus menu.

The indexing issue - hard to say what's not working, it certainly should - when you say "crash" what exactly happens?

louisiana2010 · March 13, 2012

Thanks Adamsmith for getting back so fast, this always amazes me.
Anyway, what happens is this:
I select a reasonable number of PDFs in my Finder (10)
Drag them to my Zotero Library
Get the wheel turning (mac thing to tell you something is going on)
Then after way too long a while I check in my Dock
Right click on Zotero
Get "Application not responding"
And have to force it to quit
I know that when things go well it only takes Zotero a minute to do this so there's definitely a problem.
Thanks

adamsmith · March 13, 2012

How about if you just drag in one pdf file? (for troubleshooting purposes - obviously you should be able to drag in 10-20 files w/o problems).
Beyond that, Dan would have to say what type of info they need to debug this.

louisiana2010 · March 13, 2012

I just tried with "Link to File": same problem.
If I drag just one file, it sometimes works. Maybe that's because some files cannot be indexed, and then the whole process freezes... my 2 cents

dstillman · March 13, 2012

You can generate real-time debug output to see if anything is happening. Save the Terminal output to a file, compress it (via control-click), and e-mail it to support@zotero.org with a link to this thread if you want us to look at it.

dstillman · March 13, 2012

As a first step, though, you might want to try deleting the pdfinfo*/pdftotext* files in the Zotero data directory and restarting Zotero/Firefox to confirm that it's the indexing that's the problem.
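
For anyone who would rather check what is there before deleting anything, here is a minimal sketch that only lists the matching files. It assumes you pass in the path to your own Zotero data directory (its location differs between setups), and `find_pdf_tool_files` is a hypothetical helper name, not something Zotero provides.

```python
from pathlib import Path


def find_pdf_tool_files(data_dir):
    """Return the files directly inside data_dir named pdfinfo* or pdftotext*.

    data_dir is whatever folder you use as the Zotero data directory;
    nothing is deleted here, the files are only listed.
    """
    data_dir = Path(data_dir)
    matches = []
    for pattern in ("pdfinfo*", "pdftotext*"):
        # glob() is non-recursive, so only the top level of data_dir is checked
        matches.extend(p for p in data_dir.glob(pattern) if p.is_file())
    return sorted(matches)
```

Calling `find_pdf_tool_files("/path/to/your/zotero/data")` with your real path returns a list you can review, then remove by hand with Zotero/Firefox closed.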

louisiana2010 · March 14, 2012

Right, so I've added my files little by little by linking them. It takes forever, and I have to quit the app and relaunch it, but then the files are in, so no biggie, I guess, apart from the time it takes.
I did try deleting the two files you mentioned; it didn't change a thing. I will look into the debug output later.

Zotero can't get the metadata for an impressive number of files though, which makes sense in a way, as some haven't been OCR'd and are image-only.
Yet, I need to index them.

Is there a way to select only those files that haven't been indexed or should I click on the ones only displaying the link icon which means they haven't been indexed?

Last, one tedious workaround I have found is to select the non-indexed files, generate a report, and copy/paste the titles into WorldCat in order to collect the information. It works in 90% of the cases, but I'd be glad if there were another way (I have hundreds of such files...)

adamsmith · March 14, 2012

Is there a way to select only those files that haven't been indexed or should I click on the ones only displaying the link icon which means they haven't been indexed?

I'm not sure I follow here. No, there is no way to select only files that haven't been indexed. The link icon is entirely unrelated to whether files are indexed or not.
I'm confused when/why you're trying to do this.
I'm also confused by what you write above that - a file that hasn't been OCRd cannot be indexed, unless you mean something different by "indexing" than Zotero.

dstillman · March 14, 2012

Yeah, not at all clear to me what you're trying to do (or think you need to do) here.

dstillman · March 14, 2012

I think maybe by "indexed" you just mean "have a parent item with bibliographic information", vs. just a linked file? As in, you're trying to manually replicate what Retrieve Metadata for PDF does for items that don't have embedded text?

In Zotero, "indexing" refers to scanning a file for full-text content and storing it in the database so it's searchable (and also for running Retrieve Metadata on the item).
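
To make the distinction concrete, here is a toy sketch of what "indexing" means in this sense. It is not Zotero's implementation, only an illustration under the assumption that the text of each file has already been extracted; `build_index` and `search` are made-up names. A scanned, image-only PDF yields no text, so it contributes nothing to the index.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word to the set of file names containing it.

    docs is a dict of {file name: extracted text}; an image-only PDF
    would have an empty string here and so never enters the index.
    """
    index = defaultdict(set)
    for name, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(name)
    return index


def search(index, word):
    """Return the sorted file names whose text contains the given word."""
    return sorted(index.get(word.lower(), set()))
```

With `docs = {"scanned.pdf": "", "article.pdf": "Full text of an article"}`, searching the resulting index for "article" finds only `article.pdf`; `scanned.pdf` can never match any word.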

louisiana2010 · March 14, 2012

Well, all I am trying to do is collect the metadata that Zotero is unable to extract from the PDFs.
I use "indexing" in the sense of "retrieving metadata".
Now, in 90% of my cases, image-only PDF files don't come with attached metadata that can be extracted.
So my workflow is as follows:
get my references from my Finder into Zotero as links
select a few and try to get metadata automatically.
It fails in 75% of the cases,
which leaves me with blank references from which I can't generate a scholarly bibliography.
So I am trying to get the info.

The only reason I want to focus on link icons is that whenever a file is indexed, it displays a book or article icon, and the link icon is then grouped under that icon.

If not, it means that the metadata hasn't been extracted and that the reference is blank for this item.

adamsmith · March 14, 2012

oh OK. That would be "unattached" or "without metadata" - indexing means to read the text content of the file - which obviously isn't possible for non-OCRd files.
You can add a column called "Type" to the middle panel by clicking on the table-like icon at the top right of the middle panel. You can then sort by that column by clicking on it. Unattached files will be of type "Attachment".
You can achieve the same, I believe, by using Type --> is --> Attachment in the advanced search (magnifying glass icon).

louisiana2010 · March 14, 2012

OK right that's it so I need to select files "without metadata": done your way, thanks!

Now, how do I get the metadata other than by generating a report and looking the titles up in WorldCat? I have a feeling I can add WorldCat or other library catalogs to the search options of Zotero. But how?
Thanks

adamsmith · March 14, 2012

There is no other way than to do this manually, unfortunately.

louisiana2010 · March 14, 2012

:-(( Well, thanks a lot, the application does a lot already!

louisiana2010 · March 14, 2012

Oh, one last thing:

I followed the steps described here
http://forums.zotero.org/discussion/17119/pull-search-engines-from-firefox-4-into-zotero-211/
and added WorldCat to my search options in Firefox (upper right), yet it doesn't appear in Zotero's search engines list.
I feel it's close though. What would be needed to integrate WorldCat into Zotero's list of supported search engines?

adamsmith · March 14, 2012

You're misunderstanding how search engines in Zotero work - they're not for automatically importing data into Zotero - that's what the web translators are for - they're for automatically looking up items that already have data in Zotero via "Library Lookup" (see the green arrow next to the search field).

Say you have a book in Zotero and want to know if a library nearby has it - you select the book, click on Library Lookup and - with the default preferences - you'll be directed to the Worldcat search results, which display local library holdings (for most libraries, you can replace Worldcat by the lookup engine for your local library).

None of this is any help to get data into Zotero, I would have mentioned it otherwise.

louisiana2010 · March 14, 2012

"None of this is any help to get data into Zotero, I would have mentioned it otherwise. "
Well, what does one call a catalog that allows you to collect metadata, then?

Generating a report of the specific files lacking metadata,
then looking them up in WorldCat by simply copying and pasting the information collected by Zotero, gives me a list of references in WorldCat; I then collect the metadata in Zotero by clicking on the Book or Article icon.
So yes, this does help me get the information I need.

There must be a way to make Zotero automatically query WorldCat with the report it generates; half of the work is done.
:-)

adamsmith · March 14, 2012

Sorry, I don't follow anymore - obviously yes, you can search for stuff in WorldCat and Zotero's web translator helps you import it. You were already doing that, so I assumed you weren't asking about that possibility.

But "adding a search engine" to Zotero as described in the thread you link to has nothing to do with that option.

And no, there is no way to automatically search a catalog based on file names. That could likely be done, but doing it even reasonably well is not going to be trivial.
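
As a rough illustration of why this is not trivial, here is a minimal sketch of the naive approach: strip the extension, turn separators into spaces, and build a search link. It is not a Zotero feature; `filename_to_query` and `worldcat_search_url` are hypothetical helpers, and the WorldCat URL pattern is an assumption about the public search page rather than a documented API.

```python
import re
from pathlib import Path
from urllib.parse import urlencode


def filename_to_query(filename):
    """Turn a file name into a plain search string.

    Drops the extension and replaces underscores, hyphens, dots and
    plus signs with spaces. It cannot tell authors from titles or
    recover words missing from the file name.
    """
    stem = Path(filename).stem
    text = re.sub(r"[_\-.+]+", " ", stem)
    return re.sub(r"\s+", " ", text).strip()


def worldcat_search_url(query):
    """Build a search link (assumed URL pattern, to be opened by hand)."""
    return "https://www.worldcat.org/search?" + urlencode({"q": query})
```

A file named `Smith_2004-history.of_science.pdf` becomes the query `Smith 2004 history of science`. A name like `scan0042.pdf` becomes `scan0042`, which is useless as a catalog search, and that is the hard part: the results still have to be checked and imported one by one.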