Retrieve Metadata for PDFs not working

Fizz · November 4, 2009

Hello everyone!

I found Zotero while searching for a reference manager that would import PDFs. I was pleasantly surprised that Zotero could even extract metadata from PDFs! However, I can't get it to work. I use FF 3.5.4 and Zotero 2.0b7.4 on WinXP Pro. I have not been able to extract metadata from a single PDF! All the PDFs I tried (more than 10) were online and available on Google Scholar. Could anyone shed some light on this? Perhaps a workaround? I even submitted my logs and a couple of PDFs to Zotero support, to no avail :(

adamsmith · November 4, 2009

If you just send emails to Zotero, they'll ignore them. There is only a small dev team, no customer support unit.
The way this works is that you post in the forum, including the error ID of a report you have sent. In at least 9 out of 10 cases things get resolved. If not, they'll sometimes ask you to send them a document (though for PDFs a simple link will do; you can also post that link here).

Go to your preferences. In the Search tab, what do you see? Are the PDF tools installed, i.e. do you see something like "pdftotext version 3.02 is installed"?
If not, that's where the problem is.
Try this one:
http://www.samren.org/Research_Papers/doc/ADB%20Remittance%20Study.pdf
This works for me. If it doesn't work for you, there is something wrong with your settings. If it does, there's something specific about the PDFs you tried.

Fizz · November 4, 2009

Thanks for the reply! I did not get any error ID while using my PDFs.
Yes, I did do all the stuff that is already covered in the forum, including checking that pdftotext 3.02 is installed.

The PDF you linked to also gives me the error "No matching references found" in the Metadata progress window. Btw, the PDF opens fine after I add it to my library. So what could be wrong in my settings? I would assume it has something to do with the Google Scholar - Zotero interface?

adamsmith · November 4, 2009

You would get the error ID when you use the "Report Error" function in the gears menu.
Try searching for a phrase that's in one of the PDFs to see if the indexing works correctly.

It seems odd that the Google Scholar search would be wrong. I can't really see how it could be.

Fizz · November 4, 2009

The Report ID is 1265646937. And yes, the indexing is working fine. All my PDFs (and other references) can be searched correctly.

Fizz · November 5, 2009

And Mendeley seems to import/extract metadata fine from my files. I don't want to change from Endnote to Mendeley :(

dstillman · November 5, 2009

adamsmith's PDF works for me as well. Have you tried actually searching for some of these articles in Google Scholar and trying to save them to Zotero?

Please provide a Debug ID for the retrieval attempt.

Also make sure that "Accept third-party cookies" is enabled in the Privacy pane of the Zotero Firefox preferences.

Fizz · November 5, 2009

SOLVED!!!

Forgive me for my naivete :( but I was not planning on using Google Scholar for constructing my bibliography! I use PubMed and it works fine with Zotero.

@Dan I did try GS and it doesn't work for the same citations that work on PubMed! It does not give any error ID. My retrieve metadata error ID was 1265646937.

I could not find a Privacy pane in the Zotero 2.0b7.4 preferences. However, when I set "Accept third-party cookies" in FF, everything came together, at last!!!

Thanks, Dan and Adam! Beginning of the end... Endnote

achu1010 · November 5, 2009

Hi, I ran a batch of PDFs and it was too much for Google Scholar, which blocked me. Here is the message from the Google Scholar search URL:
''
Google
Sorry...
We're sorry...

... but your computer or network may be sending automated queries. To protect our users, we can't process your request right now.
See Google Help for more information.

© 2009 Google - Google Home''

adamsmith · November 5, 2009

Different and well-known issue. No solution; that is a Google issue.
A search on the forum would have yielded multiple results.

dstillman · November 5, 2009

"I could not find a Privacy pane in the Zotero 2.0b7.4 preferences. However, when I set "Accept third-party cookies" in FF, everything came together, at last!!!"

Er, yes, I meant Firefox preferences. This is listed on the known issues page. It will be fixed in Firefox 3.6.

benrunkle · January 15, 2010

I had this same issue come up, and wish that Zotero had popped up a warning. This happened within the first hour of downloading Zotero for the first time, and I suspect it would be a common problem for other new users who are migrating to this new system and are unfamiliar with the restrictions of the (amazing) "retrieve metadata" functionality. Could there be some check or warning when someone tries to download metadata for more than the trigger number of PDF files?

Thanks!

Bionatsci · January 15, 2010

"Could there be some check or warning when someone tries to download metadata for more than the trigger number of PDF files?"

In principle, a great idea. In practice, I don't think anyone has actually figured out what exactly the trigger is. Various forum threads (for example this one) have reported blocks for "greater than 400 items in 24 hrs" (a 24 hr block) or for as few as 20 items. Reported block lengths range from under an hour to 24 hrs. It therefore appears to be more complicated than a simple trigger number. Potentially there are several limits, something like 400 in 24 hrs or 40 in an hour or 20 in a minute, but this is just guesswork.

Someone could try asking Google, but I don't know whether they would respond.
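
To make the "several limits" guess concrete, here is a minimal sketch of a client-side budget that checks a request against several time windows at once. The three limits are only the guessed numbers from this post, not documented Google Scholar behavior, and the class and method names are made up for illustration; this is not how Zotero works.

```python
from collections import deque

# Guessed limits from the post above: (max requests, window in seconds).
# These numbers are speculation, not documented Google Scholar behavior.
GUESSED_LIMITS = [(20, 60), (40, 3600), (400, 86400)]


class MultiWindowBudget:
    """Tracks request timestamps and checks them against several windows."""

    def __init__(self, limits=GUESSED_LIMITS):
        self.limits = list(limits)
        self.longest = max(window for _, window in self.limits)
        self.sent = deque()  # timestamps of past requests, oldest first

    def allowed(self, now):
        """Return True if one more request at time `now` stays within every limit."""
        # Forget requests older than the longest window.
        while self.sent and now - self.sent[0] >= self.longest:
            self.sent.popleft()
        for max_requests, window in self.limits:
            recent = sum(1 for t in self.sent if now - t < window)
            if recent >= max_requests:
                return False
        return True

    def record(self, now):
        """Note that a request was sent at time `now` (seconds)."""
        self.sent.append(now)
```

A caller would check `allowed(now)` before each lookup and call `record(now)` after sending it. If the real limits differ, which is quite possible, only the `GUESSED_LIMITS` list would need to change.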

adamsmith · January 15, 2010

Why not a pop-up warning when you try to retrieve for more than 20? The warning could just say "Trying to retrieve metadata for a large number of items may trigger a temporary block by Google."
Or something along those lines? Or maybe something better than a pop-up, but what?

Bionatsci · January 15, 2010

A pop-up for more than 20 would probably work in many cases; it would be surprising if there were any circumstances where a block would be triggered for fewer than 20 items. Several individual retrieves, each of fewer than 20, could still trigger a block though. Even so, a pop-up seems like the best that could be done without knowing more specifics about the blocking algorithm (either by asking Google, though I wouldn't know how to go about that, or by designing some sort of testing script, which I wouldn't know how to do either, or even whether it would be allowed by Google's terms of use).

So, +1 for a popup as Adam suggests, with an option to continue or to cancel.
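
The pop-up logic suggested here could look roughly like the sketch below. The threshold of 20 is just the figure floated in this thread, and `ask` is a hypothetical callback standing in for whatever dialog would actually be shown; none of this is existing Zotero code.

```python
WARN_THRESHOLD = 20  # the figure suggested in this thread, not a known Google limit

WARNING_TEXT = (
    "Trying to retrieve metadata for a large number of items "
    "may trigger a temporary block by Google."
)


def confirm_batch(item_count, ask, threshold=WARN_THRESHOLD):
    """Return True if retrieval should proceed.

    `ask` is a hypothetical callback that shows the message and returns
    True for "continue" or False for "cancel".
    """
    if item_count <= threshold:
        return True  # small batch: no warning needed
    return bool(ask(WARNING_TEXT))
```

As noted above, this would not catch several small batches run back to back; it only covers the single large batch case.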

Bionatsci · January 15, 2010

Just found this from Dan last March. To quote: "Going forward we'll be making refinements to the PDF recognition feature that should allow you to recognize more PDFs at once."

I don't know what the dev team has up their sleeve, but it looks like they have some (I presume) unimplemented ideas for alleviating this issue.

benrunkle · January 15, 2010

These are great ideas. I think even a pop-up the first time you use it, even if just for one reference, would help: then you know something about the limits and bounds of the metadata search. This is common enough when you use a new version of some software and it points out new features, and it's easy to click that 'don't show this message again' button.
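
A first-use notice with a 'don't show this message again' option could be sketched like this. The preference key and the `show_notice` callback are hypothetical names chosen for illustration, and `prefs` stands in for any dict-like settings store.

```python
NOTICE_PREF = "hideRetrieveMetadataNotice"  # hypothetical preference key


def maybe_show_notice(prefs, show_notice):
    """Show the notice unless the user has dismissed it for good.

    `show_notice` is a hypothetical callback that displays the notice and
    returns True when the user ticks "don't show this message again".
    Returns True if the notice was shown.
    """
    if prefs.get(NOTICE_PREF, False):
        return False
    if show_notice():
        prefs[NOTICE_PREF] = True  # remember the user's choice
    return True
```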