Google Scholar blocks my IP after a few metadata lookups

I have a large PDF library in Zotero Standalone that I want to retrieve metadata for. The first few dozen lookups succeed, but after that every lookup fails, even for PDFs that I know contain DOI information. When I then open Google Scholar in my browser, I get the following message:

http://imgur.com/vi7FY

It seems the Google Scholar servers don't tolerate many lookups in a row. Are there other services that would allow this? Could the Zotero servers act as a proxy? Could Zotero Standalone regulate its requests better so that it doesn't get banned by the Google Scholar servers?

Thanks,
Michael
  • That's a known issue, and there is nothing we can do about it. You're making a lot of automated requests to Google Scholar in a short period of time, and that's exactly what they're trying to prevent with their IP bans.
  • In that case, would it be possible to have a different error message for "Could not connect to database" versus "No matching references found"?

    Are there other databases (PubMed?) that would be suitable and less strict for automated requests than Google Scholar?
  • We could and probably should throttle requests to Google Scholar, but that requires empirical testing to determine what the throttle should be. We're also exploring using Microsoft Academic Search, either as a replacement for Google Scholar or as a complementary source. PubMed doesn't do full-text search, so it's not useful here.
  • OK. Thanks, Adam and Simon. I may do some quick testing in the coming days to see how many requests I can submit in quick succession, though I fear the limit will depend on the existing load on the servers (impossible to measure) rather than being a hard limit.
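
For anyone experimenting with the throttling idea discussed above, here is a minimal sketch in Python of a fixed-interval throttle. It is not Zotero's code, and the `Throttle` class and its parameters are hypothetical names chosen for illustration. The 5-second interval in the usage note is an arbitrary placeholder, since the thread does not establish what delay Google Scholar actually tolerates.

```python
import time


class Throttle:
    """Enforce a minimum delay between successive calls to wait().

    Hypothetical helper for illustration only; the right interval for
    any given service has to be found empirically.
    """

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests
        self._clock = clock               # injectable for testing
        self._sleep = sleep               # injectable for testing
        self._last = None                 # time the last request was released

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

To use it, create one instance, for example `throttle = Throttle(5.0)`, and call `throttle.wait()` immediately before each lookup. As noted in the last comment, the real limit may depend on server load, so a fixed interval like this reduces the request rate but is no guarantee against a ban.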