"yeah, that's what I said. Either you're entering the captcha incorrectly or for some reason it's not passed on to google. ...???"
But are you joking or what? I have told you for the nth time that EVEN AFTER ENTERING THE RIGHT CAPTCHA, IT KEEPS ASKING YOU TO ENTER A NEW CAPTCHA.
Did you see the sequence of screenshots? They were taken after entering the captcha code, which, as you can see, is not hard to figure out. And after entering the RIGHT code 4 times, it shows this last message (http://i61.tinypic.com/35a3ww7.png).
The first thing I'd try is disabling all other Firefox add-ons and trying again. As per Dan above, it may be that whatever you enter for the captcha never even makes it to Zotero, let alone to Google (which is one of the things I said could be going on). But yeah, most of us aren't getting paid for this, so if you want to yell at people, go somewhere else, please.
Hi,
After I talked a few of my colleagues into getting Zotero yesterday, our whole company was blocked from Google Scholar. It shows this message (no captcha to fill in):
We're sorry...
... but your computer or network may be sending automated queries. To protect our users, we can't process your request right now.
See Google Help for more information.
I believe it will go away after a while and then come back until they finish importing their papers. I am reporting this just so you know that there is still a problem.
And this is after attempting to retrieve metadata for PDFs? I assume you tried accessing Google Scholar from a computer that does not have Zotero installed and were blocked as well. Do you know if all the computers on your network have the same external IP (i.e. do you see the same IP if you google "my ip")? There's really not much we would be able to do if Google Scholar is blocking by IP (or an IP range even). Simon, Dan, any ideas?
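If it helps to compare machines, here's a quick, purely illustrative way to check the external IP each computer presents (it just asks the public ipify echo service and has nothing to do with Zotero itself):

```python
# Illustrative only: print the external IP address this machine appears to use.
# If every computer on the network prints the same address, you all share one
# external IP, so a server-side block by Google would affect everyone at once.
import urllib.request

def external_ip() -> str:
    # api.ipify.org is a public service that echoes the caller's IP as plain text
    with urllib.request.urlopen("https://api.ipify.org") as resp:
        return resp.read().decode().strip()

if __name__ == "__main__":
    print("External IP:", external_ip())
```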
... but your computer or network may be sending automated queries.
Well, in truth, that's exactly what's happening. If a bunch of people on a network try to retrieve metadata for many files around the same time, this will probably happen. (It'd be good to know if you all share the same global IP address, though, and approximately how many files you were all trying to retrieve metadata for.)
There aren't great alternatives here. The changes that emerged from this thread, to have Zotero try to work around temporary client-side blocks from Google Scholar, prevented the vast majority of problems for individual users. But these are indeed automated queries, and at some point Google, quite reasonably, probably switches to server-side blocking by IP address. Nothing to be done about that at this point other than waiting.
The only thing we could really do to address this is be more respectful of the various client-side block attempts from Google when they occur. I've made the case for this in the past, and I don't remember if we add in any delays at all now, but that would have the effect of slowing down metadata retrieval for people, and there's no guarantee it would prevent server-side blocks (which seem to be quite rare).
I've made the case for this in the past, and I don't remember if we add in any delays at all now
We wait 2 seconds between queries. Increasing that may or may not help individual users (I have a feeling Google is pretty smart about detecting automatic requests and we won't outsmart them), but, in any case, that wouldn't help for multiple computers sending subsequent queries.
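For illustration, the fixed inter-query delay described above amounts to something like the following sketch (not Zotero's actual code; the send() callback is a placeholder for whatever performs the Scholar request):

```python
# Sketch of a fixed inter-query delay (illustrative, not Zotero's actual code).
import time

QUERY_DELAY = 2.0  # seconds to wait between Google Scholar queries

def run_throttled(queries, send):
    """Send each query in order, sleeping QUERY_DELAY seconds between requests."""
    results = []
    for i, query in enumerate(queries):
        if i > 0:
            time.sleep(QUERY_DELAY)
        results.append(send(query))
    return results
```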
in any case, that wouldn't help for multiple computers sending subsequent queries
That depends partly on whether the cookie-based rate-limiting is affected by the IP-based blocks. (In other words, do they block by cookie sooner if there are more requests from the network?) It's quite possible it isn't, but backing off after cookie-based blocks would still have the effect of reducing the number of simultaneous connections from the network.
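To make "backing off" concrete, here is a rough sketch (assuming a hypothetical fetch() helper that reports whether Scholar returned a block or captcha page; nothing here is Zotero's actual implementation):

```python
# Rough sketch of backing off after a cookie-based block (illustrative only).
# `fetch` is a hypothetical callable returning (blocked, result) for a query.
import time

def fetch_with_backoff(fetch, query, max_attempts=4, base_delay=30.0):
    for attempt in range(max_attempts):
        blocked, result = fetch(query)
        if not blocked:
            return result
        # Wait progressively longer before retrying: 30 s, 60 s, 120 s, ...
        time.sleep(base_delay * (2 ** attempt))
    return None  # give up after max_attempts blocked responses
```

Waiting out the cookie-based warning like this keeps a client from immediately generating the very traffic pattern that could trigger the harsher IP-level block.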
BUT after asking my colleagues a bit more about their "block", it seems that they only get two captchas, and once they pass them they can use Google Scholar normally.
And I have just now found out that other browsers on my computer (and even Firefox in private mode) do get a captcha, so the only thing that is blocked without a captcha is my non-private Firefox. To me this looks like a potential workaround for Zotero.
Another thing is that I am only blocked this way on scholar.google.com, not on scholar.google.de (where I need to go through two captchas).
So it seems to be a cookie-based hard block (which doesn't make sense to me, since any malicious user can easily dump their cookies), accompanied by some kind of IP-based soft block.
We should already be taking advantage of the workaround you are seeing when retrieving metadata for PDFs (which is what this thread is mostly about). I'm guessing that you would get the captchas in non-private Firefox session if you clear the cookies (that's effectively what Zotero does). Are you getting blocked after using Zotero to fetch items from Google Scholar (via URL bar icon) or after attempting to retrieve metadata for existing PDFs?
Well, you hit the nail on the head. I can now see that the problem started after I clicked "update citations", a feature of an independent Zotero plugin:
https://addons.mozilla.org/cs/firefox/addon/zotero-scholar-citations/
I think that explains everything.
I can use Scholar normally now after deleting all my cookies.
Anyway, thank you for the great work you do on Zotero; it has been a big help in my research.
A little off topic, but the way Google behaves in limiting access to a reference database of our knowledge gives you a clear picture of why this corporation should be at the top of the list of the most evil IT companies in the world. And it is laughable when someone here writes "The only thing we could really do to address this is be more respectful of the various client-side block attempts from Google when they occur".
The data is irrelevant. They're Google's servers, and they have every right to protect them from automated bots. And you misunderstood my comment above. A cookie-based block is essentially a warning. If Zotero clients repeatedly circumvent those and become IP-blocked, that's a problem for Zotero users. "Respecting" the cookie-based block by backing off would be a way to prevent the more severe block. At least in this case, though, it sounds like the IP block wasn't caused by Zotero.
"The only thing we can do" is to NOT use google scholar for our search (there are much better and not IP filtered search engines). In this way we address G to change his policy miserable driven by $$$...
Feel free to submit a patch that makes Zotero use another full-text academic search engine of equivalent quality that does no rate-limiting.
http://www.ncbi.nlm.nih.gov/pubmed/20420728
http://libguides.lib.msu.edu/pubmedvsgooglescholar
https://noril.uib.no/index.php/noril/article/viewFile/10/6
On PubMed, arXiv, Quertle, and Web of Science (just to cite the ones I use) I can query without any limit imposed. I even believe those are "properly run websites". There is no need to make a new "patch", and this is not a criticism of Zotero as long as it remains free to use.
You have no way of knowing if those services are rate-limited unless you've actually made high-rate automated requests against them, as Zotero does to GS — that's why I asked for a patch rather than a suggestion. (And the PubMed API goes down constantly. If they're not doing rate-limiting, maybe they should start.)
But those aren't replacements for Google Scholar anyway. We don't know what field a PDF is from in advance, and we can't just send out requests to 20 different sites. It would be useless and irresponsible to send huge numbers of automated requests with snippets of, say, sociology PDFs to PubMed.
Anyhow, given that there doesn't actually seem to be a current problem with GS here, I think we can stop wasting time on this discussion.
Hi guys,
I am facing this problem right now (> 4000 PDF queries...). Scholar has been unblocked via the captcha, but (with Zotero still running) I see "Google Scholar limit reached..." for some files. Once it finishes, I am wondering whether I can find the PDFs that are missing metadata in order to re-query them, or whether, if I select all the files, Zotero will be clever enough (unlike me...) to retrieve the info only for the files without metadata.
Thanks
Well, files that were successful will be attached to Zotero items; those that weren't are still top-level PDF files, so they're easy to find (you can actually sort the middle column by "Item Type", which sorts attachments, i.e. top-level PDFs, as their own category).
For what it's worth, I'm experiencing the same issues. I'm using Standalone with the Chrome connector and can reach the Scholar website without being asked for a captcha (though I was asked yesterday). In Zotero Standalone, when trying to retrieve PDF metadata for a total of five PDFs, it asks me for the captcha three times (for the same file) and then just outputs errors for all of them about the query limit. I definitely entered them correctly.
Just so there are no misunderstandings, it worked initially but I got locked out of Scholar (via Zotero) yesterday after a few dozen PDFs had been checked.
Here's the full log for the process of retrieving metadata for five PDFs:
http://txs.io/idvb
RecognizePDF: Failed to solve CAPTCHA after multiple attempts.
And you're sure you were solving the captchas correctly? If so, it's possible that Google made changes on their end (perhaps along with the recent reCAPTCHA changes) that we have to adjust for.
Yes, I usually have a close to 100% captcha "hit rate", and these weren't hard ones. I also ran through this whole "process" multiple times last night and this morning to see if Scholar's block was gone. It's now working again, thankfully.