"yeah, that's what I said. Either you're entering the captcha incorrectly or for some reason it's not passed on to google. ...???"
But are you joking or what? I have told you for the nth time that EVEN AFTER ENTERING THE RIGHT CAPTCHA, IT KEEPS ASKING YOU TO ENTER A NEW CAPTCHA.
Did you see the sequence of screenshots? They were taken after entering the captcha code, which, as you can see, is not hard to figure out. And after entering the RIGHT code 4 times, it shows this last message (http://i61.tinypic.com/35a3ww7.png).
The first thing I'd try is disabling all other Firefox add-ons and trying again. As per Dan above, it may be that whatever you enter for the captcha never even makes it to Zotero, let alone to Google (which is one of the things I said could be going on). But yeah, most of us aren't getting paid for this, so if you want to yell at people, go somewhere else, please.
Hi,
After I talked a few of my colleagues into getting Zotero yesterday, our whole company was blocked from Google Scholar. It shows this message (no captcha to fill in):
We're sorry...
... but your computer or network may be sending automated queries. To protect our users, we can't process your request right now.
See Google Help for more information.
I believe it will go away after a while and then come back until they finish importing their papers. I am reporting this just so you know that there is still a problem.
And this is after attempting to retrieve metadata for PDFs? I assume you tried accessing Google Scholar from a computer that does not have Zotero installed and were blocked as well. Do you know if all the computers on your network have the same external IP (i.e. do you see the same IP if you google "my ip")? There's really not much we would be able to do if Google Scholar is blocking by IP (or an IP range even). Simon, Dan, any ideas?
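If it helps to compare machines, here's a quick, purely illustrative way to check the external IP each computer presents (it just asks the public ipify echo service and has nothing to do with Zotero itself):

```python
# Illustrative only: print the external IP address this machine appears to use.
# If every computer on the network prints the same address, you all share one
# external IP, so a server-side block by Google would affect everyone at once.
import urllib.request

def external_ip() -> str:
    # api.ipify.org is a public service that echoes the caller's IP as plain text
    with urllib.request.urlopen("https://api.ipify.org") as resp:
        return resp.read().decode().strip()

if __name__ == "__main__":
    print("External IP:", external_ip())
```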
... but your computer or network may be sending automated queries.
Well, in truth, that's exactly what's happening. If a bunch of people on a network try to retrieve metadata for many files around the same time, this will probably happen. (It'd be good to know if you all share the same global IP address, though, and approximately how many files you were all trying to retrieve metadata for.)
There aren't great alternatives here. The changes that emerged from this thread, to have Zotero try to work around temporary client-side blocks from Google Scholar, prevented the vast majority of problems for individual users. But these are indeed automated queries, and at some point Google, quite reasonably, probably switches to server-side blocking by IP address. Nothing to be done about that at this point other than waiting.
The only thing we could really do to address this is be more respectful of the various client-side block attempts from Google when they occur. I've made the case for this in the past, and I don't remember if we add in any delays at all now, but that would have the effect of slowing down metadata retrieval for people, and there's no guarantee it would prevent server-side blocks (which seem to be quite rare).
I've made the case for this in the past, and I don't remember if we add in any delays at all now
We wait 2 seconds between queries. Increasing that may or may not help individual users (I have a feeling Google is pretty smart about detecting automatic requests and we won't outsmart them), but, in any case, that wouldn't help for multiple computers sending subsequent queries.
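For illustration, the fixed inter-query delay described above amounts to something like the following sketch (not Zotero's actual code; the send() callback is a placeholder for whatever performs the Scholar request):

```python
# Sketch of a fixed inter-query delay (illustrative, not Zotero's actual code).
import time

QUERY_DELAY = 2.0  # seconds to wait between Google Scholar queries

def run_throttled(queries, send):
    """Send each query in order, sleeping QUERY_DELAY seconds between requests."""
    results = []
    for i, query in enumerate(queries):
        if i > 0:
            time.sleep(QUERY_DELAY)
        results.append(send(query))
    return results
```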
in any case, that wouldn't help for multiple computers sending subsequent queries
That depends partly on whether the cookie-based rate-limiting is affected by the IP-based blocks. (In other words, do they block by cookie sooner if there are more requests from the network?) It's quite possible it isn't, but backing off after cookie-based blocks would still have the effect of reducing the number of simultaneous connections from the network.
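To make "backing off" concrete, here is a rough sketch (assuming a hypothetical fetch() helper that reports whether Scholar returned a block or captcha page; nothing here is Zotero's actual implementation):

```python
# Rough sketch of backing off after a cookie-based block (illustrative only).
# `fetch` is a hypothetical callable returning (blocked, result) for a query.
import time

def fetch_with_backoff(fetch, query, max_attempts=4, base_delay=30.0):
    for attempt in range(max_attempts):
        blocked, result = fetch(query)
        if not blocked:
            return result
        # Wait progressively longer before retrying: 30 s, 60 s, 120 s, ...
        time.sleep(base_delay * (2 ** attempt))
    return None  # give up after max_attempts blocked responses
```

Waiting out the cookie-based warning like this keeps a client from immediately generating the very traffic pattern that could trigger the harsher IP-level block.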
BUT after asking my colleagues a bit more about their "block", it seems that they only get two captchas, and once they pass them they can use Google Scholar normally.
And I have just now found out that other browsers on my computer (and even Firefox in private mode) do get a captcha, so the only thing that is blocked without a captcha is my non-private Firefox. To me this looks like a potential workaround for Zotero.
Another thing is that I am only blocked this way on scholar.google.com, not on scholar.google.de (where I need to go through two captchas).
So it seems to be a cookie-based hard block (which doesn't make sense to me, since any malicious user can easily dump their cookies), accompanied by some kind of IP-based soft block.
We should already be taking advantage of the workaround you are seeing when retrieving metadata for PDFs (which is what this thread is mostly about). I'm guessing that you would get the captchas in non-private Firefox session if you clear the cookies (that's effectively what Zotero does). Are you getting blocked after using Zotero to fetch items from Google Scholar (via URL bar icon) or after attempting to retrieve metadata for existing PDFs?
Well, you hit the nail on the head. I can now see that the problem started after I clicked "update citations", a feature of an independent Zotero plugin:
https://addons.mozilla.org/cs/firefox/addon/zotero-scholar-citations/
I think that explains everything.
I can use Scholar normally now after deleting all my cookies.
Anyway, thank you for the great work you do on Zotero; it has been a big help in my research.
A little off topic, but the way Google behaves in limiting access to a reference database of our knowledge gives you a clear picture of why this corporation should be at the top of the list of the most evil IT companies in the world. And it is laughable when someone here writes "The only thing we could really do to address this is be more respectful of the various client-side block attempts from Google when they occur".
The data is irrelevant. They're Google's servers, and they have every right to protect them from automated bots. And you misunderstood my comment above. A cookie-based block is essentially a warning. If Zotero clients repeatedly circumvent those and become IP-blocked, that's a problem for Zotero users. "Respecting" the cookie-based block by backing off would be a way to prevent the more severe block. At least in this case, though, it sounds like the IP block wasn't caused by Zotero.
"The only thing we can do" is to NOT use google scholar for our search (there are much better and not IP filtered search engines). In this way we address G to change his policy miserable driven by $$$...
Feel free to submit a patch that makes Zotero use another full-text academic search engine of equivalent quality that does no rate-limiting.
http://www.ncbi.nlm.nih.gov/pubmed/20420728
http://libguides.lib.msu.edu/pubmedvsgooglescholar
https://noril.uib.no/index.php/noril/article/viewFile/10/6
On PubMed, arXiv, Quertle, and Web of Science (just to cite the ones I use) I can query without any limit imposed. I even believe those are "properly run websites". There is no need to make a new "patch", and this is not a criticism of Zotero as long as it remains free to use.
You have no way of knowing if those services are rate-limited unless you've actually made high-rate automated requests against them, as Zotero does to GS — that's why I asked for a patch rather than a suggestion. (And the PubMed API goes down constantly. If they're not doing rate-limiting, maybe they should start.)
But those aren't replacements for Google Scholar anyway. We don't know what field a PDF is from in advance, and we can't just send out requests to 20 different sites. It would be useless and irresponsible to send huge numbers of automated requests with snippets of, say, sociology PDFs to PubMed.
Anyhow, given that there doesn't actually seem to be a current problem with GS here, I think we can stop wasting time on this discussion.
Hi guys,
I am facing this problem right now (> 4000 PDF queries...). Scholar has been unblocked via the captcha, but (with Zotero still running) I see "Google Scholar limit reached..." for some files. Once it finishes, I am wondering whether I can find the PDFs that are missing metadata in order to re-query them, or whether, if I select all the files, Zotero will be clever enough (unlike me...) to retrieve the info only for the files without metadata.
Thanks
Well, files that were successful will be attached to Zotero items; those that weren't are still top-level PDF files, so they're easy to find (you can actually sort the middle column by "Item Type", which sorts attachments, i.e. top-level PDFs, as their own category).
For what it's worth, I'm experiencing the same issues. I'm using Standalone with the Chrome connector and can reach the Scholar website without being asked for a captcha (though I was asked yesterday). In Zotero Standalone, when trying to retrieve PDF metadata for a total of five PDFs, it asks me for the captcha three times (for the same file) and then just outputs errors for all of them about the query limit. I definitely entered them correctly.
Just so there are no misunderstandings, it worked initially but I got locked out of Scholar (via Zotero) yesterday after a few dozen PDFs had been checked.
Here's the full log for the process of retrieving metadata for five PDFs:
http://txs.io/idvb
RecognizePDF: Failed to solve CAPTCHA after multiple attempts.
And you're sure you were solving the captchas correctly? If so, it's possible that Google made changes on their end (perhaps along with the recent reCAPTCHA changes) that we have to adjust for.
Yes, I usually have a close to 100% captcha "hit rate", and these weren't hard ones. I also ran through this whole "process" multiple times last night and this morning to see if Scholar's block was gone. It's now working again, thankfully.