Importing from Google Scholar

I need to import around 4000 citations from Google Scholar into Zotero. I understand that the usual way is to add each result to My Library individually and then download everything as BibTeX, but this is cumbersome, and Google Scholar doesn't let it happen easily: it flags my efforts as robotic searching.
Just wondering if there is an easier way around this?
  • edited May 16, 2024
    You don't say how you arrived at the 4000 number or what you will do with the records when you download them, but my team and I have lots of experience with this sort of thing. GS is essential to my daily work.

    There are work-arounds:

    See: https://www.zotero.org/support/kb/site_access_limits

    I don't use the work-around described above because it doesn't really save time if you also want to get PDF documents.

    The way to get the most accurate and complete metadata for those 4000 items is to follow the GS link to the publishers' websites and import to Zotero from there. GS has improved the accuracy of its metadata over what it was a few years ago, but there are still frequent egregious errors, including wrong journal, wrong volume, and omitted authors. GS harvests its contents (at least in part) from OCRs of the reference lists of papers it finds online.

    GS will treat you as a bot well before 500 direct record downloads, much less 4000. It will (seemingly) never treat you as a bot if you open the publisher link in a new tab; the few seconds it takes to capture the metadata and file from the publisher will keep you safe. I've worked this way with GS, finding items on the publishers' websites. Some publisher sites respond in a second or two and you can import a record in another second or two; other publishers' sites are slow. Done one by one (if our experience, averaging 7-8 seconds per item, will be yours), that would be a bit more than 8 worker hours.

    If you plan to capture PDF files of everything, add another hour (if you are using Zotero and grabbing the file from the publisher while capturing the metadata). Some publishers' sites take a few extra seconds to send the PDF.
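    To spell out the arithmetic behind those estimates (numbers from this reply; the 7.5 s figure is just the midpoint of the 7-8 s per-item average we observed):

```python
# Back-of-envelope check of the timing estimates above.
items = 4000
seconds_per_item = 7.5                             # midpoint of the observed 7-8 s
metadata_hours = items * seconds_per_item / 3600   # one-by-one metadata capture
total_hours = metadata_hours + 1                   # add ~1 hour if grabbing PDFs too
```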

    If you get metadata directly from GS and later selectively link to publisher sites to obtain relevant PDFs, the GS capture time plus the time needed to link from the Zotero record to the publisher will sum to roughly the same day-long duration. But remember that a substantial proportion of the GS records' metadata in BibTeX format is incorrect; the links to the publisher sites, however, are virtually all correct. For each record with incorrect metadata, a substantial amount of time will be wasted finding the correct information and locating the article online. Even when GS has metadata errors and no DOI, you can usually link correctly to the item on the publisher's site.

    Please don't cite something that you haven't fully read and understood.

    If you are at a university and within its library umbrella, you may have options other than GS.
  • edited May 16, 2024
    I am doing a systematic review, so I need citations for all these search results.
    Seems there is no easy way out.
  • Citations in Google Scholar are only as reliable as the rest of the metadata; I would not trust them for any serious work.
  • edited May 28, 2024

    I find the overall quality of metadata in ALL bibliographical databases (EVEN the ones our universities pay good money for, like WoS and Scopus) still appallingly bad. Open Science IS clearly making progress, which is exciting. But that progress remains glacially slow, which remains very disappointing. Having said that, GS definitely still remains king of the bibliographical metadata hill in terms of coverage (with OpenAlex probably the most exciting 'new kid on the block', WITH a wonderful free API). So here's what _we_ now tend to do with GS for systematic literature reviews:


    • we use scholarly with ScraperAPI for getting the GS metadata based on queries (make sure you limit your queries to <1000 hits, using years and modifiers) [and, for the record, I'm trying to talk ScraperAPI into offering academics less extortionate prices for JUST Google Scholar, but so far without much success. They DO offer academics 50k API credits and 25 concurrent threads for $10/month, which is nice of them. But since they charge 15 credits per Google Scholar call, this (IMO) is DEFINITELY too restrictive for systematic literature reviews. I'm actually planning on writing a medium.com piece on all of this, but that might still take a while];

    • we (sometimes) try to augment the JSON containing the GS metadata with some other metadata (e.g. DOIs, for instance from Crossref), matching on some synthetic key (also FAR from perfect, BUT feasible!);

    • we convert the resulting JSON file to an RIS file (and I have code for that which I'll put on GitHub if anybody's interested);

    • we then import that RIS file into Zotero, mostly for generating bibliographies and foot-/end-notes, BUT also as a storage solution for our corpora;

    • we then USED to try to 'find the PDFs' from within Zotero (but I've been kind of disappointed with Zotero 7 and the lack of plugin support) and from behind our universities' VPNs; I'm still looking into better (programmatic) options for that;

    • our next step is then using GROBID to transform the PDF attachments in a Zotero library into a nice (JSONL) dataframe;

    • and that is when the REAL fun starts, as we can then apply the full amazing suite of (ALSO open-source!) NLP and increasingly also LLM/RAG solutions to 'REALLY' start doing our epistemic homework in ways that almost NONE of our disciplines have done before.
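    To make the first few steps concrete, here is a minimal stdlib-only sketch of the pure-Python parts of the pipeline above (year-windowed queries, the synthetic matching key, and the JSON-to-RIS conversion). This is a sketch, not my actual notebooks: the dict field names (`authors`, `journal`, etc.) and the 2-year window size are placeholder assumptions.

```python
import re
import unicodedata

def year_windows(start, end, step=2):
    """Split [start, end] into inclusive year ranges so that each
    Google Scholar query stays safely under the ~1000-hit cap."""
    windows, lo = [], start
    while lo <= end:
        hi = min(lo + step - 1, end)
        windows.append((lo, hi))
        lo = hi + 1
    return windows

def synthetic_key(title, year, first_author_surname=""):
    """Crude join key built from fields both GS and Crossref return.
    FAR from perfect, but feasible for matching records without a DOI."""
    text = unicodedata.normalize("NFKD", f"{title} {first_author_surname}")
    text = text.encode("ascii", "ignore").decode().lower()  # strip accents
    text = re.sub(r"[^a-z0-9 ]", "", text)                  # drop punctuation
    return "-".join(text.split() + [str(year)])

def to_ris(record):
    """Turn one metadata dict into a minimal RIS journal-article entry
    (dict keys here are placeholders, not scholarly's exact field names)."""
    lines = ["TY  - JOUR"]
    for author in record.get("authors", []):
        lines.append(f"AU  - {author}")
    lines.append(f"TI  - {record.get('title', '')}")
    if record.get("year"):
        lines.append(f"PY  - {record['year']}")
    if record.get("journal"):
        lines.append(f"JO  - {record['journal']}")
    if record.get("doi"):
        lines.append(f"DO  - {record['doi']}")
    lines.append("ER  - ")
    return "\n".join(lines)

# The GS calls themselves go through scholarly behind ScraperAPI,
# roughly like this (requires network and a ScraperAPI key):
#   from scholarly import scholarly, ProxyGenerator
#   pg = ProxyGenerator(); pg.ScraperAPI("YOUR_KEY"); scholarly.use_proxy(pg)
#   for lo, hi in year_windows(2015, 2024):
#       for pub in scholarly.search_pubs(QUERY, year_low=lo, year_high=hi):
#           ...collect pub["bib"]...
```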

    I do have Python code (rather, still mostly just Jupyter notebooks, but they DO do the trick!) for all of this. I have already shared some of those on this forum, but I just haven't found time to clean them up and to share and document them properly on GitHub. If nobody else has made things like this available (I remember from some discussions on this forum that others were working on this too), I will. And in the meanwhile, I'm more than happy to share these notebooks with anybody who reaches out to me...
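    For the GROBID step, the TEI XML it returns can be flattened into JSONL rows along these lines. Again a stdlib-only sketch, not my notebooks: the element paths follow GROBID's TEI namespace, and the sample document here is made up.

```python
import xml.etree.ElementTree as ET

# Namespace GROBID uses in its TEI output
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

def tei_to_row(tei_xml):
    """Pull the title and body paragraphs out of one GROBID TEI document
    into a flat dict, ready to be dumped as one JSONL line."""
    root = ET.fromstring(tei_xml)
    title = root.find(".//tei:titleStmt/tei:title", TEI_NS)
    body = root.find(".//tei:body", TEI_NS)
    paragraphs = ([p.text or "" for p in body.findall(".//tei:p", TEI_NS)]
                  if body is not None else [])
    return {"title": title.text if title is not None else "",
            "paragraphs": paragraphs}

# Made-up minimal TEI sample, just to show the shape of the output
sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt><title>Example Paper</title></titleStmt></fileDesc></teiHeader>
  <text><body><div><p>First paragraph.</p></div></body></text>
</TEI>"""
```

    One `json.dumps(tei_to_row(...))` call per PDF then gives the JSONL dataframe mentioned above.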

     
