Plugin to fetch missing abstracts?

Abstracts are really useful when you want to parse a lot of items in one go. Sadly, many items in my database have missing abstracts. Does anyone know if there is a plugin that could programmatically fetch abstracts when they’re missing from item metadata?
  • Pretty sure there's nothing currently, no.
  • Yeah I just checked on Crossref, which I know other plugins fetch their data from (e.g., DOI) and abstracts aren’t part of metadata in their database. That would likely mean a plugin would have to fetch the abstract from individual journals—a pretty hard task I guess.

    The only other option I see would be to fetch from Google Scholar or Google Books, but other plugins have had trouble not being locked by Google’s anti-abuse mechanisms when processing a large number of items.
  • CrossRef does, increasingly, have abstracts, so in theory that'd be an option, as would querying PubMed where a PMID is present, and OpenAlex, which might have some more data than CrossRef, but no one has done that so far. Wouldn't be prohibitively hard, though, using either the DOI Manager or the PMID fetcher as a template.
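
A minimal sketch of the DOI-based lookup suggested above, assuming Crossref exposes the abstract as a JATS XML string under `message.abstract` and OpenAlex exposes it as `abstract_inverted_index` (a word-to-positions map). The function names are hypothetical, and the two network helpers are illustrative only (no retries, rate limiting, or handling of unknown DOIs):

```python
import html
import json
import re
import urllib.parse
import urllib.request


def crossref_abstract(message):
    """Return a plain-text abstract from a Crossref work record, or None."""
    raw = message.get("abstract")
    if not raw:
        return None
    # Crossref abstracts are JATS XML fragments; replace tags with spaces,
    # decode entities, then collapse whitespace.
    text = html.unescape(re.sub(r"<[^>]+>", " ", raw))
    return " ".join(text.split())


def openalex_abstract(work):
    """Rebuild an abstract from OpenAlex's abstract_inverted_index, or None."""
    index = work.get("abstract_inverted_index")
    if not index:
        return None
    positions = {}
    for word, places in index.items():
        for place in places:
            positions[place] = word
    return " ".join(positions[i] for i in sorted(positions))


def fetch_json(url):
    """Fetch a URL and decode its JSON body (raises on HTTP errors)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def abstract_for_doi(doi):
    """Try Crossref first, then OpenAlex, for a single DOI."""
    quoted = urllib.parse.quote(doi)
    record = fetch_json(f"https://api.crossref.org/works/{quoted}")
    abstract = crossref_abstract(record.get("message", {}))
    if abstract:
        return abstract
    work = fetch_json(f"https://api.openalex.org/works/doi:{quoted}")
    return openalex_abstract(work)
```

The two parsing functions work on already-decoded JSON, so they can be tried on saved API responses before anything is pointed at a real library.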
  • Interesting. I just checked and it’s indeed pretty straightforward with a DOI to obtain e.g. the PMID, and export it as structured text (nbib) with the abstract included. That means the plugin would be a simple loop over items: if the abstract is blank, curl a request to PubMed. Seems pretty simple in theory. I have zero experience developing Zotero plugins (in Python it would take me 10 minutes of work heh) but maybe I’ll give this a try once things settle a bit with work!
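
The DOI-to-PMID-to-nbib route described above could look roughly like this. It is a sketch, assuming the NCBI E-utilities `esearch` and `efetch` endpoints, the `[aid]` (article identifier) search tag for DOIs, and the MEDLINE text layout where the abstract sits in an `AB  - ` field with continuation lines indented by six spaces. Function names are hypothetical:

```python
import json
import urllib.parse
import urllib.request

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def abstract_from_nbib(nbib_text):
    """Pull the AB (abstract) field out of one MEDLINE/nbib record, or None."""
    parts = []
    in_abstract = False
    for line in nbib_text.splitlines():
        if line.startswith("AB  - "):
            in_abstract = True
            parts.append(line[6:].strip())
        elif in_abstract and line.startswith("      "):
            # Continuation lines of a field are indented by six spaces.
            parts.append(line.strip())
        elif in_abstract:
            break  # the next field has started
    return " ".join(parts) or None


def pmid_for_doi(doi):
    """Look up a PMID by DOI (network call; returns None if nothing matches)."""
    query = urllib.parse.urlencode(
        {"db": "pubmed", "term": f"{doi}[aid]", "retmode": "json"}
    )
    with urllib.request.urlopen(f"{EUTILS}/esearch.fcgi?{query}", timeout=30) as resp:
        ids = json.load(resp)["esearchresult"]["idlist"]
    return ids[0] if ids else None


def nbib_for_pmid(pmid):
    """Download one record in MEDLINE (nbib) text format (network call)."""
    query = urllib.parse.urlencode(
        {"db": "pubmed", "id": pmid, "rettype": "medline", "retmode": "text"}
    )
    with urllib.request.urlopen(f"{EUTILS}/efetch.fcgi?{query}", timeout=30) as resp:
        return resp.read().decode("utf-8")
```

For a large library, the requests would need to be throttled to stay within NCBI's usage limits.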
  • Actually I just remembered about pyzotero; maybe I could try to make a script that does a one-time backfill for the 1500 items in my library whose abstract is missing.

    Papers will probably be easy, but for books I’m not sure. Any idea how I could get book summaries with their ISBN? Not sure if Google Books will make it easy to extract that amount of information.
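
A one-time backfill along these lines might be sketched as follows. It assumes a pyzotero client (`zot`) and a hypothetical `lookup(doi)` callable that returns an abstract string or `None`; only items with a DOI and an empty `abstractNote` are touched, so books would need a separate ISBN-based lookup:

```python
def backfill_abstracts(zot, lookup, items=None):
    """Fill empty abstractNote fields and return how many items were updated.

    zot    -- a pyzotero client (anything with top/everything/update_item)
    lookup -- hypothetical callable: DOI string -> abstract string or None
    """
    if items is None:
        # Retrieve every top-level item in the library.
        items = zot.everything(zot.top())
    updated = 0
    for item in items:
        data = item["data"]
        # Skip item types without an abstract field and items that have one.
        if "abstractNote" not in data or data["abstractNote"].strip():
            continue
        doi = data.get("DOI", "").strip()
        if not doi:
            continue
        abstract = lookup(doi)
        if abstract:
            data["abstractNote"] = abstract
            zot.update_item(item)  # writes the change back to the library
            updated += 1
    return updated


# Usage sketch (library ID and API key are placeholders):
# from pyzotero import zotero
# zot = zotero.Zotero("1234567", "user", "YOUR_API_KEY")
# print(backfill_abstracts(zot, my_doi_lookup))
```

Passing the lookup in as a parameter keeps the loop independent of whichever source (Crossref, PubMed, OpenAlex) ends up supplying the abstracts, and makes a dry run against a handful of items easy via the `items` argument.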
  • Library of Congress has book summaries in a lot of its MARC data -- you can check out the calls Zotero is already making as part of its ISBN lookup https://github.com/zotero/translators/blob/master/Library%20of%20Congress%20ISBN.js

    Agree, if you just want to backfill, using pyzotero is going to work pretty well.
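
For the MARC route mentioned above, book summaries normally live in field 520, subfield `a`. Here is a small sketch that extracts them, assuming the record has already been obtained as MARCXML (how to request that from the Library of Congress is not covered here, and the function name is hypothetical):

```python
import xml.etree.ElementTree as ET

# Standard MARCXML namespace.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def summary_from_marcxml(marcxml):
    """Return the text of all 520$a (summary) subfields, or None."""
    root = ET.fromstring(marcxml)
    summaries = []
    for field in root.iterfind(".//marc:datafield[@tag='520']", MARC_NS):
        for sub in field.iterfind("marc:subfield[@code='a']", MARC_NS):
            if sub.text and sub.text.strip():
                summaries.append(sub.text.strip())
    return " ".join(summaries) or None
```

A record without a 520 field simply yields `None`, which makes it easy to count how many books in a library actually have a summary available.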
  • Interestingly, my library seems filled with books that have one of the following:

    a) no ISBN,
    b) an ISBN, with "Library of Congress ISBN" in the "Library Catalog" field, or
    c) an ISBN, with "Open WorldCat" in the "Library Catalog" field.

    I’ve tried searching the LoC for books of category (b) but oddly can’t find them in their database. I’m gonna guess that Zotero would have pulled the abstract already if it was available anyway.

    Also odd, tried looking for books of category (c) by ISBN in WorldCat and they *do* seem to have abstracts. Strangely Zotero did not pull those out?

    In any case, that does seem like an interesting problem to solve...
  • If you end up putting a script/plug-in together to gather abstracts by DOI, I would be very interested! Looking to find a way to get abstracts in bulk for my non-abstract items in a big collection.
  • From PMID, the Abstract Export option generates good data. It's missing field labels or delimiters. Perhaps there's a way to merge the Citation Manager RIS file with Abstract entries? VLookup?
  • @alexcr87 I'd be curious if you could share your code? I'm trying to source a whole bunch of abstracts (in my case, tens of thousands) for a research project, starting with .bib files, of which about half the citations have abstracts.
  • I would love this script or plugin as well.
  • edited March 27, 2024
    Apologies for the necrobump, but would anyone be OK with sending money to a willing/interested dev, who might create such a plugin? I'd love to donate money!

    It would be a godsend to fetch the abstracts in folders with hundreds of papers automatically, and not just by hand.