How can I extract references from papers?

drtyuj · May 29, 2025

I've just started using Zotero and am in the middle of doing a literature review. Is there a way in which I can automatically transfer all the references used by a paper into my Zotero library?

dstillman · May 29, 2025

Well, what do you mean? If you're using Zotero to cite in the paper, all the items would already be in your Zotero library.

If you're asking whether you can move them all to a collection, you can do that with Reference Extractor. There will be a built-in way of doing that in the future.

drtyuj · May 29, 2025

Say I'm reading a paper taken from my Zotero library which provides a really useful overview of a particular topic and I want to explore the papers it mentions throughout - is there an easy way to essentially copy and paste that paper's bibliography into my Zotero library? These won't necessarily be papers which I've already added to my library as they are papers which have merely been referenced by a paper I'm reading.

I hope this makes sense

dstillman · May 30, 2025

Ah, got it. That's not currently possible in Zotero itself, though I believe there are some plugins that will do it.

tim820 · May 30, 2025

The Reference plugin can extract the reference list from a PDF in your Zotero library, and can take you to the DOI/URL in your browser for any item that you do not already have. There, of course, you can choose to add/download it with the Zotero web connector.
https://github.com/MuiseDestiny/zotero-reference

Unfortunately the current documentation on github is out of date (only for the older-style, v6-compatible version). And the plugin's Settings dialog in Zotero is only in Chinese. But Chrome's inbuilt translation does a reasonable job on the github pages. And the rest is fairly intuitive (if necessary, Google Lens does a decent job of translating the Settings dialog). Someone has just added an Issue for English settings to be added (the older, v6-compatible version had that).
https://github.com/MuiseDestiny/zotero-reference/issues/355

But there's nothing in the Settings you really need to set in order to make it work (except for Chinese paper access via CNKI for those who need that). And it's so powerful that it is worth the effort to figure out how it works.

It can also show you a list of the papers that cite the current paper (with data from Semantic Scholar; for which a [free] API key is not required, but can be used if you get one). So the plugin's two display modes are References (that the paper cites) and Citations (that cite the paper).
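
For readers who want to see what the References/Citations split looks like at the data level, here is a minimal sketch against the public Semantic Scholar Graph API. It is not the plugin's own code; the function names (`s2_url`, `parse_s2`, `fetch_s2`) and the chosen `fields` are assumptions made for illustration. The API key, if you have one, goes in the `x-api-key` header.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

S2_BASE = "https://api.semanticscholar.org/graph/v1/paper"


def s2_url(doi, mode="references", limit=100):
    """Build the URL for the papers a DOI cites ('references') or is cited by ('citations')."""
    if mode not in ("references", "citations"):
        raise ValueError("mode must be 'references' or 'citations'")
    paper_id = quote(f"DOI:{doi}", safe=":/")
    return f"{S2_BASE}/{paper_id}/{mode}?fields=title,year,externalIds&limit={limit}"


def parse_s2(payload, mode="references"):
    """Flatten a response into a list of {title, year, doi} dicts."""
    # References are nested under 'citedPaper', citations under 'citingPaper'.
    key = "citedPaper" if mode == "references" else "citingPaper"
    rows = []
    for entry in payload.get("data", []):
        paper = entry.get(key) or {}
        ids = paper.get("externalIds") or {}
        rows.append({"title": paper.get("title"),
                     "year": paper.get("year"),
                     "doi": ids.get("DOI")})
    return rows


def fetch_s2(doi, mode="references", api_key=None):
    """Network call (hypothetical helper); the key is optional, as noted above."""
    headers = {"x-api-key": api_key} if api_key else {}
    request = Request(s2_url(doi, mode), headers=headers)
    with urlopen(request, timeout=30) as resp:
        return parse_s2(json.load(resp), mode)
```

Entries that come back without a DOI are the ones you would have to search for by title instead.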

drtyuj · May 30, 2025

Thanks to both of you for offering your help, and especially to you, tim820, for linking me to the Reference plugin - this will be really useful (once I understand how to use it...)

P.S.
just to confirm: is this the Settings dialog which you are referring to?
https://s3.amazonaws.com/zotero.org/images/forums/u17176444/mrb9l91m8q4hs11l9guy.png

tim820 · May 30, 2025

Are you running Zotero v6? Because that looks like the Settings dialog for the old, v6-compatible version of the Reference plugin. The Settings dialog under Zotero v7 for the latest version of the plugin (1.4.4) looks like this, even on English-language PCs:
https://s3.amazonaws.com/zotero.org/images/forums/u5906489/9icja67g6ss1v7l762ir.png

The old plugin version functions quite differently. It can still extract reference lists (but not get the cited-by list); but it does that in different ways. The new version is cleaner and more powerful (albeit with the only-Chinese Settings). On the plus side, the current github documentation (translated by Chrome for example) describes the old version. So that version may still be of use to you.

drtyuj · May 30, 2025

Huh... looks like I'm on the same version as you
https://s3.amazonaws.com/zotero.org/images/forums/u17176444/qvhh39qybdxq7ytyi7z3.png

ilia.leikin · May 30, 2025

If you're OK modifying your workflow a bit, you can try a tool called RefTraceback available here:
https://sourceforge.net/projects/academic-tech-toolbox/

It's clunky but can save some time if you're researchin' hard.

Basically, it allows you to bulk-search the refs (you can extract the bibliography and select the refs that look interesting), and then you can save just the ones you need using the Zotero extension in your browser.

I've been meaning to develop this into a Zotero plugin, but have zero knowledge about how to do that. Any feedback/help welcome!

tim820 · May 30, 2025

@drtyuj in that case it may just be that an old version of the Reference plugin still works in Zotero 7, if you're seeing the old Settings dialog in v7.*

v0.5.9 (June 2023) was the first version to be compatible with Zotero v7. AFAIK the Settings have only been in Chinese since then, but my memory could be wrong about that. They have only been in Chinese for some time.
https://github.com/MuiseDestiny/zotero-reference/releases?page=1
https://github.com/MuiseDestiny/zotero-reference/issues/355

*The actual parameters in the Settings of your old dialog are different to the parameters that can be set in the new version. So it's not just a case of an English translation of the same settings having been available in the past but not now. The old and new plugins are functionally different.

rdiaz02 · February 15, 2026

@ilia.leikin : Thanks for the link to RefTraceback!

Just in case it matters: I am on Linux, but I decompressed the zip file for RefTraceback and asked Gemini and Claude to turn it into something usable under Linux (pdftotext is readily available in most Linux distros). After one iteration, Claude (just plain Claude via the web interface ---Claude Sonnet 4.5 Extended---, not Claude Code) produced a Python program that seems to closely match what yours does in Windows. (With Gemini, which I used initially, I/"we" weren't able, after more than 10 iterations, to get a decently working thing.) I can of course send you the code. I guess it should also run under Windows (not tested, as I don't have Python installed in my Windows virtual machine).

As you say, it requires modifying workflow; but maybe pointing this out in the sourceforge page would be useful for people who are not on Windows?

Optimistic about the results, I asked for a simple-minded Zotero plugin to do this from Zotero, something that would replicate what you do, but taking advantage of running in Zotero. But this did not work after many attempts (not with Claude, Gemini, or ChatGPT). This could be just me, of course. I know some Python, the logic of your code seemed really clean, and building myself a mental model of what your code does after playing with it in a Windows virtual machine was not difficult, so giving clear instructions and making suggestions to Claude/Gemini was possible for me. But I do not know how a plugin works (nor any JS), so I was lost, blindly following LLMs' non-working suggestions.

ilia.leikin · February 15, 2026

@rdiaz02
Thanks for taking interest in this idea!

RefTraceback was written in AutoHotkey, which happens to be my favourite language (it is very efficient for creating small quick prototypes and productivity tools), but it is not very portable. It is basically a tool for automating Windows, and translates poorly to Linux. AI understands it quite well, but not as well as the more widespread languages.

Your approach of making it a Zotero plugin is spot-on. The user experience would be much more refined that way. I'd love to code one myself, but have zero experience with Zotero plugins. And, as you pointed out, coding with AI works best when you know how to do it yourself (roughly at least).

Maybe we can collaborate on this. I'll do some research into Zotero plugins.

tim820 · February 16, 2026

A couple of additions to this old/new discussion ...

The important settings in the Reference plugin are now in English. The github documentation remains related to the Zotero v6 version, which had a different interface and ways of working (but it's easy enough to figure out just by using it). The top section of the Settings in Chinese relates to the Chinese CNKI database.

There is a new AI-based plugin for extracting the reference list from a paper. I have only tried it with free Gemini and it kept throwing a limits-related error. YMMV. It does not try to do quite as much as the Reference plugin.
https://github.com/jmiba/Zotero-add-items-from-text

The Cita plugin was also trying to do reference extraction from early on in its development but I haven't looked at it recently, as its main aims were wider than that (sending data to Wikidata etc). I probably should look at it again as it looks to have matured a lot.
https://github.com/zotero-cita/zotero-cita?tab=readme-ov-file

ilia.leikin · February 16, 2026

Thanks @tim820! Will try these out.

AI can definitely be helpful here.

Part of the plugin I'm suggesting is the introduction of a new kind of object to Zotero - we could call them "Searchables", "Bookmarks", "Ghost references", "Pre-references", etc. - things you're interested in, but haven't yet searched, evaluated and added to the database. E.g. a whole lot of references you've extracted from an article or a thesis you've been reading.
Because the moment you find an interesting reference and the moment you decide to add it to the Zotero database are distinct, and could be separated by statistically significant amounts of time and effort :)

This of course could be done outside of Zotero, in a third-party tool. Keep a neatly organised list of references, search them when you have the time, and add to Zotero via browser connector.

rdiaz02 · February 16, 2026

Thanks both (@ilia.leikin, @tim820) for the additional details!

I tried https://github.com/zotero-cita/zotero-cita?tab=readme-ov-file a couple of days ago. For the papers I've tested, it works great: it can query a range of external DBs and gets the references right, which I think is a great way to get around the difficulties of parsing possibly complex documents. (I did not look in much detail at the rest of the extensive functionality.) But, at least for me, it is not exactly what I am looking for, as it places the references as one or more notes, not as a clickable list of "in my zotero/not in my zotero, go grab it". Edit: this is only partially correct. Cita also puts the citations on a menu on the right, and you can also ask it to "Auto link citations with Zotero items", so you can see which ones you have in your library and jump to them, or click on the magic wand for those not in your library to import them.

I didn't know about https://github.com/jmiba/Zotero-add-items-from-text . I just tried it briefly; I hit rate limits with free Gemini and could not get it to work with DeepSeek, OpenAI, or Anthropic (timeouts or network errors). I did not try much, though, because it requires users to select the references and paste them in a box for processing.

In fact, playing around with Ilia's program (and unsuccessfully trying to vibe code a minimal plugin that would parse the references) made me realize more clearly what it is I am looking for and why (I am currently reading about fields that are new to me, and the "reference hoarding" is becoming a sticking point). I'd like to be able to mark/click on one or more references in a PDF (or epub or snapshot), find if they are in my Zotero library (if they are, be given the option of going there) or, if they are not, search for them in, say, arXiv, or OpenAlex, or Semantic Scholar.

Why? When I read papers I often already mark, in the pdf itself, what I want to check later. Having a separate list on the side of the PDF is, for me, cumbersome: I need to match the items in the PDF to the items on another list. Moreover, I might be interested in checking just a handful of references, but the full list might contain tens or hundreds.

The "Search selected text on google scholar" script from Actions and Tags (https://github.com/windingwind/zotero-actions-tags/discussions/535) is very neat and already does some of this: you can select text, and it searches in Google Scholar. I'll see if I can use Actions and Tags (with Claude's help) to build upon this to carry out the more elaborate procedure above and to search in arXiv, OpenAlex, Semantic Scholar, etc.

Ideally, I'd like to just put the cursor in any line of a reference, and have code that would automagically select just the right amount of info (extending left and right as needed) for the search in my library and in external dbs to be as successful as possible. But, for now, that might be way too much. When playing with this yesterday, I found that correctly parsing references from possibly very different kinds of documents can be much more complicated than I naively expected, which probably explains efforts such as grobid (https://github.com/grobidOrg/grobid), Neural ParsCit (https://github.com/WING-NUS/Neural-ParsCit) and others (e.g., https://arxiv.org/abs/2205.14677 , https://arxiv.org/abs/2505.15948), as well as https://github.com/jmiba/Zotero-add-items-from-text .
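
To make the difficulty concrete, here is a minimal sketch of the one part that is comparatively easy: pulling DOIs out of raw reference text with a regular expression. The pattern and the name `extract_dois` are illustrative assumptions, not taken from any of the tools above. References that carry no DOI are where tools like grobid earn their keep.

```python
import re

# Loose DOI pattern: "10.", a 4-9 digit registrant code, "/", then a suffix
# running up to whitespace, a quote, or an angle bracket.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+", re.IGNORECASE)


def extract_dois(text):
    """Return the distinct DOIs found in text, lowercased, in order of appearance."""
    seen, found = set(), []
    for match in DOI_RE.findall(text):
        # Trim punctuation that usually belongs to the sentence, not the DOI.
        # Caveat: this would also clip the rare DOI that really ends in ")".
        doi = match.rstrip(".,;:)]}").lower()
        if doi not in seen:
            seen.add(doi)
            found.append(doi)
    return found
```

A reference with no DOI returns an empty list, at which point you are back to matching on author, year and title, which is the genuinely hard part.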

rdiaz02 · February 16, 2026

This seems to work:
https://github.com/windingwind/zotero-actions-tags/discussions/568

It does what I described: "I'd like to be able to mark/click on one or more references in a PDF (or epub or snapshot), find if it is in my Zotero library and, if present, display the PDF > epub > snapshot > item or, if it is not, search for it in OpenAlex > Semantic Scholar > arXiv > PubMed > CrossRef > Google Scholar" (this order can be modified, as well as how many hits to show).

(I decided against searching for more than one reference at a time, because I realized I would find that confusing, especially with the small latency until results are shown and if there were mistakes in the search).
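
The "library first, then providers in priority order" logic can be sketched as follows. This is only an illustration in Python, not the script from the discussion linked above (which runs inside Zotero). The names `normalize_title` and `lookup` are hypothetical, and the providers are passed in as plain functions so that no particular API is assumed.

```python
import re
import unicodedata


def normalize_title(title):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def lookup(selected_text, library_titles, providers, max_hits=3):
    """Return (source, hits): the library wins; otherwise the first provider with results.

    providers is an ordered list of (name, search_function) pairs, e.g.
    OpenAlex first and Google Scholar last.
    """
    wanted = normalize_title(selected_text)
    for title in library_titles:
        norm = normalize_title(title)
        # Crude containment test: very short titles can match by accident.
        if norm and norm in wanted:
            return ("library", [title])
    for name, search in providers:
        try:
            hits = search(selected_text)
        except Exception:
            continue  # a timeout or network error just falls through to the next provider
        if hits:
            return (name, hits[:max_hits])
    return (None, [])
```

Reordering the `providers` list changes the search priority, and `max_hits` caps how many results are shown.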

rdiaz02 · February 17, 2026

And coming back to some of @ilia.leikin's comments,

"[...] a whole lot of references you've extracted from an article or a thesis you've been reading.
Because the moment you find an interesting reference and the moment you decide to add it to the Zotero database are distinct, [...]"

Cita, https://github.com/zotero-cita/zotero-cita, might already provide much of the code to do that: it leaves all the references as one or more notes, so one could either parse the notes or "Generate Report from Items" and parse the report. Getting all the references should be easy since most (all?) have a URL and a DOI.

Edit 2: Actually, Cita does a lot more! It can "Autolink citations with Zotero items" (on the Cita menu on the right, click on the three horizontal dots). So there we have it: the list of references, which ones are in Zotero, and for those that are not the option to import into the collection of choice!
(Is the zotero-actions-tags/discussions/568 script above therefore redundant? I do not think so. It can work for non-indexed references, including internal reports, student papers, etc. And it solves another issue ---at least for me---: getting to the reference while directly reading the possibly annotated PDF, instead of switching to the right to locate the reference. And Cita's reference list depends on upstream extraction quality ---Crossref, OpenAlex, and Semantic Scholar sometimes have errors or missing entries--- whereas the script always works from the actual text in the PDF. But certainly, I could make better use of Cita.)

Edit: Related discussion in https://forums.zotero.org/discussion/126880/help-finding-a-plugin-for-importing-references-from-in-text-citations .

ilia.leikin · February 19, 2026

@rdiaz02, thanks for doing this, it is shaping up really well!
I'll try out your script, and check out other projects you indicated.
Please let me know if you need some extra testing or whatever.

rdiaz02 · February 19, 2026

@ilia.leikin: the more I think about it, the more I think that Cita, https://github.com/zotero-cita/zotero-cita , already provides most of what I think you wanted. I'd like to make use of Cita's output for better extraction when on a specific PDF entry (I've written about it in "Extension: leverage Cita" in https://github.com/windingwind/zotero-actions-tags/discussions/568), but for what I understand you want, I think Cita would be the best option. (Not perfect, though: I found that it sometimes misses some citations, but that is a problem of the provider.)

(My mistake for not having looked at Cita more closely earlier; it was not until @tim820 mentioned it that I got back to playing with it more carefully and even then, I was slow seeing several of the things it can do ---part of it might be related to the outdated documentation, so the best is to play with it and take a look at the issues, both open and closed, in github).