How can I extract references from papers?
I've just started using Zotero and am in the middle of doing a literature review. Is there a way in which I can automatically transfer all the references used by a paper into my Zotero library?
If you're asking whether you can move them all to a collection, you can do that with Reference Extractor. There will be a built-in way of doing that in the future.
I hope this makes sense.
https://github.com/MuiseDestiny/zotero-reference
Unfortunately the current documentation on github is out of date (it only covers the older-style, v6-compatible version), and the plugin's Settings dialog in Zotero is only in Chinese. But Chrome's built-in translation does a reasonable job on the github pages, and the rest is fairly intuitive (if necessary, Google Lens does a decent job of translating the Settings dialog). Someone has just opened an issue asking for English settings to be added (the older, v6-compatible version had English settings).
https://github.com/MuiseDestiny/zotero-reference/issues/355
But there's nothing in the Settings you really need to set in order to make it work (except for Chinese paper access via CNKI for those who need that). And it's so powerful that it is worth the effort to figure out how it works.
It can also show you a list of the papers that cite the current paper (using data from Semantic Scholar; a free API key is not required, but can be used if you have one). So the plugin's two display modes are References (papers that the current paper cites) and Citations (papers that cite the current paper).
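For anyone curious about the data behind those two modes, here is a minimal Python sketch of how references and citations can be requested from the public Semantic Scholar Graph API. It is not how the Reference plugin itself is implemented; it assumes the `/graph/v1/paper/{id}/references` and `/citations` endpoints and their `citedPaper`/`citingPaper` response fields, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Public Semantic Scholar Graph API base (assumption: check the current API docs).
S2_BASE = "https://api.semanticscholar.org/graph/v1/paper"


def build_s2_url(paper_id: str, mode: str, fields: str = "title,year,externalIds") -> str:
    """URL for a paper's references (papers it cites) or citations (papers citing it).

    paper_id may be a Semantic Scholar ID or a prefixed ID such as "DOI:10.xxxx/yyyy".
    """
    if mode not in ("references", "citations"):
        raise ValueError("mode must be 'references' or 'citations'")
    quoted_id = urllib.parse.quote(paper_id, safe=":/")
    query = urllib.parse.urlencode({"fields": fields, "limit": 100})
    return f"{S2_BASE}/{quoted_id}/{mode}?{query}"


def parse_s2_response(payload: dict, mode: str) -> list[dict]:
    """Flatten the API's {"data": [...]} payload into simple title/year dicts."""
    key = "citedPaper" if mode == "references" else "citingPaper"
    papers = []
    for entry in payload.get("data", []):
        paper = entry.get(key) or {}
        if paper.get("title"):  # skip entries Semantic Scholar could not resolve
            papers.append({"title": paper["title"], "year": paper.get("year")})
    return papers


def fetch(paper_id: str, mode: str, api_key: Optional[str] = None) -> list[dict]:
    """Network call; the optional API key is sent in the x-api-key header."""
    request = urllib.request.Request(build_s2_url(paper_id, mode))
    if api_key:
        request.add_header("x-api-key", api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_s2_response(json.load(response), mode)
```

Keeping URL building and parsing separate from the network call makes it easy to check the parsing against a saved response without hitting rate limits.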
P.S.
Just to confirm: is this the Settings dialog you are referring to?
https://s3.amazonaws.com/zotero.org/images/forums/u17176444/mrb9l91m8q4hs11l9guy.png
https://s3.amazonaws.com/zotero.org/images/forums/u5906489/9icja67g6ss1v7l762ir.png
The old plugin version functions quite differently. It can still extract reference lists (but not get the cited-by list); but it does that in different ways. The new version is cleaner and more powerful (albeit with the only-Chinese Settings). On the plus side, the current github documentation (translated by Chrome for example) describes the old version. So that version may still be of use to you.
https://s3.amazonaws.com/zotero.org/images/forums/u17176444/qvhh39qybdxq7ytyi7z3.png
https://sourceforge.net/projects/academic-tech-toolbox/
It's clunky but can save some time if you're researchin' hard.
Basically, it allows you to bulk-search the refs (you can extract the bibliography and select the refs that look interesting), and then you can save just the ones you need using the Zotero extension in your browser.
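For anyone wanting to try the bulk-search idea outside Windows, here is a minimal Python sketch. It is not RefTraceback's code (RefTraceback is written in AutoHotkey); it assumes the bibliography has been pasted as plain text with each entry starting with a "[n]" or "n." marker, and the function names are hypothetical. Each selected entry becomes a Google Scholar search URL that can be opened in the browser, where the Zotero Connector can save the hits you want.

```python
import re
import urllib.parse

# Entry marker at the start of a line: "[12]" or "12." (up to three digits,
# so a wrapped line starting with a year such as "2019." is not mistaken for one).
MARKER = re.compile(r"^\s*(\[\d+\]|\d{1,3}\.)\s+")


def split_bibliography(text: str) -> list[str]:
    """Split a pasted bibliography into entries.

    Assumption: every entry starts on a new line with a marker; unmarked lines
    are treated as wrapped continuations of the previous entry.
    """
    entries: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if MARKER.match(line) or not entries:
            entries.append(MARKER.sub("", line))
        else:
            entries[-1] += " " + line
    return entries


def scholar_url(entry: str) -> str:
    """Google Scholar search URL for one reference string."""
    return "https://scholar.google.com/scholar?" + urllib.parse.urlencode({"q": entry})


if __name__ == "__main__":
    bib = "[1] Smith, J. (2020). A study of things.\nJournal of Stuff, 1(2), 3-4.\n[2] Doe, A. (2019). Another study."
    for entry in split_bibliography(bib):
        print(scholar_url(entry))
```

Hyphenated line breaks and unnumbered author-year bibliographies are not handled; those are exactly the cases where extraction gets hard.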
I've been meaning to develop this into a Zotero plugin, but have zero knowledge about how to do that. Any feedback/help welcome!
v0.5.9 (June 2023) was the first version to be compatible with Zotero v7. AFAIK the Settings have only been in Chinese since then, but my memory could be wrong about that; either way, they have been Chinese-only for some time.
https://github.com/MuiseDestiny/zotero-reference/releases?page=1
https://github.com/MuiseDestiny/zotero-reference/issues/355
*The actual parameters in the Settings of your old dialog are different to the parameters that can be set in the new version. So it's not just a case of an English translation of the same settings having been available in the past but not now. The old and new plugins are functionally different.
Just in case it matters: I am on Linux. I decompressed the zip file for RefTraceback and asked Gemini and Claude to turn it into something usable under Linux (pdftotext is readily available in most Linux distros). After one iteration, Claude (just plain Claude via the web interface, Claude Sonnet 4.5 Extended, not Claude Code) produced a Python program that seems to closely match what yours does in Windows. With Gemini, which I used initially, I (or "we") could not get a decently working version even after more than 10 iterations. I can of course send you the code. I guess it should also run under Windows (not tested, as I don't have Python installed in my Windows virtual machine).
As you say, it requires modifying one's workflow; but maybe pointing this out on the sourceforge page would be useful for people who are not on Windows?
Optimistic about the results, I asked for a simple-minded Zotero plugin to do this from within Zotero, something that would replicate what you do but take advantage of running in Zotero. This did not work after many attempts (with any of Claude, Gemini, or ChatGPT). This could be just me, of course. I know some Python, the logic of your code seemed really clean, and building a mental model of what your code does after playing with it in a Windows virtual machine was not difficult, so I was able to give clear instructions and make suggestions to Claude/Gemini. But I do not know how a plugin works (nor any JS), so I was lost, blindly following LLMs' non-working suggestions.
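For other Linux users, the core of such a port can be quite small. The sketch below is not the program Claude produced (that code is not shown in this thread); it assumes poppler's `pdftotext` is installed, runs it with `-layout`, and treats everything after the last heading line reading "References", "Bibliography", "Works cited" or "Literature cited" as the reference section. Using the last heading skips tables of contents, but it will also swallow any appendices that follow the bibliography.

```python
import re
import subprocess

# A line consisting only of a reference-list heading (case-insensitive).
HEADING = re.compile(r"^\s*(references|bibliography|works cited|literature cited)\s*$", re.IGNORECASE)


def pdf_to_text(pdf_path: str) -> str:
    """Extract text with poppler's pdftotext; the output file '-' means stdout."""
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def reference_section(text: str) -> str:
    """Return the text after the last reference-list heading, or '' if there is none."""
    lines = text.splitlines()
    last = None
    for i, line in enumerate(lines):
        if HEADING.match(line):
            last = i
    if last is None:
        return ""
    return "\n".join(lines[last + 1:]).strip()
```

Usage would be `reference_section(pdf_to_text("paper.pdf"))`, after which the section can be split into entries and searched as in RefTraceback.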
Thanks for taking interest in this idea!
RefTraceback was written in AutoHotkey, which happens to be my favourite language (it is very efficient for creating small quick prototypes and productivity tools), but it is not very portable. It is basically a tool for automating Windows, and translates poorly to Linux. AI understands it quite well, but not as well as the more widespread languages.
Your approach of making it a Zotero plugin is spot-on. The user experience would be much more refined that way. I'd love to code one myself, but have zero experience with Zotero plugins. And, as you pointed out, coding with AI works best when you know how to do it yourself (roughly at least).
Maybe we can collaborate on this. I'll do some research into Zotero plugins.
The important settings in the Reference plugin are now in English. The github documentation remains related to the Zotero v6 version, which had a different interface and ways of working (but it's easy enough to figure out just by using it). The top section of the Settings in Chinese relates to the Chinese CNKI database.
There is a new AI-based plugin for extracting the reference list from a paper. I have only tried it with free Gemini and it kept throwing a limits-related error. YMMV. It does not try to do quite as much as the Reference plugin.
https://github.com/jmiba/Zotero-add-items-from-text
The Cita plugin was also trying to do reference extraction from early on in its development but I haven't looked at it recently, as its main aims were wider than that (sending data to Wikidata etc). I probably should look at it again as it looks to have matured a lot.
https://github.com/zotero-cita/zotero-cita?tab=readme-ov-file
AI can definitely be helpful here.
Part of the plugin I'm suggesting is the introduction of a new kind of object to Zotero - we could call them "Searchables", "Bookmarks", "Ghost references", "Pre-references", etc. - things you're interested in, but haven't yet searched, evaluated and added to the database. E.g. a whole lot of references you've extracted from an article or a thesis you've been reading.
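To make the idea concrete, here is a hypothetical Python data model for such "pre-references". Nothing like this exists in Zotero today; the class name, fields and statuses are illustrative assumptions, chosen to capture the gap between noting a reference and deciding to add it.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Status(Enum):
    PENDING = "pending"      # noted while reading, not yet searched
    SEARCHED = "searched"    # looked up, awaiting a decision
    ADDED = "added"          # imported into the library
    DISCARDED = "discarded"  # judged not worth adding


@dataclass
class PreReference:
    raw_text: str                          # reference string as it appears in the source
    source_item_key: str                   # hypothetical: key of the Zotero item it came from
    status: Status = Status.PENDING
    zotero_item_key: Optional[str] = None  # set once the real item exists
    noted_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def mark_added(self, item_key: str) -> None:
        """Link the pre-reference to the real library item once it has been imported."""
        self.status = Status.ADDED
        self.zotero_item_key = item_key


def backlog(refs: list[PreReference]) -> list[PreReference]:
    """Everything still waiting to be searched or decided on."""
    return [r for r in refs if r.status in (Status.PENDING, Status.SEARCHED)]
```

Keeping the raw text and the link back to the source item means a pre-reference can always be traced to where it was found, even if it is never added.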
Because the moment you find an interesting reference and the moment you decide to add it to the Zotero database are distinct, and could be separated by statistically significant amounts of time and effort :)
This of course could be done outside of Zotero, in a third-party tool. Keep a neatly organised list of references, search them when you have the time, and add to Zotero via browser connector.
I tried https://github.com/zotero-cita/zotero-cita?tab=readme-ov-file a few days ago. For the papers I've tested, it works great: it can query a range of external DBs and gets the references right, which I think is a great way to get around the difficulty of parsing possibly complex documents. (I did not look in much detail at the rest of its extensive functionality.) But, at least for me, it is not exactly what I am looking for, as it places the references in one or more notes, not in a clickable list of "in my Zotero/not in my Zotero, go grab it". Edit: this is only partially correct. Cita also puts the citations in a menu on the right, and you can ask it to "Auto link citations with Zotero items", so you can see which ones are in your library and jump to them, or click on the magic wand for those not in your library to import them.
I didn't know about https://github.com/jmiba/Zotero-add-items-from-text . I just tried it briefly: I hit rate limits with free Gemini and could not get it to work with DeepSeek, OpenAI, or Anthropic (timeouts or network errors). I did not try much, though, because it requires users to select the references and paste them into a box for processing.
In fact, playing around with Ilia's program (and unsuccessfully trying to vibe code a minimal plugin that would parse the references) made me realize more clearly what it is I am looking for and why (I am currently reading about fields that are new to me, and the "reference hoarding" is becoming a sticking point). I'd like to be able to mark/click on one or more references in a PDF (or epub or snapshot), find if they are in my Zotero library (if they are, be given the option of going there) or, if they are not, search for them in, say, arXiv, or OpenAlex, or Semantic Scholar.
Why? When I read papers I often already mark, in the pdf itself, what I want to check later. Having a separate list on the side of the PDF is, for me, cumbersome: I need to match the items in the PDF to the items on another list. Moreover, I might be interested in checking just a handful of references, but the full list might contain tens or hundreds.
The "Search selected text on google scholar" script from Actions and Tags (https://github.com/windingwind/zotero-actions-tags/discussions/535) is very neat and already does some of this: you select text, and it searches for it in Google Scholar. I'll see if I can use Actions and Tags (with Claude's help) to build on this, both to carry out the more elaborate procedure above and to search arXiv, OpenAlex, Semantic Scholar, etc.
Ideally, I'd like to just put the cursor in any line of a reference, and have code that would automagically select just the right amount of info (extending left and right as needed) for the search in my library and in external dbs to be as successful as possible. But, for now, that might be way too much. When playing with this yesterday, I found that correctly parsing references from possibly very different kinds of documents can be much more complicated than I naively expected, which probably explains efforts such as grobid (https://github.com/grobidOrg/grobid), Neural ParsCit (https://github.com/WING-NUS/Neural-ParsCit) and others (e.g., https://arxiv.org/abs/2205.14677 , https://arxiv.org/abs/2505.15948), as well as https://github.com/jmiba/Zotero-add-items-from-text .
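The "put the cursor anywhere in a reference" idea boils down to expanding a position left and right until a reference boundary is hit. Here is a minimal Python sketch, assuming the reference list is already available as plain text and that entries are separated either by blank lines or by a leading "[12]" or "12." marker. Real documents break both assumptions regularly, which is exactly why tools like GROBID exist; the function name is hypothetical.

```python
import re

# A line that starts a new entry: "[12]" or "12." (up to three digits) followed by whitespace.
ENTRY_START = re.compile(r"^\s*(\[\d+\]|\d{1,3}\.)\s")


def reference_at(text: str, offset: int) -> str:
    """Return the whole reference containing character position `offset`.

    Walks up from the cursor's line to the nearest entry start (or blank line),
    then down until the next entry start (or blank line), and joins the lines.
    """
    offset = max(0, min(offset, len(text)))
    lines = text.split("\n")
    line_no = text.count("\n", 0, offset)  # index of the line holding the cursor
    if not lines[line_no].strip():
        return ""
    start = line_no
    while start > 0 and not ENTRY_START.match(lines[start]) and lines[start - 1].strip():
        start -= 1
    end = line_no + 1
    while end < len(lines) and lines[end].strip() and not ENTRY_START.match(lines[end]):
        end += 1
    return " ".join(line.strip() for line in lines[start:end])
```

The returned string could then be passed to a library lookup or external search; the harder, unsolved part is doing the same for author-year lists with no markers or blank lines between entries.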
https://github.com/windingwind/zotero-actions-tags/discussions/568
This is an Actions and Tags script to do the following: "I'd like to be able to mark/click on one or more references in a PDF (or epub or snapshot), find if they are in my Zotero library and, if present, display the PDF > epub > snapshot > item or, if not, search for them in OpenAlex > Semantic Scholar > arXiv > PubMed > CrossRef > Google Scholar." (The search order can be modified, as can the number of hits shown.)
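The lookup order described above (library first, then a ranked list of external services) is essentially a fallback chain. Below is a rough Python sketch of that pattern; it is not the Actions and Tags script itself, which runs inside Zotero. It assumes the query is a title, that the local library is available as a plain list of titles, and uses `difflib` as a stand-in for whatever matching the real script does. It only builds search URLs for the public OpenAlex, Semantic Scholar, arXiv, PubMed, Crossref and Google Scholar search endpoints rather than calling them.

```python
import difflib
import urllib.parse
from typing import Optional


def _url(base: str, params: dict) -> str:
    return base + "?" + urllib.parse.urlencode(params)


# External searches in priority order (endpoints as publicly documented; check each service's docs).
SEARCHERS = [
    ("OpenAlex", lambda q: _url("https://api.openalex.org/works", {"search": q})),
    ("Semantic Scholar", lambda q: _url("https://api.semanticscholar.org/graph/v1/paper/search", {"query": q})),
    ("arXiv", lambda q: _url("https://export.arxiv.org/api/query", {"search_query": 'ti:"' + q + '"'})),
    ("PubMed", lambda q: _url("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi", {"db": "pubmed", "term": q})),
    ("Crossref", lambda q: _url("https://api.crossref.org/works", {"query.bibliographic": q})),
    ("Google Scholar", lambda q: _url("https://scholar.google.com/scholar", {"q": q})),
]


def find_in_library(query: str, library_titles: list[str], cutoff: float = 0.8) -> Optional[str]:
    """Return the closest library title if it is similar enough, else None."""
    lowered = {t.lower(): t for t in library_titles}
    matches = difflib.get_close_matches(query.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None


def lookup_plan(query: str, library_titles: list[str]) -> list[tuple[str, str]]:
    """Library hit first; otherwise every external search, in priority order."""
    hit = find_in_library(query, library_titles)
    if hit is not None:
        return [("Zotero library", hit)]
    return [(name, build(query)) for name, build in SEARCHERS]
```

Reordering `SEARCHERS` changes the priority, which mirrors the configurable search order in the script.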
(I decided against searching for more than one reference at a time, because I realized I would find that confusing, especially with the small latency until results are shown and if there were mistakes in the search).
Edit 2: Actually, Cita does a lot more! It can "Autolink citations with Zotero items" (on the Cita menu on the right, click on the three horizontal dots). So there we have it: the list of references, which ones are in Zotero, and for those that are not the option to import into the collection of choice!
(Is the zotero-actions-tags/discussions/568 script above therefore redundant? I do not think so. It can work for non-indexed references, including internal reports, student papers, etc. It also solves another issue, at least for me: getting to the reference while reading the (possibly annotated) PDF directly, instead of switching to the panel on the right to locate the reference. And Cita's reference list depends on upstream extraction quality (Crossref, OpenAlex, and Semantic Scholar sometimes have errors or missing entries), whereas the script always works from the actual text in the PDF. But certainly, I could make better use of Cita.)
Edit: Related discussion in https://forums.zotero.org/discussion/126880/help-finding-a-plugin-for-importing-references-from-in-text-citations .
I'll try out your script, and check out other projects you indicated.
Please let me know if you need some extra testing or whatever.
(My mistake for not having looked at Cita more closely earlier. It was not until @tim820 mentioned it that I got back to playing with it more carefully, and even then I was slow to see several of the things it can do. Part of this might be due to the outdated documentation, so the best approach is to play with it and look at the issues, both open and closed, on github.)