Beaver: Zotero AI plugin with agentic search over your library, sentence-level citations and more
Hi Everyone!
We are happy to introduce Beaver. Beaver is a research agent with native Zotero integration. What, another AI plugin for Zotero? Yes, indeed! But I think the feature set makes it compelling. There are other options and you should explore them all. Here are some of Beaver’s features:
1. Research Agent: Beaver uses agentic search over your entire Zotero library. It iteratively combines metadata, related reference search based on semantic similarity and full document search using keyword and semantic similarity (hybrid). Together, these tools allow the agent to find relevant references, documents, and even specific paragraphs.
2. Seamless Zotero integration: Beaver is a Zotero plugin that adds a sidebar in Zotero so you can use it directly from the library view or while you are reading a PDF. Beaver sees your current page and can respond based on your entire Zotero library.
3. Precise, sentence-level citations: Beaver supports sentence-level citations! Hover over a citation to see a preview, or click on it to open the PDF and highlight the relevant passage.
4. Unlimited use and access to frontier models from OpenAI, Anthropic and Google with your own API key.
5. Benchmarks and evaluations. We use benchmarks and evaluations to consistently improve the performance of Beaver including prompt and context engineering, retrieval pipeline, file processing, citation behavior and more.
Preview: Beaver is in beta and available for free right now (for a limited number of users). You can learn more and sign up here. The frontend code is open source on GitHub. Remember that Beaver is in beta and you might run into bugs!
Please leave feedback or let us know about feature requests here or on GitHub.
How does Beaver work?
Beaver is a cloud-based plugin. That means it syncs your Zotero data with our servers to process your files and provide all its functionality. We have a strict privacy policy (no training or other use of your data unless you explicitly opt in) and will implement additional privacy-focused features. We are also interested in developing a local version, but it is not the highest priority right now (follow discussion here).
Prefer a local‑only approach? Consider some of the current Zotero plugins like A.R.I.A. or Zotero MCP.
How does library search work?
Beaver uses agentic search: the AI can choose among different search tools, filter based on metadata and iterate to explore your Zotero library. Currently, Beaver supports three search tools:
1. Metadata Search: Finds items by metadata (author, year, title).
2. Related Reference Search (Semantic): Finds all references related to a specific topic.
3. Full-document Search (keyword and semantic): Beaver uses hybrid search with reranking to search the content of your documents and retrieve relevant passages. Hybrid search combines keyword and semantic search based on embeddings to find relevant passages even without exact terms.
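To make the idea of hybrid search a bit more concrete, here is a minimal sketch of one common way to merge a keyword ranking and a semantic ranking: reciprocal rank fusion. This is an illustration under assumptions, not Beaver's actual pipeline. The post does not say how Beaver fuses or reranks results, and the function name, the constant `k=60` and the passage ids are all made up for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of passage ids into a single ranking.

    Each passage earns 1 / (k + rank) from every list it appears in, so
    passages ranked well by both keyword and semantic search rise to the top.
    Illustrative stand-in only; Beaver's real fusion method is not documented here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, passage_id in enumerate(ranking, start=1):
            scores[passage_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id so the output is deterministic.
    return sorted(scores, key=lambda pid: (-scores[pid], pid))


# Hypothetical results from the two retrievers (best match first).
keyword_hits = ["p3", "p1", "p7"]
semantic_hits = ["p1", "p5", "p3"]

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(fused)
```

A passage that only the semantic retriever finds (here `p5`) still makes it into the fused list, which is how hybrid search can surface relevant passages even without exact terms. A separate reranking step would then reorder the top of this list.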
During the preview, full-document search is free for up to 75,000 pages (~2,500 articles). After the preview, full-document search will likely be part of the paid version simply because of the cost associated with processing and storing the data. Metadata + related reference search will likely remain unlimited and free.
How does pricing work after the preview?
Beaver is published by academic researchers at Harvard. The goal is not to make a profit. We are just having fun working on this and building a useful research tool. That means two things for pricing after the preview: a) We always want to offer a free version (with your own API key) and we are trying to pack as many features into it as we can. b) There will be a paid version with additional features priced to cover costs. For example, processing thousands of files in a way that supports sentence-level citations, generating embeddings, storing the data and making it searchable is not cheap.
System Requirements
Zotero 7.0 or later (including Zotero 8 beta)
Internet connection for cloud features
Modern web browser for account management
And yes, group libraries and OCR support are probably at the top of the list of next features. Both are pretty far along.
Please let me know if you run into any issues.
I wanted to mention that I couldn't use the download button on the Beaver webpage in Firefox. "Save Link As" does not show up when right-clicking the Download button, and the triggered download gets rejected by Firefox, since the .xpi file is not for Firefox (obviously). I had to switch to Chrome to download it. Will be trying it out soon.
The Firefox download issue should have been resolved earlier today (4-5 hours ago) so I hope it's working now. Please let me know if you tried after that and still had the same issue.
One thing I've noticed is that some journal articles are skipped because they are identified as 'insufficient text', and I was wondering how to get around that?
https://s3.amazonaws.com/zotero.org/images/forums/u13814293/k5gm55tvzpfcsy6xxp2e.png
Furthermore, will Beaver consider supporting API keys from OpenRouter or Ollama (free options compared to the 3 major providers)?
On the PDFs: Would you be open to emailing me one of the PDFs? Either at contact@beaverapp.ai or my own address. If not, I can talk you through trying some things. As background: Beaver skips attachments when the extracted text is less than 150 characters. I don't think that applies to your files. My guess is that these PDFs do not have a text layer and require OCR, but are not recognized as such and are therefore misclassified as "insufficient text" instead of "Requires OCR". If OCR is the issue (and I am only guessing), support for that is coming but will still take a little while.
On supporting other providers: yes. I thought about OpenRouter (because it covers so many models) and Mistral (as a European provider) next. I will revisit after finishing a bigger change that is currently in the works. I do want to say that the frontier labs still have an edge for agentic applications, so the differences you see might be bigger than in simple question-response applications.
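As a rough illustration of the skip rule described above, here is a minimal sketch, not Beaver's actual code. Only the 150-character threshold and the two status labels come from this post. The function name, the `has_page_images` flag and the idea of using page images as an OCR hint are assumptions made up for the example.

```python
MIN_TEXT_CHARS = 150  # threshold mentioned in the post above


def classify_attachment(extracted_text: str, has_page_images: bool) -> str:
    """Return a processing status for one PDF attachment (illustrative only)."""
    if len(extracted_text.strip()) >= MIN_TEXT_CHARS:
        return "ok"
    # Assumption: a PDF with almost no extractable text but with page images
    # is probably a scan without a text layer, so it needs OCR rather than
    # being a genuinely empty document.
    if has_page_images:
        return "Requires OCR"
    return "insufficient text"


print(classify_attachment("", has_page_images=True))
print(classify_attachment("cover page", has_page_images=False))
```

If the OCR check misses a scanned PDF, it falls through to "insufficient text", which would match the behavior in the screenshot.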
-On the top, to the left of the user account symbol is a symbol of stacked coins or a cake with dripping icing (??), but hovering over it does not show a title and clicking it does not do anything. Can you explain what this is supposed to do?
-In my case, it did not scan something like 1700 pdfs because they were above the limit. That is fine as a limit, but it would be helpful if Beaver told me which ones are not included, or better, the logic of which ones are not included (does it run alphabetically per author name? Or start with the smallest files? or by date added?).
Ideally, it would be even better if the user could give Beaver a logic to scan by. In my case, for example, not all my items are in folders, but all those that are in folders are more important to scan than those that are not. Thus scanning items in folders, followed by those not in folders, would be way more helpful than alphabetically. Or alternatively, scanning those added to the library most recently.
best
m
- Large libraries: Yes, absolutely. I think the support for large libraries (about 5% of users right now) and how the limit is handled is not good right now. I am considering a special Beaver collection like "My Publications" or "Duplicate Items" (maybe with an option to add all recent items or add by collection during onboarding). I am not sure yet about the best implementation, but I will prioritize this after the next release.
- Processing status: You should already be able to see which files were skipped or failed to process with clear reasons. You can also list them but with 1700 files that is a mess and pretty unusable to get an overview.
https://s3.amazonaws.com/zotero.org/images/forums/u3113719/evem98e1vctfbj48lq5d.png
Sorry for the late reply, but regarding the processing status: Can you tell us what the default processing logic is at the moment? I am able to see the ones omitted, but it is impossible for me to understand what the logic is from looking at them, and without knowing the logic of exclusion it is hard to make sense of the results of Beaver.
I added more details here so there is a dedicated documentation page that explains the process. Let me know if there are any open questions or anything unclear on that page.
The short version:
- Files are roughly processed by modification date of the Zotero attachment (only after ~Oct 20, no clear sorting before)
- We increased the free page balance during beta from 75k pages to 125k pages (over 4,000 articles at 30 pages per article). This increase improves support for large libraries. Processing that many files cannot and will not remain free after the beta.
- Support for very large libraries: When your library exceeds 125k pages, support during the beta is limited! Additional files are not processed. You can list unprocessed files but there is no good way to get a complete overview if there are a lot of files. I am open to suggestions here that would make the experience better but remember that this is (hopefully) only temporary during the beta.
- Beaver supports restricting sync to specific libraries. At least right now, collection-level filtering is not planned. It adds a lot of complexity and degrades the user experience. Instead, the goal is to support extremely large libraries so you don't have to worry about granular controls. I recognize that we initially planned to support this feature and changed plans after thinking about different implementations. Still open to input and suggestions here.
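To make the page balance logic above easier to picture, here is a minimal sketch under stated assumptions. It is not Beaver's implementation. The post only says that files are roughly processed by modification date and that additional files are not processed once the balance is used up. The sort direction (newest first), the rule that a file is skipped when it does not fit into the remaining balance, and all names below are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Attachment:
    name: str
    pages: int
    modified: date  # modification date of the Zotero attachment


def plan_processing(attachments, page_balance, newest_first=True):
    """Split attachments into processed and skipped names given a page balance."""
    ordered = sorted(attachments, key=lambda a: a.modified, reverse=newest_first)
    processed, skipped = [], []
    remaining = page_balance
    for att in ordered:
        if att.pages <= remaining:
            processed.append(att.name)
            remaining -= att.pages
        else:
            # Assumption: a file that does not fit is left unprocessed.
            skipped.append(att.name)
    return processed, skipped, remaining


# Tiny hypothetical library with a balance of 100 pages.
library = [
    Attachment("a.pdf", 40, date(2025, 10, 1)),
    Attachment("b.pdf", 50, date(2025, 11, 1)),
    Attachment("c.pdf", 30, date(2025, 9, 1)),
]
processed, skipped, remaining = plan_processing(library, page_balance=100)
print(processed, skipped, remaining)

# Article estimates at 30 pages per article, as used in this thread.
articles_beta = 125_000 // 30
articles_initial = 75_000 // 30
print(articles_beta, articles_initial)
```

With these assumptions, the oldest file is the one left out, which is the kind of rule that is hard to guess from a list of 1700 skipped files alone.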
-First of all, to help other users: To restrict and/or expand the libraries that are scanned, go to "Settings" (which is under the user account, on the top right).
-On the GitHub help page that you link above, it says: "If you signed up before this change, files that were not processed because of insufficient balance should be processed automatically when you sent a chat message." I am one who signed up before Oct 20th. So what do I need to do? Send *any* chat message in Beaver? Or a specific prompt to rescan the library?
-I cannot speak for others, but I would imagine that it is good in principle that Beaver operates on the level of a whole library. But then for those with large libraries, if Beaver has not scanned everything, it would be preferable to operate on defined collections.
many thanks for your help!
mm
2. Yes. Any message will do it. It should have already happened if you sent any message in the last 3 weeks. If not, email the address on the help page and we will get it resolved. This is only relevant for users who signed up more than three weeks ago and ran into the page limit issue.
3. Yes, completely agree. However, there are real tradeoffs. We explored it and decided against it for now. That might change and is a temporary issue during beta.