Problems associated with large Zotero database

I am using a Zotero (v. 5.0.74) database with 13k+ entries on a Mac (OS 10.14.6 (18G95)). The zotero.sqlite.bak file is 2.1GB. I am also using Better BibTeX (BBT, v. 5.1.139) in connection with RStudio (Version 1.2.1335) and the citr package (0.3.2).

In general, the large Zotero database causes trouble whenever citeproc-js is involved, because the database is "too large" (so I am told), for example when I want to use the citation features in Zettlr (https://github.com/Zettlr/Zettlr). Exporting my database to a BibTeX file (using BBT) takes considerable time (30-40 minutes), even though I have omitted abstract, note, file, tag, and attachment from the export, and it results in a large .bib file (6.3MB).

In order to speed up the integration of Zotero with RStudio using citr (which is simply awesome), I wondered if there is any way to slim down my Zotero database (outside of the obvious, such as deleting duplicates, which I have done, or deleting entries, which is not really an option)?

For example:

- I have a lot of attachments, some of them large. Would it help not to have an index? That would of course limit Zotero's search functionality, but as a trade-off for faster integration it might be worth doing. Right now I have 322k+ words in the cache according to the search/index tab in Zotero.

Any suggestion? Or am I simply left with no options here? Thanks for your support!
  • citr is unfortunately very inefficient because it tries to load your entire database .bib file into memory at once. I recommend using rbbt instead to integrate RStudio with BBT.
    https://github.com/paleolimbot/rbbt
  • 30 minutes is long - sounds to me like your export is not using the BBT cache. Changing BBT preferences will drop the cache, but that should refill with subsequent exports, so it shouldn't remain slow. For exports where you have selected to export attachments (regardless of whether you have the attachments field removed), the cache is always off.
  • I have the same issue with a large database.
    rbbt is fast, but the package does not seem to write inserted references to the references.bib file associated with the project. Is there a way with rbbt to automatically share inserted references to the .bib file?

    Or what would the workflow be with multiple contributors and rbbt?
  • Recent changes in BBT should substantially speed up large exports if you don't also export attachments.
  • (the issue that kicked this off went from 17 minutes to 6-15 seconds)
  • I appreciate that things have improved considerably.

    What is nagging me is that loading the Zotero library into citr takes ~2 minutes, and the same again on every refresh.

    A typical use case is that I am writing in R Markdown. I then realise I need to cite a paper that is not in my library. I add the paper to Zotero via the browser. To see that paper in citr, I need to reconnect the Zotero library, which means waiting 2 minutes before I can use R again.

    My library is 4100 items.

    Where can I check whether I sync attachments? Right now I see that I ignore "abstract".
  • 2 minutes for 4100 items? That's not right. You can open an issue on GitHub for BBT for that. citr doesn't request attachments, so that should be a lot faster than what you're seeing now.
  • I think this issue describes the situation? Or should I open a new one?

    https://github.com/retorquere/zotero-better-bibtex/issues/1391
  • Of course you're right. I'll get a new debug build started, if you could join that issue, we can take a look at what's going wrong for you.
  • The size of the attachments or their indexing status doesn't matter at all, but the number of attachments does. Recent versions of BBT have made improvements in handling libraries with many attachments.
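
Since the number of attachments is what matters, it can help to know how many there actually are. Here is a minimal sketch in Python that counts rows in a few tables of a copy of the Zotero database. The table names (`items`, `itemAttachments`, `fulltextWords`) are assumptions about the Zotero schema and should be checked against your own file; a table that is not found is reported as `None` rather than raising an error.

```python
import sqlite3

# Table names are assumptions about the Zotero schema; verify them
# against your own database before relying on the numbers.
TABLES = ("items", "itemAttachments", "fulltextWords")


def count_rows(conn, tables=TABLES):
    """Return {table: row count}, or None for tables that do not exist."""
    counts = {}
    for table in tables:
        try:
            # Table names come from the fixed tuple above, not from user input.
            row = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
            counts[table] = row[0]
        except sqlite3.OperationalError:
            counts[table] = None
    return counts


if __name__ == "__main__":
    # Demo on an in-memory stand-in so the sketch runs without Zotero.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE items (itemID INTEGER PRIMARY KEY)")
    demo.execute("CREATE TABLE itemAttachments (itemID INTEGER PRIMARY KEY)")
    demo.executemany("INSERT INTO items VALUES (?)", [(1,), (2,), (3,)])
    demo.execute("INSERT INTO itemAttachments VALUES (2)")
    print(count_rows(demo))
```

To run it against real data, work on a copy of zotero.sqlite (not the live file while Zotero is open) and open it read-only, for example with `sqlite3.connect("file:/path/to/copy.sqlite?mode=ro", uri=True)`; the path here is a placeholder.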
  • even though I have omitted abstract, note, file, tag, attachment from the export
    This won't make a speed difference. On a cold cache it will actually be slightly slower, because the fields are stripped only after they have first been generated. After that, the results are simply cached and returned, with no further work performed.
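
The cache behaviour described in this thread (fields stripped after generation, warm exports served from the cache, preference changes dropping the cache) can be illustrated with a toy model. This is not BBT's implementation; `ToyExporter`, `LIBRARY`, and every method name below are hypothetical and exist only to show the idea.

```python
# Hypothetical toy library: item id -> all generated fields.
LIBRARY = {
    1: {"title": "A paper", "abstract": "Long text", "note": "My note"},
    2: {"title": "Another paper", "abstract": "More text", "note": ""},
}


class ToyExporter:
    """Toy model of a cached export; NOT how BBT is actually written."""

    def __init__(self, library):
        self.library = library
        self.cache = {}
        self.generated = 0  # how many times an entry was built from scratch

    def _generate(self, item_id):
        # The full entry is always generated first, omitted fields included.
        self.generated += 1
        return dict(self.library[item_id])

    def export(self, item_id, skip_fields=frozenset()):
        key = (item_id, frozenset(skip_fields))
        if key in self.cache:
            # Warm cache: return the stored result, no work performed.
            return self.cache[key]
        entry = self._generate(item_id)
        # Cold cache: stripping is extra work on top of generation.
        stripped = {k: v for k, v in entry.items() if k not in skip_fields}
        self.cache[key] = stripped
        return stripped

    def change_preferences(self):
        # Changing preferences drops the cache; later exports refill it.
        self.cache.clear()
```

In this model, omitting fields never saves generation work: the first export of an item costs one generation regardless of `skip_fields`, and repeated exports with the same settings cost nothing until the cache is dropped.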
  • Is there a way with rbbt to automatically share inserted references to the .bib file?
    You can use BBT auto-export to keep the file updated. Otherwise, rbbt can pull-request the bib file at any time it wants.
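
For a shared project, one way to notice that the .bib file has fallen behind is to compare the citation keys used in the manuscript with the keys defined in the .bib file. The following Python sketch does this with two simple regular expressions; it assumes pandoc-style `@key` citations and plain `@type{key,` BibTeX entries, and it is not part of rbbt, citr, or BBT. The function names are made up for illustration.

```python
import re

# Assumed pandoc-style citation: "@key" not preceded by a word character,
# which keeps e-mail addresses such as name@example.com out of the result.
CITE_RE = re.compile(r"(?<![\w.])@([A-Za-z0-9_][A-Za-z0-9_:.-]*)")

# Assumed BibTeX entry head: "@article{key," and similar.
BIB_RE = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")


def cited_keys(manuscript_text):
    """Citation keys used in the manuscript, without trailing punctuation."""
    return {key.rstrip(".:-") for key in CITE_RE.findall(manuscript_text)}


def defined_keys(bib_text):
    """Citation keys defined in the .bib file."""
    return set(BIB_RE.findall(bib_text))


def missing_keys(manuscript_text, bib_text):
    """Keys cited in the manuscript but absent from the .bib file."""
    return sorted(cited_keys(manuscript_text) - defined_keys(bib_text))
```

A contributor could run `missing_keys` on the text of the .Rmd file and the shared references.bib before committing; a non-empty result means the .bib file needs a fresh export.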