Backup of large database

So I have finally decided to subscribe to unlimited storage for my Zotero files. This is in part to replace my (slightly more complicated) setup of backing up my files automatically with Dropbox. I know using Dropbox is strongly discouraged as there is a risk of file corruption, but I figured I was safe because I never accessed my data on any other device, and it would be possible to restore the data file from an older version in Dropbox in case the most recent one got corrupted.

But now that I expect to use Zotero from different devices, I decided to abandon this setup and sync all my data to the Zotero servers. The issue I now have is that Zotero's guidelines say not to rely on syncing as a backup of files. This is a bit of a problem for me because, with the way the Zotero database is set up, my data results in about 300,000 files (for about 8000 database entries), which literally take hours to copy or back up. What is the recommended method for backing up the Zotero database when you have a large database?

(On Dropbox this was manageable once the initial sync was completed, as only a small number of files would need to be updated after that. On OneDrive, which my workplace offers for free, I can't even get the initial backup to work because it freezes due to too many files being saved at the same time. And as I said, copying the database to another drive on the computer takes several hours.)

Any suggestions would be much appreciated.
  • edited August 15, 2021
    The recommended method is to not think about Zotero at all and just back up your entire computer using a tool that does automated, incremental, full-system backups. There's no reason to be thinking about backing up individual programs.

    Online Zotero storage will help you if, say, your computer dies (and it has saved many people's libraries), but it's technically not a proper backup because there's no version history. So if you make a change you didn't mean to make and then sync, you can't restore it (not counting the automatic backups Zotero keeps for a couple days in the local data directory).
  • Thanks for the explanation. I have tried incremental backup systems before, but I didn't find them user friendly. I found it much more convenient to just save everything in dropbox. But that doesn't work for Zotero. Maybe I need to look for better backup software (anyone have any recommendations for Windows 10?), and manually back up the main zotero file every couple of days for now. Thanks.
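    For the manual copy of the main database file mentioned here, a minimal sketch in Python follows. It uses SQLite's online backup API through the standard `sqlite3` module. The function name and the paths in the usage note are hypothetical, and it is safest to run something like this while Zotero is closed.

```python
import sqlite3
from pathlib import Path


def snapshot_sqlite(src: Path, dest: Path) -> None:
    """Copy a SQLite database to dest using SQLite's online backup API."""
    if not src.is_file():
        # sqlite3.connect() would silently create an empty database otherwise.
        raise FileNotFoundError(src)
    dest.parent.mkdir(parents=True, exist_ok=True)
    source = sqlite3.connect(src)
    target = sqlite3.connect(dest)
    try:
        # Copies every page of the source into the target database.
        source.backup(target)
    finally:
        target.close()
        source.close()
```

    Usage would look something like `snapshot_sqlite(Path("Zotero/zotero.sqlite"), Path("backups/zotero-snapshot.sqlite"))`, with both paths adjusted to your own setup. This only covers the database file, not the attachments.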
  • Yes, you need better backup software. I don't know what people use on Windows, but this has been a solved problem on Macs for almost 15 years — you plug in an external hard drive and Time Machine does the rest for you, making incremental backups every hour. You never have to think about it again unless you need to recover something (other than plugging in the drive occasionally if this is a laptop). There's just no reason anyone should be worrying about backing up individual programs' files in 2021.
    I found it much more convenient to just save everything in dropbox. But that doesn't work for Zotero.
    To be clear, this isn't really a limitation of Zotero. It would be a problem for any program that uses a database. Dropbox and other cloud storage tools are meant for files, not databases. (You can of course back up anything to Dropbox, including database files — you just can't use them directly from there without corrupting them. If you really don't want to back up your entire system, you can use a backup tool to incrementally copy specific files to a given folder on a schedule, and set that to copy the Zotero data directory to Dropbox.)
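    To illustrate what "incrementally copy specific files to a given folder" means, here is a rough Python sketch. It is not a replacement for real backup software: it keeps no version history and never deletes anything from the destination. The function name and the size/modification-time check are my own assumptions about what "unchanged" means.

```python
import shutil
from pathlib import Path


def incremental_copy(src: Path, dest: Path) -> int:
    """Copy files from src to dest, skipping files that look unchanged.

    A file is treated as unchanged when the destination copy exists with the
    same size and a modification time that is not older than the source.
    Returns the number of files copied.
    """
    copied = 0
    for path in src.rglob("*"):
        if not path.is_file():
            continue
        target = dest / path.relative_to(src)
        if target.exists():
            s, t = path.stat(), target.stat()
            if s.st_size == t.st_size and t.st_mtime >= s.st_mtime:
                continue  # looks unchanged, skip it
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 also copies the modification time
        copied += 1
    return copied
```

    After the first full run, later runs only touch new or modified files, which is why a library with hundreds of thousands of attachment files stops being a problem. Run on a schedule (and with Zotero closed), the destination could be a folder inside Dropbox.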
  • Windows' current built-in backup is functionally equivalent to Time Machine on the Mac.
  • Yes, I'm looking at other backup systems. I don't trust the built-in Windows backup functionality at all. I had a terrible experience with it 4-5 years ago, where I lost files as a result. (BTW, I think more and more people are using cloud sync for backup these days instead of old-fashioned backups.)

    Not saying it is a limitation of Zotero. But the way the database is set up (thousands of small files, rather than fewer larger files) makes it take a very long time to copy and transfer files. Maybe there is no other way.

    While we're on the topic of files: why is it that the sqlite file of my original zotero installation is 740 MB, but when I synchronize the entire database (with all info, data, and attachments) to another computer, the sqlite file on the new computer is only 180 MB?
  • But the way the database is set up (thousands of small files, rather than fewer larger files) makes it take a very long time to copy and transfer files. Maybe there is no other way.
    Those aren't the "database"; those are attached files. PDFs obviously need to be separate files. Older webpage snapshots can consist of many individual files, but last year we switched to a new snapshot-saving method that saves snapshots as single files.

    In any case, this just isn't a problem with a proper incremental backup system like Time Machine.
    why is it that the sqlite file of my original zotero installation is 740 MB, but when I synchronize the entire database (with all info, data, and attachments) to another computer, the sqlite file on the new computer is only 180 MB?
    Probably a combination of a couple things:

    1) The other computer hasn't done full-text content indexing yet. If you have full-text content syncing enabled on both computers, it should do it automatically in the background. You can force it to index unindexed items from the Search pane of the preferences, though if you're using "as needed" file-syncing mode that won't work (which is why full-text content syncing exists).

    2) New databases can be smaller due to decreased fragmentation. If you check your database integrity from the Advanced → Files and Folders pane of the Zotero preferences, it will currently perform a cleanup operation on the database that might decrease the size a bit. (This could take a while on a 700 MB DB, so be prepared to let it run if you do this.)
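    The fragmentation effect in 2) is general SQLite behavior, and a small Python sketch can show it. This uses SQLite's standard `VACUUM` command on a throwaway database; I'm not claiming it is the exact cleanup Zotero's integrity check performs. The function and table names are made up for the demo.

```python
import os
import sqlite3
import tempfile
from typing import Tuple


def vacuum_demo(rows: int = 2000) -> Tuple[int, int]:
    """Return the file size in bytes before and after a VACUUM."""
    path = os.path.join(tempfile.mkdtemp(), "demo.sqlite")
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE demo_rows (id INTEGER PRIMARY KEY, body TEXT)")
    con.executemany(
        "INSERT INTO demo_rows (body) VALUES (?)",
        (("x" * 1000,) for _ in range(rows)),
    )
    con.commit()
    # Deleting rows leaves unused space inside the file; the file itself
    # does not shrink on its own with SQLite's default settings.
    con.execute("DELETE FROM demo_rows WHERE id % 2 = 0")
    con.commit()
    before = os.path.getsize(path)
    con.execute("VACUUM")  # rebuilds the database file compactly
    con.close()
    after = os.path.getsize(path)
    return before, after
```

    Calling `vacuum_demo()` returns a pair where the second size is smaller than the first. A freshly synced database is in the same position as the rebuilt one: it was written once, compactly, without years of inserts and deletes behind it.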
  • Thank you. It's good to know that snapshots are now saved as single files.