Dropbox - a prompt for a solution
This is not just to "stir the pot"; I am hoping to stimulate some positive discussion and possibly some movement toward solutions other than the ones I've seen posted here. This is mostly for programmers and developers, but other creative, adventurous folks might be inspired to experiment with alternative solutions.
I cannot vouch for the validity of the dire warnings that Dropbox will "almost certainly lead to data corruption in time." I use Linux Mint (an Ubuntu derivative) and have not had any data corruption in more than 7 years of use, including a total of 5+ years of grad school. I am a moderate user: my current 5.5-year-old database holds a bit over 1250 entries and uses about 2.2 GB of storage (I "restarted" when changing degree programs!).
So is the corruption problem limited to one operating system? Do Mac OS users have problems, or is it just Windows? Or just networked (e.g., NFS) files? Only more recent versions of the OS or of Zotero? Is it fixed in the latest version, Zotero 5? (I assume *not*, given the new warning message.) Does the same thing happen if you put the database in a Google Drive synced folder?
I am assuming the corruption is caused by the database file (zotero.sqlite) being written or overwritten by Zotero and/or the Dropbox sync app while the database is in use (open). I have only used SQLite in web applications (which are of a different "state machine" design) and have never had any corruption after millions of page views, each of which involves opening, reading, and writing the database. So I don't think SQLite itself is the issue.
A search for "Dropbox corrupting files" does turn up some similar issues, but it does not seem to be a dire problem in which other programs' files are "almost certainly" corrupted. I also found this useful diagnostic procedure from Dropbox: https://www.dropbox.com/help/security/contact-support-missing-files
Is it possible for Zotero to lock the database file to prevent other programs from writing to it? Are there other steps that could be taken to minimize the chances of corruption? See https://www.sqlite.org/howtocorrupt.html for more details. I also see a temporary sqlite-journal file; does that offer more possibilities for preventing or fixing problems?
My experience as a programmer in various areas and my current life situation do not put me in a good position to tackle this problem myself. I don't have a good answer, but I do believe there are better answers than the ones I've seen. I know most folks here are not programmers, so their contributions may be limited, but we can all contribute.
A solution is possible! Constructive ideas welcomed.
The low-level SQLite stuff mostly isn't relevant, since we use Mozilla's SQLite subsystem, which (if I recall) does its own sort of caching and expects the file not to have changed. If you were just using the SQLite command-line client or a program that accessed the database differently, you probably wouldn't see this.
But that's all irrelevant anyway, because even if we could ensure that corruption didn't happen at the file level, Zotero would still break in random, unpredictable ways when the underlying data changed from its in-memory representation. Zotero sets locking_mode=EXCLUSIVE on the file to prevent it from being changed by other SQLite clients, but Dropbox and other sync tools don't know anything about that. So there's no point debugging file-level corruption issues, because Zotero simply isn't designed to expect that the data in the database can change out from underneath it. I suspect it's the same for most programs that use SQLite as an application file format (which is different from a standard stateless database access pattern).
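To make the locking point concrete, here is a minimal Python sketch using the standard-library sqlite3 module (not the Mozilla SQLite layer Zotero actually uses). One connection, standing in for Zotero, sets locking_mode=EXCLUSIVE and writes once; a second SQLite connection is then refused, while ordinary file I/O, which is all a sync client does, reads the same file without complaint. The function name exclusive_lock_demo and the temporary database are illustrative only.

```python
import sqlite3
import tempfile
from pathlib import Path


def exclusive_lock_demo(db_path):
    """Return (blocked, header) for a database held in EXCLUSIVE locking mode."""
    # Connection standing in for Zotero: EXCLUSIVE mode keeps a file lock once
    # acquired instead of releasing it after each transaction.
    owner = sqlite3.connect(db_path)
    owner.execute("PRAGMA locking_mode=EXCLUSIVE")
    owner.execute("CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, title TEXT)")
    owner.execute("INSERT INTO items (title) VALUES ('example')")
    owner.commit()  # the first write took the exclusive lock, which is still held

    # A second SQLite client honours the lock and fails immediately (timeout=0).
    other = sqlite3.connect(db_path, timeout=0)
    try:
        other.execute("SELECT count(*) FROM items").fetchone()
        blocked = False
    except sqlite3.OperationalError:  # "database is locked"
        blocked = True
    finally:
        other.close()

    # Plain file I/O knows nothing about SQLite's locks and just reads the bytes.
    with open(db_path, "rb") as f:
        header = f.read(16)

    owner.close()
    return blocked, header


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(exclusive_lock_demo(str(Path(tmp) / "demo.sqlite")))
```

The lock only binds clients that speak SQLite; anything that treats zotero.sqlite as an ordinary file, as Dropbox does, is unaffected by it.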
And even if none of that were an issue, Dropbox just isn't a good place for a database, because its only conflict-resolution mechanism works at the file level. Suppose Zotero took out an exclusive file-level lock that prevented Dropbox from writing to the file. If you had Zotero open on two machines at once, then whenever one lock was lifted you'd have two conflicting files that you had no way of reconciling, and you'd have directories in 'storage' that were referenced in only one of the two databases.
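The orphaned 'storage' folder problem can be sketched as a simple set difference. This assumes each attachment's folder under storage is named after that attachment's item key (which, as far as I know, is how Zotero lays out the folder) and that you already have the set of keys from whichever database survived the conflict; orphaned_storage_dirs is a hypothetical helper and deliberately does not try to read Zotero's schema.

```python
import tempfile
from pathlib import Path


def orphaned_storage_dirs(storage_dir, attachment_keys):
    """Return sorted names of storage subfolders that attachment_keys does not mention."""
    on_disk = {p.name for p in Path(storage_dir).iterdir() if p.is_dir()}
    # Folders that exist on disk but that the surviving database knows nothing about.
    return sorted(on_disk - set(attachment_keys))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as storage:
        # Two machines each added an attachment; only one database "won" the conflict.
        for key in ("AAAA1111", "BBBB2222"):
            (Path(storage) / key).mkdir()
        print(orphaned_storage_dirs(storage, {"AAAA1111"}))  # prints ['BBBB2222']
```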
This just doesn't work.
I have a computer at home and one at work, each with Zotero Standalone. In each case, the Zotero folder is at C:\Users\\Zotero. I store copies of my files in C:\Users\\Zotero\storage, so there is a complete set at home and at work. To keep the two in sync, I use Create Synchronicity to back up C:\Users\\Zotero to a USB drive after each session, and then back up from the USB drive to the other computer.
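For a manual copy workflow like this, the one step worth automating is refusing to copy while Zotero has the database open. A rough Python sketch, assuming the data directory contains zotero.sqlite and that a running Zotero holds a lock on it that blocks readers (consistent with the locking_mode=EXCLUSIVE behaviour mentioned earlier in the thread). database_in_use and mirror_zotero_dir are made-up names, and shutil.copytree(..., dirs_exist_ok=True) needs Python 3.8 or later.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def database_in_use(db_path):
    """Return True if another connection (e.g. a running Zotero) holds a lock that blocks reads."""
    # Open read-only via a URI so this check never modifies the database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True, timeout=0)  # timeout=0: fail at once, don't wait
    try:
        conn.execute("SELECT count(*) FROM sqlite_master").fetchone()
        return False
    except sqlite3.OperationalError:  # "database is locked"
        return True
    finally:
        conn.close()


def mirror_zotero_dir(src, dst):
    """Copy the Zotero data directory src over dst, refusing while the database is in use."""
    if database_in_use(Path(src) / "zotero.sqlite"):
        raise RuntimeError("zotero.sqlite is locked; close Zotero before copying")
    # dirs_exist_ok=True overwrites files that already exist in dst;
    # files deleted from src are NOT deleted from dst.
    shutil.copytree(src, dst, dirs_exist_ok=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "Zotero"
        src.mkdir()
        setup = sqlite3.connect(str(src / "zotero.sqlite"))
        setup.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
        setup.close()
        mirror_zotero_dir(src, Path(tmp) / "USB")
        print(sorted(p.name for p in (Path(tmp) / "USB").iterdir()))
```

Like the USB step, this is a one-way copy: it never removes files from the destination, so it is a safety check, not a replacement for a real sync tool.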
Obviously, Dropbox would be a better solution, but I have avoided this because of the warnings discussed above. However, I think I am in the position dstillman mentions of only ever having Zotero open on one machine at a time. In that case, he suggests, I could use Dropbox safely. Is this correct?
In other words, when I am working on my work computer, Dropbox will not change the version I am working on (because the home version is not open), but will copy my changes to the cloud and then recopy them to the home computer when it comes on. After ensuring that Dropbox has finished updating, I can then safely use Zotero on the home computer. Is this correct?
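If you do try the one-machine-at-a-time approach, a quick pre-flight check before launching Zotero on the second computer can catch the obvious warning signs. This is a hypothetical helper, not part of Zotero or Dropbox. It assumes Dropbox marks file-level conflicts with "conflicted copy" in the file name, and it treats a leftover zotero.sqlite-journal as a hint (not proof) that Zotero is still open or crashed elsewhere, or that syncing hasn't finished. It cannot confirm that Dropbox itself is done.

```python
import tempfile
from pathlib import Path


def preflight_problems(data_dir):
    """Return a list of warning signs found in a synced Zotero data directory."""
    data_dir = Path(data_dir)
    problems = []
    # Assumption: a leftover rollback journal hints that Zotero is open (or crashed)
    # on another machine, or that the sync has not finished.
    journal = data_dir / "zotero.sqlite-journal"
    if journal.exists():
        problems.append(f"journal present: {journal.name}")
    # Assumption: Dropbox names file-level conflicts "... (X's conflicted copy ...)".
    for path in sorted(data_dir.glob("*conflicted copy*")):
        problems.append(f"sync conflict: {path.name}")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "zotero.sqlite").touch()
        (Path(tmp) / "zotero (Home's conflicted copy 2017-08-01).sqlite").touch()
        for problem in preflight_problems(tmp) or ["no obvious problems"]:
            print(problem)
```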
@franzi.poldy, you could use the linked attachment feature and store all your linked attachments in Dropbox (or any other cloud-based file sync tool).
Allowing it to be separate from the database is a significant source of additional problems (i.e., the database needs to know where the files are on every synced computer, which is of course feasible but is going to cause lots of technical and human errors that we then have to troubleshoot).
But it would be useful to many people and would solve the Dropbox-type issue for users who do not use linked attachments and do not wish to buy Zotero storage. I agree it's one more path to remember, but it's no different from, and no more work than, the base path for relative linked attachments (which can be different on each computer).
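The per-computer base path idea can be illustrated by how a stored relative path might be resolved. This sketch assumes relative linked-attachment paths are saved with an attachments: prefix and forward slashes, and that each machine supplies its own base directory; resolve_attachment and the example paths are hypothetical.

```python
from pathlib import Path, PurePosixPath

# Assumption: relative linked-attachment paths are stored with this prefix.
RELATIVE_PREFIX = "attachments:"


def resolve_attachment(stored_path, base_dir):
    """Turn a stored attachment path into an absolute path on this computer."""
    if stored_path.startswith(RELATIVE_PREFIX):
        relative = PurePosixPath(stored_path[len(RELATIVE_PREFIX):])
        # Same stored value everywhere; only base_dir differs per machine.
        return Path(base_dir).joinpath(*relative.parts)
    return Path(stored_path)  # absolute link: used as-is


if __name__ == "__main__":
    stored = "attachments:Papers/smith2015.pdf"
    for base in ("/home/me/Dropbox", "/Users/me/Dropbox"):
        print(resolve_attachment(stored, base))
```

The database keeps one value per attachment and each computer supplies its own base directory, which is the same arrangement a separate per-computer storage path would need.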
Why are linked attachments on Dropbox safer than stored attachments? I opted for stored attachments because it seemed easier and safer to keep everything together in one place, the Zotero folder, which can be moved (and backed up) as a single unit. Perhaps that was wrong. If I were to change now, could I make the change globally, or would I have to deal with each item separately?
Storing the local Zotero database in the cloud is dangerous.
The easiest way to use Zotero is just to use stored attachments and sync files using the Zotero file storage. This requires the least setup and also has the advantage that the attachment files will be viewable on the zotero.org web library interface. (This is also the only way to sync attachment files in groups, for technical reasons.)
A second option for syncing attachment files in My Library is to use stored attachments with a WebDAV-enabled cloud server (e.g., box.com). This is also fairly easy to set up (you just enter your WebDAV account information in the Zotero preferences), but you can't view your attachments in the zotero.org web interface or use this to store group library files.
If you really want to use Dropbox, you can use linked attachments instead of stored attachments. You set the links to point to the PDF files stored in Dropbox. You can use the Zotfile plugin to automate this process. This is safer than storing the entire Zotero database file and folder in Dropbox because the only things being stored in the cloud sync folder are the PDFs themselves. From Zotero's perspective, it is only storing the file path for the linked file, so nothing that Dropbox does touches anything that Zotero pays attention to.