Zotero sync is not keeping storage in sync

I am syncing one library between two computers using Zotero sync. The Zotero storage folder on one machine has 4,606 more files in it than the one on the other computer. I used a file sync program (GoodSync) to compare the folders and found that 2,707 files are conflicted (changed on both machines without being reconciled), 5,026 files appear in only one of the two storage folders, and 416 files have the same length but different time stamps. In total, 8,148 files need to be synchronized. Many of the conflicted files are ".zotero-ft-cache" files or components of stored web pages, such as .png, .gif, and .js files.

The library on each machine seems to complete automated and manual syncing without complaint. Both have been checked with the Zotero Check Database Integrity tool and no errors were found.

I used the Storage Scanner plug-in and neither storage folder shows any broken files, only a few genuine and false-positive duplicates that I cleaned up before the analysis above.

Because of the above, I am considering shifting the syncing of my storage to the GoodSync real-time tool I use for other file syncing. I would like to clean up the storage folder first so I am not perpetually syncing obsolete files.

Q1: Why is Zotero not properly syncing the libraries?

Q2: How do I clean up the mess?
  • Are you actually experiencing a problem in Zotero? If you are, that would be the thing to report. In the absence of a problem, there's not really a need to worry about the contents of the data directory.

    .zotero-ft-cache files are internal files used for full-text indexing. Many auxiliary files would have different timestamps on different computers; that's entirely expected and normal. There can be additional 'storage' folders on one computer or the other for various reasons. A future version of Zotero will automatically clean up folders that aren't associated with entries in the database, but they wouldn't have any effect on your usage of Zotero.
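A rough way to see how much of the reported difference is real, rather than cache files and timestamps, is to compare the two storage folders by file content only. The following is a minimal sketch, not anything Zotero provides: the function names are made up for illustration, and it assumes both storage folders are reachable from one machine (for example, via a copy on an external drive). It only reads files and changes nothing.

```python
import hashlib
from pathlib import Path

# Full-text index caches are internal to each machine, so skip them.
IGNORED_NAMES = {".zotero-ft-cache"}


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def content_index(storage_root):
    """Map each relative file path to its content digest (timestamps ignored)."""
    root = Path(storage_root)
    index = {}
    for path in root.rglob("*"):
        if path.is_file() and path.name not in IGNORED_NAMES:
            index[path.relative_to(root).as_posix()] = file_digest(path)
    return index


def compare_storage(root_a, root_b):
    """Report files present on one side only, or whose contents differ."""
    a = content_index(root_a)
    b = content_index(root_b)
    shared = a.keys() & b.keys()
    return {
        "only_in_a": sorted(a.keys() - b.keys()),
        "only_in_b": sorted(b.keys() - a.keys()),
        "different_content": sorted(k for k in shared if a[k] != b[k]),
    }
```

Calling `compare_storage(path_a, path_b)` returns three sorted lists of relative paths. Files that differ only in timestamp do not show up at all, which should shrink the list considerably compared with a generic file-sync comparison.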
  • With 59 GB in my library, it's hard to know if things are 100% fine or if there are thousands of broken links I just haven't clicked in a while. You do understand that data integrity is a huge deal for people who are archiving data, and that having synced folders that do not contain the same files is contrary to the definition of the word. That Zotero does not currently "clean up" its storage folder or identify broken links or duplicate files, or resolve thousands of unsynced files is concerning.

    As I explained above, I am looking to shift to an alternate method of syncing the storage folders. My method will truly sync the folders so they both end up with identical contents. I believe this is what a WebDAV server would do, and I know it is what Dropbox would do. For the first sync, I will have the option of overwriting older files with newer ones, or making copies with different file names. I will also have the opportunity to delete duplicates. I'm concerned about the consequences of a true sync now that I have discovered that Zotero does not do a true sync. That is the discussion I am trying to have.

    I am asking you to explain how Zotero syncs, how it tolerates unsynced files, and what options I should choose so that I neither break links nor needlessly swell my storage folder with duplicates.
  • edited September 28, 2018
    Zotero doesn't sync the storage folder the way Dropbox does. It syncs the database and the files associated with the database, which as Dan explains may lead to somewhat different behavior than with a pure file sync app. As long as this doesn't affect functionality, neither of these two methods is somehow "truer" than the other.

    As for functionality, you raise three issues:
    1. Broken links. You haven't found any and Zotero does scan for broken links (that's the blue vs. hollow dot in the middle panel). This is clearly the most critical issue and you should be reassured by that.

    2. Orphaned files. These are due to real bugs. I'm not sure if there are other past reasons they exist, but one known issue is that for some time removing/leaving a group did not remove the attached files (this has since been fixed). Two things to note about this though:
    a) This is an annoyance because it costs disk space, but it doesn't break anything. That's why writing clean-up code (which needs to be very carefully written because that actually _can_ break things if not done properly) hasn't been pushed out with super high priority.
    b) Simple (what you call "true") file sync would make this worse, not better, because you would now have those orphaned files on all synced computers.

    3. Duplicates. Duplicates can exist as a byproduct of 2) or they can exist on purpose, e.g. if you have the same file in a group and My Library or if you have actual duplicates in your Zotero database. Zotero itself does have duplicate checking of course (though merging duplicates does not delete attached files out of an abundance of caution -- one of them might, e.g., contain annotations). So beyond duplicates resulting from 2), they aren't a sign of a problem. Also, as with 2), "true" file syncing wouldn't help at all with these.
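To get a sense of how many orphaned folders (point 2) are sitting in a storage directory, one could list the storage subfolders that have no matching item in the database. The sketch below is not an official tool and rests on two assumptions that should be checked against your own data first: that each subfolder of 'storage' is named after an item key, and that the Zotero database has an `items` table with a `key` column. The function names are hypothetical. It only lists candidates and deletes nothing; run it against a copy of zotero.sqlite with Zotero closed.

```python
import sqlite3
from pathlib import Path


def known_item_keys(db_copy_path):
    """Read all item keys from a COPY of the Zotero database.

    Assumes an `items` table with a `key` column (verify before relying on it).
    """
    conn = sqlite3.connect(str(db_copy_path))
    try:
        return {row[0] for row in conn.execute("SELECT key FROM items")}
    finally:
        conn.close()


def orphaned_folders(storage_root, item_keys):
    """Return names of storage subfolders that match no known item key."""
    return sorted(
        entry.name
        for entry in Path(storage_root).iterdir()
        if entry.is_dir() and entry.name not in item_keys
    )
```

For example, `orphaned_folders(storage_path, known_item_keys(db_copy))` gives a list to review by hand. Given the caution above about clean-up code, treating the result as a report rather than a delete list seems the safer choice.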