tar-ball-esque snapshot file storage?
I was recently experimenting with remote storage options for my library and found that an enormous amount of time was spent copying all the tiny image/script/CSS files that come with each snapshot (the average seems to be around 150-200 files per snapshot).
Is there a way to mash all these files together into a single file for storage purposes, so that only one transfer needs to be made per entry, something like Unix tar? No need for compression... just aggregation. I would gladly accept slightly slower unpacking/viewing if it meant I could synchronize faster.
I don't want to have to do away with my snapshots, but I guess I will if I have to.
-Keith
What exactly do you mean by "remote storage options"?
For the current file sync implementation in the Zotero 1.5 Sync Preview, we actually use one compressed ZIP file per directory to speed things up, but we'll likely be adding an option to upload individual files, since there would be some big advantages to being able to pull files directly off the server without needing to uncompress them first (e.g., from mobile devices).
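Roughly speaking, the idea is just "one archive per attachment directory." This isn't the actual Zotero sync code, only a Python sketch of that idea (the paths and naming are illustrative):

```python
import os
import zipfile

def zip_attachment_dir(storage_dir, item_key):
    """Bundle one attachment directory (e.g. storage/ABCD2345/) into a single
    compressed ZIP so a sync only has to transfer one file per item.
    Layout and naming here are illustrative, not what Zotero actually uses."""
    src = os.path.join(storage_dir, item_key)
    dest = os.path.join(storage_dir, item_key + ".zip")
    with zipfile.ZipFile(dest, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src):
            for name in files:
                path = os.path.join(root, name)
                zf.write(path, arcname=os.path.relpath(path, src))
    return dest
```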
If you're talking about something other than file sync in 1.5, though, it doesn't really have anything to do with Zotero—it's the job of whatever other tool you're using to upload things efficiently, and there are, for example, clever online backup tools that use hashing to upload only a single instance of files that exist in multiple locations on disk.
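Those hash-based tools work more or less like this (a simplified Python sketch; upload_once() is a made-up stand-in for whatever the backup service actually does):

```python
import hashlib
import os

def sha256_of(path):
    """Hash a file's contents so identical files can be detected wherever they live."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def backup(root, upload_once):
    """Upload each unique file's contents only once; later duplicates are just
    recorded as references to the copy that's already on the server."""
    seen = {}  # content hash -> first path uploaded
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            digest = sha256_of(path)
            if digest not in seen:
                upload_once(digest, path)
                seen[digest] = path
    return seen
```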
And even if you're not using a clever tool, there's a rather simple solution, suggested by Scot on another thread: set something up to periodically delete all the unnecessary files in the storage directory, since 75% or so of the files in snapshots from advertising-supported sites are probably unnecessary and undesired.
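Something along these lines would do it (a rough Python sketch; which filename patterns count as "unnecessary" is entirely up to you, and the ones below are just guesses at typical ad/tracker junk):

```python
import fnmatch
import os

# Purely illustrative patterns -- adjust to whatever actually clutters your snapshots.
JUNK_PATTERNS = ["*doubleclick*", "*ad_*", "*banner*", "*tracker*", "*.swf"]

def clean_storage(storage_dir, dry_run=True):
    """Walk the storage directory and delete files matching the junk patterns.
    Run with dry_run=True first to see what would be removed."""
    for dirpath, _dirs, files in os.walk(storage_dir):
        for name in files:
            if any(fnmatch.fnmatch(name.lower(), pat) for pat in JUNK_PATTERNS):
                path = os.path.join(dirpath, name)
                print("would remove" if dry_run else "removing", path)
                if not dry_run:
                    os.remove(path)
```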
Storing files on the local disk efficiently, as discussed in that other thread, is another, more complex matter, but it's not really related to the question of online storage. But as for storing all Zotero attachments in a single, huge, corruptible file, well, you don't really want that.
At least, in a test using Transmit on OS X with 107 storage folders totaling 18MB (not counting filesystem block overhead), transferring the folders took 6:30. Transferring an uncompressed ZIP of all the files took 5:15. That's a moderate difference, but not huge, and it's between uploading 2989 files and uploading a single file, which isn't going to happen.
This part is a discussion for the other thread, but Zotero could, of course, use a more efficient storage mechanism, not duplicating redundant files—but then the files would be accessible only through Zotero, wouldn't be indexed by system search tools, etc.
It might not help that I'm also going through some software to mount the WebDAV share as a drive, but transferring everything in a snapshot together would still speed things up.
My current solution is just to run Zotero off a local copy of my library and do some manual, external syncing with the WebDAV share, so at least it doesn't have to do this every time I retrieve or store a file. I've also removed all the snapshots, since even with this setup they're just too time-consuming to transfer.
Just to reiterate, the option I was referring to wouldn't have to involve any compression, which takes time on both ends. Just lumping the files together, tar-style, would pretty much solve the transfer-overhead problem without any extra compression overhead. The types of files that take up most of the space in the attachments (images, PDFs full of images) aren't going to compress much anyway.
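Something as simple as this is all I'm picturing (a rough sketch with Python's tarfile module, opened in plain "w" mode so nothing gets compressed; the paths are just placeholders):

```python
import os
import tarfile

def bundle_snapshot(snapshot_dir, out_path):
    """Lump all of a snapshot's files into one uncompressed tar archive so the
    WebDAV transfer is a single file instead of 150-200 tiny ones."""
    with tarfile.open(out_path, "w") as tf:  # "w" = no compression, just aggregation
        tf.add(snapshot_dir, arcname=os.path.basename(snapshot_dir))

def unbundle_snapshot(tar_path, dest_dir):
    """Unpack the archive again for viewing; slightly slower, but syncing is one transfer."""
    with tarfile.open(tar_path, "r") as tf:
        tf.extractall(dest_dir)
```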
-Keith