Suggestion: Store all files for each webpage snapshot as one archive file (e.g. .tar)

Problem:
Every time Zotero takes a snapshot of a webpage, hundreds of files (images, JS, CSS, etc.) are added to the Zotero data directory. This makes it harder to back up the Zotero data directory, move it around, or sync it with cloud services like Google Drive or Dropbox.

Suggested Solution:
Add an option to store each web snapshot as a single archive file. When Zotero later opens a snapshot, the archive can be extracted to a temporary directory just before opening.

Any archive format would work. It doesn't need to do any compression (and for the sake of performance perhaps shouldn't), so tar, tar+gzip, zip, etc. should all be fine. The format with the best library support is probably the best choice.
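
To make the idea concrete, here is a minimal sketch of the pack/extract round trip using an uncompressed tar. It is written in Python with the standard tarfile module purely for illustration; it is not Zotero code, and the function names pack_snapshot and unpack_snapshot are hypothetical. A real implementation would presumably use one of the JavaScript libraries instead.

```python
import tarfile
import tempfile
from pathlib import Path


def pack_snapshot(snapshot_dir: Path, archive_path: Path) -> None:
    """Bundle every file of one snapshot into a single uncompressed tar."""
    # Mode "w" writes a plain tar with no compression.
    with tarfile.open(archive_path, "w") as tar:
        for path in sorted(snapshot_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the snapshot folder.
                tar.add(path, arcname=path.relative_to(snapshot_dir).as_posix())


def unpack_snapshot(archive_path: Path) -> Path:
    """Extract the archive into a fresh temp directory and return its path."""
    target = Path(tempfile.mkdtemp(prefix="snapshot-"))
    with tarfile.open(archive_path, "r") as tar:
        # Use the safer "data" extraction filter where this Python has it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(target, filter="data")
        else:
            tar.extractall(target)
    return target
```

The viewer would then be pointed at the main HTML file inside the returned directory, and the directory could be deleted once the snapshot is closed.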

Assessment of difficulty:
An archiving library is required so that Zotero can archive and extract the snapshot files. As long as such a library is available (a quick search shows some open source options such as jszip, zip.js, js-untar, and tar-js), the rest should not be very difficult. That said, I am not really familiar with the Zotero code, so I may be missing something obvious. Please let me know if that is the case.

I would be happy to hear any feedback or advice.
  • edited July 19, 2017
    Same for me. I suggest saving as MHTML. Internet Explorer can save and open this format, and Chrome and Firefox can do so with an add-on.
  • These are brilliant suggestions, and would go a long way towards alleviating the problems with backing up a Zotero database that contains many web pages.
  • edited July 20, 2017
    This would also fix the issue with the "Rename attachments" function of the ZotFile plugin: currently ZotFile moves (renames) only the main HTML file of a web page snapshot, while all the accompanying files remain in the Zotero data directory. As a result, there is an incomplete, text-only snapshot in the ZotFile custom location directory and rubbish in the Zotero data directory.

    wh_fzj_ste, unfortunately, as Wikipedia says:
    "Mozilla Firefox requires an extension to be installed to read and write MHT files. Two extensions are freely available, Mozilla Archive Format and UnMHT, but both will be discontinued in the future. UnMHT's author has no plans to support Electrolysis. MAF does not yet support Electrolysis."
  • There is a new Firefox add-on for saving complete web pages which is compatible with the new Firefox architecture: "Save Page WE". It saves the page in html format which can be opened with Firefox as well as Internet Explorer / Edge (I did not test other browsers).