Zotero-saved HTML files consume a lot of disk space

I just compared a page saved (a) the Zotero way and (b) the standard way, saving to Windows Explorer as "Web Page, HTML only".

Via Zotero the saved file had 668 sub-files and consumed 10.5 MB of disk space. Via Windows Explorer it had 1 file, consuming 181 KB.

So the Zotero save consumed around 60 times more space. Over time that's going to build up into a lot of storage usage.

Is there a way of reducing this space consumption?
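
For anyone who wants to repeat the comparison above on their own files, here is a minimal Python sketch. It is only an illustration: the function names `folder_usage` and `size_ratio` are made up for this example, and the script adds up logical file sizes, so the "size on disk" that Windows reports for a folder of many small files may come out somewhat higher.

```python
import os


def folder_usage(path):
    """Return (file_count, total_bytes) for all files under path.

    Hypothetical helper: sums logical file sizes, not allocated disk
    clusters, so it can understate the space many small files occupy.
    """
    count = 0
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full):
                count += 1
                total += os.path.getsize(full)
    return count, total


def size_ratio(larger_bytes, smaller_bytes):
    """How many times bigger the first size is than the second."""
    return larger_bytes / smaller_bytes


if __name__ == "__main__":
    # Figures quoted in the post: 10.5 MB snapshot vs. 181 KB HTML-only file.
    snapshot_bytes = 10.5 * 1024 * 1024
    html_only_bytes = 181 * 1024
    print(f"{size_ratio(snapshot_bytes, html_only_bytes):.1f}x")
```

With the numbers from the post, the ratio works out to a little over 59, which matches the "around 60 times" estimate. Pointing `folder_usage` at a snapshot folder and at the HTML-only file's folder gives the same comparison for any other page.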
  • There isn't, unfortunately. HTML-only isn't a fair comparison, since it leaves you without much of the web page, including images and the like, which may be crucial. That said, Zotero snapshots are much larger and more cumbersome than other complete formats such as MHT. Unfortunately, no single format is universally supported (at least none that we're aware of), so I don't think there's currently a way out of that. If someone knows of a good alternative to webDump (which is what Zotero is using), I'm sure the devs would be quite interested.
  • Right, to clarify, one of our requirements is that the files be openable in browsers alone without additional extensions, which is why we don't create some compressed single-file format. The code we're using also errs on the side of completeness, so it may produce bigger snapshots than saving as a complete web page from a browser (which would be the relevant comparison, not, as adamsmith says, HTML-only), but hopefully also produces more accurate snapshots. (On the other hand, the code we're using hasn't been updated in years.)

    But web pages and all their ancillary files these days are also just pretty big, particularly with advertising junk. You don't get all the files on every load because they get cached, but Zotero can't reuse files across snapshots because of the requirement above.

    If you turn off automatic snapshots and add them to items manually, you may be able to use a Readability-style bookmarklet or extension to clean up the page before saving (or you can save the print-friendly page). Whether such tools work depends on how they function.

    I could see us incorporating a tool like that, either as an option while saving or as a way of cleaning up existing snapshots, but it would take some work.
  • For what it's worth, in my Firefox the single-file HTML-only save produces a file that (a) opens faster, (b) looks more like the original, colored bits and ads and images and all, and (c) is easier and quicker to move the cursor around in (the Zotero method produces a curiously juddery movement, as if my 8 GB RAM Core i5 were really straining).
  • http://opinion.inquirer.net/78931/the-marcoses-never-really-left-home
  • HTML-only saves the page with links to images. If those images are removed/relocated or if you're offline, you'd be out of luck.

    If that page loads faster at all, it is likely ONLY because your browser is using its cache. Load times were similar for me.

    Please comment further on point (b). I don't know what is different from using the snapshot method.

    Re. (c), when do you experience this? On save? While the page is loading? After it has already loaded?
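
To check how much an HTML-only save still depends on the live site, as described at the start of this reply, something like the following Python sketch could be run against the saved file. It is an illustration under stated assumptions: the names `RemoteResourceFinder` and `find_remote_resources` are invented here, and only `img`, `script`, and `link` tags with absolute or protocol-relative URLs are checked, so it will miss resources pulled in by CSS or scripts.

```python
from html.parser import HTMLParser


class RemoteResourceFinder(HTMLParser):
    """Collects (tag, url) pairs for resources that point off the local disk."""

    def __init__(self):
        super().__init__()
        self.remote = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag in ("img", "script"):
            url = attr_map.get("src")
        elif tag == "link":
            url = attr_map.get("href")
        else:
            return
        # Absolute and protocol-relative URLs need a network connection.
        if url and url.startswith(("http://", "https://", "//")):
            self.remote.append((tag, url))


def find_remote_resources(html_text):
    """Return the remote resources referenced by an HTML document."""
    finder = RemoteResourceFinder()
    finder.feed(html_text)
    finder.close()
    return finder.remote


if __name__ == "__main__":
    sample = (
        '<html><head><link rel="stylesheet" href="https://example.com/s.css">'
        '</head><body><img src="local.png"><img src="//example.com/a.jpg">'
        "</body></html>"
    )
    for tag, url in find_remote_resources(sample):
        print(tag, url)
```

Every entry it lists is a file that would be missing if the site removed it or if the page were opened offline.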
  • Apologies noksagt, I got distracted.

    (a) Thanks for this explanation; this was probably what the others were trying to tell me.

    (b) When I saved the page I linked using the "save as HTML file only" option, it loaded EXACTLY like the original. If you save it as a snapshot, I think you will find that, when opened, it does not look 100% like the original.

    (c) The judderiness occurs when the snapshot is already loaded.

    Thank you for your responses so far. For me the issue is somewhat academic as I've decided on html saves to my local disk for research storage. It takes less space, seems easier to work with, seems less hazardous in terms of potential disconnection between records and files, and using add-on software I can comment and tag my Windows Explorer file/folder display exactly as I want. Using the bibliographic function of Zotero will have to wait till a later stage!