Save snapshots as single HTML, MHTML archive, or PDF

The current implementation is highly inefficient in my opinion. It saves the snapshot as a collection of separate files and keeps the original JavaScript intact. As a result, the files are hard to manage, take up a lot of space, and - most importantly - take forever to open because they make a bunch of JavaScript calls that may go unanswered. The current implementation is so cumbersome that I never use the snapshot feature anymore.

There are formats that are designed for web archiving. I currently use Chrome extensions like SingleFile (single purged HTML), Print Friendly & PDF (PDF print), and Print Edit (MHTML or HTML). There are also extensions that let you edit the page before saving it with those extensions. Ideally, Zotero would let you edit the page however you like in your browser (with other extensions) and then save the result as PDF, single HTML without JS, or MHTML. It could even leverage existing extensions - like the ones I mentioned - to accomplish that.

Please consider improving the snapshot feature. A PDF printer like Print Friendly would actually be ideal because PDFs play well with BibTeX and annotation.
  • edited October 30, 2017
    Saving snapshots as single HTML files is planned: https://github.com/zotero/zotero-connectors/issues/194
  • edited October 30, 2017
    Saving simplified single-file snapshots by default is planned. Saving full single-file snapshots with data URIs (like SingleFile) is possible, but that would still produce very large, slow-loading snapshots, and I'm not sure it's necessary. (Most things 1) don't go behind paywalls or disappear, 2) would be fine as text, and/or 3) are available from the Internet Archive, and with JS the average webpage is now vastly bigger than when Zotero was created.)
  • SingleFile purges JavaScript, which means the file loads pretty fast (in my experience).
  • Just for others, I've been using the Firefox add-on SavePage WE for this purpose:
    https://addons.mozilla.org/en-US/firefox/addon/save-page-we/

    I guess with an extension and ZotFile it's a 2-click solution vs. 1-click if Zotero implements it internally.
  • I just wanted to re-up this suggestion and give another reason it would be valuable: the many tiny files that make up a typical snapshot take forever to index, copy, back up, and delete (not to mention that they consume inodes). The sheer number of files is what makes my large Zotero store so difficult to manage, much more than the GBs of data.
  • I would also be keen for Zotero to have an improved web page snapshot feature. In some cases a simplified snapshot would be sufficient, but usually I want to be able to save a page as completely as possible. Something based on SingleFile could work well for the latter (if it's technically suitable). As mlinchits points out, SingleFile purges JavaScript and saved pages do load quickly. It compresses the page's resources, so the resulting single HTML file is reasonably sized (much smaller than that of SavePage WE in my testing). The page's text is not compressed, so it would still be indexable by Zotero.
  • edited March 17, 2020
    Any progress on this matter? I'm highly interested in an implementation of SingleFile in Zotero. Zotero's stock web snapshot function often results in incomplete files with broken CSS and missing images. Using a grabber like SingleFile would be a big step forward in terms of usability and data integrity, and the resulting files are also very easy to share. It is 2020!
  • Chrome already supports saving a web page as a single MHTML file, which includes all the images. Perhaps this could be accomplished simply by saving with Chrome and then automatically moving the saved file to the Zotero folder?

    Also, Zotero currently does not recognize an MHTML file as a web page.
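
For anyone unfamiliar with the data-URI approach mentioned earlier in the thread, here is a rough Python sketch of the general idea: each image reference is replaced with a base64 `data:` URI so that everything lives in one HTML file. This is only an illustration under stated assumptions, not how SingleFile or Zotero actually implement it. The helper names `to_data_uri` and `inline_images` and the `resources` mapping are made up for this example, and the regex only handles double-quoted `src` attributes on `<img>` tags.

```python
import base64
import re


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data: URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def inline_images(html: str, resources: dict[str, tuple[bytes, str]]) -> str:
    """Replace <img src="..."> URLs found in `resources` with data: URIs.

    `resources` is a hypothetical mapping of URL -> (bytes, MIME type),
    standing in for files that were already downloaded with the page.
    URLs that are not in the mapping are left untouched.
    """

    def replace(match: re.Match) -> str:
        url = match.group(2)
        if url not in resources:
            return match.group(0)
        data, mime_type = resources[url]
        return f"{match.group(1)}{to_data_uri(data, mime_type)}{match.group(3)}"

    # Simplified pattern: only double-quoted src attributes on <img> tags.
    return re.sub(r'(<img\b[^>]*?\bsrc=")([^"]*)(")', replace, html)
```

Base64 makes each embedded resource roughly a third larger than the original bytes, which is one reason full single-file snapshots can get big, as noted above.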