Available for beta testing: Improved webpage snapshots
The latest Zotero beta and Zotero Connector beta can now take greatly improved webpage snapshots, even on websites that make heavy use of dynamic rendering, deferred images, or advertising. Pages should generally be saved exactly as you see them in your browser, after any JavaScript has run and reflecting any modifications made to the page with other browser extensions. All JavaScript is stripped, so any interactive functionality won't be preserved, but you also won't end up with broken redirects, doubled ad images, or other problems that are common when saving dynamic webpages. (Zotero's previous snapshots were based on Firefox's "Save Page As…" functionality, which, like Chrome's, exhibits many of those problems.) Once this has rolled out, we'll be able to re-enable snapshots on sites where they were previously disabled (e.g., NYT, Twitter), and they'll work again on sites where they had previously stopped working (e.g., Medium).
The new functionality is based on the great SingleFile browser extension (technically, SingleFileZ). We're not currently saving single files — either with encoded embedded resources (SingleFile) or as self-extracting ZIP files (SingleFileZ) — but rather using the SingleFile logic to extract just the cleaned HTML, CSS, image, and font files necessary to display the page as shown. All single-file options involve trade-offs, but for a future version we're considering switching to combining snapshot resources into non-self-extracting ZIP files that would normally be viewed within Zotero but that would still be extractable as standard ZIP files for data portability.
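To make the ZIP idea above a bit more concrete, here is a rough sketch in Python of bundling a snapshot's resources into an ordinary (non-self-extracting) ZIP archive and reading them back with standard tooling. This is only an illustration of the general approach, not how Zotero or SingleFile does or will do it; the function names and the file names in the example (`index.html`, `style.css`, `images/figure.png`) are made up for the example.

```python
import io
import zipfile


def pack_snapshot(resources: dict[str, bytes]) -> bytes:
    """Bundle snapshot resources (path -> bytes) into a plain ZIP archive.

    Hypothetical helper for illustration only.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path, data in resources.items():
            zf.writestr(path, data)
    return buf.getvalue()


def unpack_snapshot(archive: bytes) -> dict[str, bytes]:
    """Read every member back out, as any standard ZIP tool could."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}


# Example resources; names and contents are placeholders.
snapshot = {
    "index.html": b"<html><body><img src='images/figure.png'></body></html>",
    "style.css": b"body { margin: 0; }",
    "images/figure.png": b"\x89PNG placeholder bytes",
}

archive = pack_snapshot(snapshot)
restored = unpack_snapshot(archive)
```

Because the result is a standard ZIP file, the individual HTML, CSS, and image files stay recoverable without any special software, which is the data-portability point made above.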
Since the new snapshot functionality does more processing of the page, and particularly since it tries to load deferred images (e.g., images that only appear when you scroll down the page), it can be a little bit slower on complex pages or pages with very large deferred images. When saving from the Zotero Connector, it also runs in the browser itself instead of in Zotero, meaning that if you leave the page before the snapshot finishes it won't be saved to Zotero. We've talked before about reducing the number of pages on which Zotero saves snapshots, and I think we'll want to revisit that to make sure snapshots are only being saved where they provide real value.
If you want to try out the new snapshots when saving from your browser, you'll need both the Zotero beta and the Zotero Connector beta for Firefox. (The Zotero beta alone will use the new snapshot functionality when saving via Add Item by Identifier — e.g., for an arXiv ID.)
Let us know if you run into any problems.
This discussion has been closed.
As I say:
Yes, the drawback to Base64 encoding is the increased size: encoded resources will generally be ~33% larger than the original binary files.
But it's true that Base64 encoding would be better for sharing — in terms of ease if not file size — than the regular ZIP of individual files that I mention as a possibility above. We'll take this into consideration when planning additional options.
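For anyone wondering where the ~33% figure comes from: Base64 represents every 3 bytes of input as 4 ASCII characters, so the encoded form is about 4/3 the size of the original. Here is a small Python sketch that measures this and shows what an embedded resource looks like as a `data:` URI. The helper names are just for illustration and aren't taken from Zotero or SingleFile.

```python
import base64


def base64_overhead(data: bytes) -> float:
    """Return the fractional size increase from Base64-encoding `data`."""
    encoded = base64.b64encode(data)
    return len(encoded) / len(data) - 1.0


def as_data_uri(data: bytes, mime_type: str) -> str:
    """Embed binary data as a data: URI, as in a single-file HTML page."""
    payload = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{payload}"


# 3000 bytes of input become 4000 Base64 characters.
overhead = base64_overhead(bytes(3000))
print(f"{overhead:.1%}")

uri = as_data_uri(b"abc", "image/png")
```

For inputs whose length isn't a multiple of 3, padding adds up to two extra characters, so very small files see a slightly higher overhead.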
We're planning to automatically convert existing multi-file snapshots — both old and new — into single files in a future client version.
We have a few more bug fixes for the SingleFile snapshots coming up, and we still may try to convert existing snapshots to single files in a future version.
Closing this thread. Let us know in new threads if you find any other issues with the new snapshots.