When saving a webpage, first make a PDF of it and then save the PDF instead

The idea is to save a single PDF of the webpage instead of saving all of its figures, scripts, and additional files.

I noticed that even a saved Wikipedia article takes ~2.5 MB and ~150 files on disk. After I converted it to a PDF, the PDF was ~500 kB in a single file. I then added the PDF to Zotero instead of the whole page.

Of course, the formatting and scripts were all gone from the PDF, but the important text remained. This feature might not work for other websites or other users, but it is an idea anyway.
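To reproduce a size comparison like the one above (total bytes and file count of a saved page versus a single PDF), a small script can walk the saved files. This is a minimal sketch; `snapshot_footprint` is a hypothetical helper, and it assumes the page was saved the usual "Web Page, complete" way, as an HTML file plus a resource folder (commonly named `<name>_files`).

```python
from pathlib import Path


def snapshot_footprint(*paths):
    """Return (file_count, total_bytes) over the given files and directories."""
    count = 0
    total = 0
    for path in paths:
        p = Path(path)
        if p.is_file():
            # A single file, e.g. the saved .html page or a PDF.
            count += 1
            total += p.stat().st_size
        elif p.is_dir():
            # Walk the whole tree, e.g. the resource folder a browser
            # creates next to a page saved as "Web Page, complete".
            for f in p.rglob("*"):
                if f.is_file():
                    count += 1
                    total += f.stat().st_size
    return count, total
```

For example, compare `snapshot_footprint("Article.html", "Article_files")` with `snapshot_footprint("Article.pdf")`; the file names here are placeholders.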
  • edited November 30, 2014
    Last I checked, there was no way for Firefox or Chrome extensions to save a page as PDF, and there was no good JavaScript library for the task either. Things may have changed, so I'll take another look, but this is basically a technical limitation, and I think we're mostly on board with saving pages as PDFs if possible. One concern I do have is pages with iframes, since I don't think there's a way to handle those in PDFs (at least not the ones that scroll).
  • edited December 1, 2014
    > I think we're mostly on board with saving pages as PDFs if possible
    I wouldn't say that. Web pages should be saved as HTML. The size and the redundant ancillary files are unfortunate, but there aren't great solutions for addressing those while keeping the files accessible without Zotero or additional software.

    Saving with JS enabled — and all the pseudo-files that can result from that — probably increases the size, though. We can always revisit that.

    I'm not sure if we have translators save print-friendly pages, but we should perhaps do that when possible.
  • > I'm not sure if we have translators save print-friendly pages, but we should perhaps do that when possible.
    I don't think we do. There are some issues with this that we'd need to resolve, though. Print-friendly versions are usually extra friendly and pop up a print dialog when viewed, which persists in the snapshot. In another case (Web of Knowledge, I believe), we can't save the static HTML pages they serve because they trigger a download dialog (the HTML files are served with a "Content-Disposition: attachment" header).
  • Hmm, well, I guess we could probably set up a special hidden browser that stripped Content-Disposition and ignored the print command...
  • edited December 10, 2014
    I second the need for some mechanism to save web pages with fewer individual files (at least as visible to humans and backup software). Some webpage images/components have crazy long names that can mess up automated backups of the drive.

    How about giving the user an option to save as MHTML or MAFF?
  • +1. Sadly, I just basically want the words in front of me. I print the page to PDF and save it somewhere (knowing that failure is a real possibility!), but it would be nice if Zotero could do this.
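Outside of a browser extension, a page can be printed to PDF from the command line with a headless browser, which is roughly the manual "print to PDF" workflow described in this thread. The sketch below assumes a Chromium-based browser with headless mode (which Chrome gained in 2017, after this thread) whose binary is on PATH as `google-chrome`; the name varies by platform (`chromium`, `chrome.exe`, and so on). `chrome_print_to_pdf_args` and `save_page_as_pdf` are hypothetical helper names, and nothing here addresses the iframe concern raised above.

```python
import shutil
import subprocess


def chrome_print_to_pdf_args(url, out_pdf, chrome="google-chrome"):
    """Build a command line asking headless Chrome to render `url` into `out_pdf`."""
    return [chrome, "--headless", "--disable-gpu", f"--print-to-pdf={out_pdf}", url]


def save_page_as_pdf(url, out_pdf, chrome="google-chrome"):
    """Run headless Chrome; raise if the binary is missing or the command fails."""
    if shutil.which(chrome) is None:
        raise FileNotFoundError(f"{chrome!r} not found on PATH")
    # check=True turns a non-zero exit status into CalledProcessError.
    subprocess.run(chrome_print_to_pdf_args(url, out_pdf, chrome), check=True)
```

Splitting out the argument builder keeps the command easy to inspect or adapt (for example, to a different binary name) without launching a browser.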
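The hidden-browser idea suggested in the thread (strip Content-Disposition and ignore the print command) could look roughly like the post-processing step below. This is a sketch under stated assumptions, not Zotero code: response headers are a plain dict, and instead of making a browser ignore the print command, it swaps in a different technique, injecting a script at the top of `<head>` that replaces `window.print` with a no-op so a saved print-friendly page no longer pops up a dialog when opened. `sanitize_snapshot` and `neutralize_print` are hypothetical names.

```python
import re

# Injected first in <head> so it runs before any page script calls print().
NOOP_PRINT = "<script>window.print = function () {};</script>"

# Match <head> or <head ...>, but not <header>.
HEAD_TAG = re.compile(r"<head(?:\s[^>]*)?>", re.IGNORECASE)


def neutralize_print(html):
    """Insert a no-op window.print override right after the <head> tag."""
    m = HEAD_TAG.search(html)
    if m:
        return html[:m.end()] + NOOP_PRINT + html[m.end():]
    # No <head>: prepend the override instead.
    return NOOP_PRINT + html


def sanitize_snapshot(headers, html):
    """Drop Content-Disposition (case-insensitively) and disable print()."""
    clean_headers = {k: v for k, v in headers.items()
                     if k.lower() != "content-disposition"}
    return clean_headers, neutralize_print(html)
```

In a real browser the header would have to be removed before the browser acts on it (for example, in a network-response hook); the dict version only illustrates the filtering rule.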
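On the MHTML suggestion: MHTML (RFC 2557) bundles a page and its resources into one MIME `multipart/related` file, labeling each part with a `Content-Location` URL so references in the HTML can be resolved to the bundled parts. Below is a minimal sketch using Python's standard `email` package. It assumes the HTML and its resources have already been fetched into memory; `build_mhtml` and the shape of the `resources` mapping are hypothetical, and link rewriting and relative URLs are left out.

```python
from email import policy
from email.message import EmailMessage


def build_mhtml(page_url, html, resources):
    """Bundle a page and its resources into one MHTML (multipart/related) file.

    resources: mapping of resource URL -> (data_bytes, "maintype/subtype").
    Returns the serialized message as bytes with CRLF line endings.
    """
    msg = EmailMessage()
    msg.set_content(html, subtype="html")
    # Turn the text/html message into multipart/related; the HTML becomes part 0.
    msg.make_related()
    msg.set_param("type", "text/html")  # root part type, per RFC 2387
    msg.get_payload()[0]["Content-Location"] = page_url
    for url, (data, mime) in resources.items():
        maintype, subtype = mime.split("/", 1)
        msg.add_related(data, maintype=maintype, subtype=subtype)
        # Label the new part with the URL the HTML uses to reference it.
        msg.get_payload()[-1]["Content-Location"] = url
    return msg.as_bytes(policy=policy.SMTP)
```

The result can be written to a single `.mht` file, which addresses the file-count and long-name problems; whether a given browser opens it depends on that browser's MHTML support.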