Feature request: Optionally save web pages as PDF (and a list of tools that might help)
There has been some discussion on this before, most notably:
The latest comment on limitations was in 2012 from @adamsmith, who mentioned that this might be more suitable for a plugin and that there might not exist any cross-platform library to use. I wonder if it would be possible to receive a comment on how this feature is viewed by the Zotero team today. Is it something that would be interesting if the right tools were available and there were enough resources to implement them? Or is it seen as a less useful feature that will likely not be implemented?
I consider this a highly useful feature because, while there are other solutions for annotating live webpages (such as hypothes.is), I think PDF is an ideal format for webpage snapshots. It allows for using the same workflow and tools as when annotating research article PDFs, and it would work well with the extract annotation feature from Zotfile. It is like printing out a physical copy of a page and taking notes, which is great for learning and for referring back to later (keeping in mind, of course, when the page was retrieved, although many research sources such as blogs are rarely updated; new posts are made instead).
Given my interest in this feature, I want to try to help by sharing a few tools that might be able to provide it in Zotero. I am not able to say which (if any) would be suitable choices for incorporating this feature into Zotero. All these tools are cross-platform, open source, and currently maintained.
jsPDF is a client-side JS library for PDF generation. The source code is 5 MB; I am not sure about the size of its dependencies.
wkhtmltopdf has precompiled binaries that are around 80 MB in size (for all three OSes together). If that would be too big to bundle, maybe Zotero users could be instructed to download the binary themselves if they want to print to PDF, and then there could be an option in Zotero to set the location of the wkhtmltopdf binary to use.
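To make the "user-supplied binary" idea a bit more concrete, here is a minimal sketch in Python (not Zotero code) of how a configured location could be resolved. The `configured_path` argument stands in for a hypothetical Zotero preference, and both function names are made up for illustration. The only thing assumed about wkhtmltopdf itself is its basic `wkhtmltopdf <input> <output>` invocation.

```python
import shutil
from pathlib import Path
from typing import List, Optional


def resolve_wkhtmltopdf(configured_path: Optional[str] = None) -> Optional[str]:
    """Return the wkhtmltopdf binary to use, or None if none is found.

    `configured_path` is a hypothetical user preference, not a real Zotero setting.
    """
    if configured_path:
        candidate = Path(configured_path).expanduser()
        # Only accept the configured path if it points at an existing file.
        return str(candidate) if candidate.is_file() else None
    # No preference set: fall back to searching the PATH.
    return shutil.which("wkhtmltopdf")


def build_command(binary: str, html_path: str, pdf_path: str) -> List[str]:
    """Build the basic `wkhtmltopdf <input> <output>` invocation."""
    return [binary, html_path, pdf_path]
```

If `resolve_wkhtmltopdf` returns `None`, the PDF option could simply be disabled with a hint to download the binary, so users who never want PDFs would not need it at all.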
Google Chrome/Chromium has a headless mode that allows for printing (and can. While not everyone
chrome --headless --disable-gpu --print-to-pdf file:///path/to/myfile.html
PDFShift is an online API. This might require the fewest resources from the Zotero team. Users would have to be comfortable with their HTML files being uploaded to a third party for conversion to PDF, but since these are web snapshots they are already public anyway (and could maybe be uploaded without being linked to a user ID?).
As for the user-facing side, there would ideally be an option to save webpage snapshots as HTML, PDF, or both. Saving as PDF only would involve clicking the browser extension button, which would trigger a download of the HTML page, a conversion to PDF, and finally a deletion of the HTML file.
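The download, convert, delete flow described above could look roughly like the following sketch. It is only an illustration in Python, not Zotero code: `save_snapshot`, the `mode` values, and the `convert` callback are all hypothetical names, and the converter is passed in so that any of the tools listed here could be plugged in.

```python
from pathlib import Path
from typing import Callable, List

# A converter takes (html_path, pdf_path) and writes the PDF.
# Which tool does the actual work is deliberately left open.
Converter = Callable[[Path, Path], None]


def save_snapshot(html_path: Path, mode: str, convert: Converter) -> List[Path]:
    """Keep the snapshot as 'html', 'pdf', or 'both'; return the files kept."""
    if mode not in ("html", "pdf", "both"):
        raise ValueError(f"unknown snapshot mode: {mode!r}")
    if mode == "html":
        return [html_path]

    pdf_path = html_path.with_suffix(".pdf")
    convert(html_path, pdf_path)

    if mode == "pdf":
        # Delete the HTML only after conversion returned without raising.
        html_path.unlink()
        return [pdf_path]
    return [html_path, pdf_path]
```

Deleting the HTML file only after the converter has returned means a failed conversion still leaves the user with the regular HTML snapshot.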
A couple of Node-based solutions that may or may not be relevant: