Feature request: Optionally save web pages as PDF (and a list of tools that might help)

  • @bwiernik I do disagree, for all the reasons outlined above. Adding an HTML viewer and annotation tool to Zotero won't solve all the issues/situations mentioned.

    It would still be welcome, though, for those who need to preserve the full page layout and/or prefer to save the whole webpage to PDF instead of just the article text, and who right now don't have many alternatives for annotations.
  • I found a pretty easy partial workaround for this that works 99% of the time.

    I use the Chrome extension linked here: https://www.printfriendly.com/

    I was leery at first, but now it is one of the tools in my “starting lineup” for writing. I use it in conjunction with the Zotero Chrome extension. When I’m on a webpage I want to save, I click the Zotero extension first to get the parent item saved along with the snapshot. Then I immediately click the PrintFriendly PDF extension, which I put directly next to the Zotero one. It pops up a screen where I can quickly and easily adjust the PDF. Then I click the download PDF button and boom, it downloads in the browser. Then I drag that file over to Zotero, attaching it to the website's parent item. I can then use that file just as you said, for searching and tagging purposes.

    I only make adjustments to remove large pictures or unnecessary text on the page. It already turns the page into a simplified version ready for printing, but I like having the cleanest copy possible without giant pictures. Sometimes it removes them automatically, sometimes not. EVEN if I have to delete some things, I would guess the whole process of going from just a website to a citation in Zotero with snapshot and readable PDF can't be more than 5 seconds total. SUPER EASY.

    I know it’s not automatic, but it has at least worked for me so I can search and tag more easily. Once I figured this out, I also went back and saved a PDF for all my snapshots. All it takes is clicking on the snapshot, clicking on the PrintFriendly PDF extension, clicking download, and dragging the file over to the citation in Zotero, just skipping the step of saving the website to Zotero. “Backtracking” like that is a little annoying and time-consuming compared to an automatic feature, but it is easily something that can be done while watching TV, etc. Or if you’re like me, you train your 9-year-old nerdy nephew (or whoever) to use Zotero and delegate it out.
  • edited August 17, 2021
    Another small contribution, from the same preoccupation with annotating the web.
    Zotero Connector with beta version 5.+something, Chrome/Edge.

    From the Zotero Connector you can select any text on a webpage, then right-click and choose to create a Zotero item and note from the selection. The notes are nice, attached to the item in the standalone interface, but the remaining issues are, in my view, the following:
    - You have to repeat the process for every annotation, which results in as many duplicate items in your library as annotations. This can be tidied up by merging duplicates later on, but it still takes clicks to delete the duplicate snapshots. Also, you get as many child notes as highlighted passages.
    - There is no corresponding coloured highlight, not even in the snapshot file later on. You cannot see any yellow, nor click on the note to jump to its location from your usual library interface (something you can do with the very nice new annotation features in the beta, for annotated PDF files).
    - It is quite a repetitive process to right-click and save as a child item everything you want to highlight.

    Possible paths:
    - Enable the Zotero Connector to save only a child item when the parent item is already saved. This still needs a couple of clicks, though.
    - Somehow enable multi-selection, like the one you use in a text editor with ctrl+click, then add everything at once. (I'm not sure it is possible, even with caret mode.)
    - Make the corresponding selection highlighted in the HTML snapshot later on.
    - Use Memex or Hypothesis to get the coloured highlights in the snapshot file, combined with the right-click combo from the Zotero Connector mentioned above. This implies processing everything you want to highlight twice. A not-unreasonable corresponding workflow would be to make two readings: the first doing most of your highlighting, the second adding every note to Zotero. A positive side effect is that it restrains you from annotating like crazy, keeping as items only the quotations strong enough to survive a second pass...
    - It is still difficult to make other edits, though: you have to re-save the snapshot file if you make a new highlight on the snapshot HTML file, and for new annotations you want to keep as child notes you have to go back to the website, hoping the page has not changed, to make your new annotations.
    - And all the other solutions already mentioned, including using PDFs...
  • I prefer EPUB, so being able to choose the format would be great. I use rdrview (https://github.com/eafer/rdrview), which in turn is based on Firefox's Readability (https://github.com/mozilla/readability). Zotero could implement this, or an extension could be made.
  • edited 20 days ago
    Zotero could automatically access and download a pdf version of Wikipedia articles through the "download as pdf" link on the lower left of every article. In the meantime this can be done manually by first downloading the PDF and then adding it to the relevant Wikipedia citation/snapshot entry in Zotero using "Add Attachment" > "Attach Stored Copy of File". The pdf can then be easily viewed and annotated in Zotero using the new PDF preview feature available in the current Zotero beta.

    Wikipedia's own PDFs are better presented than the results of using the built-in browser print-to-PDF feature, at least in Firefox.
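
    For anyone wanting to script the manual step, here is a minimal sketch that only builds the PDF download URL for an article title. It assumes the PDF is served from the Wikimedia REST API path shown in the code; the function name `wikipedia_pdf_url` and the `lang` parameter are my own for illustration and are not part of Zotero.

```python
from urllib.parse import quote

# Assumption: the Wikimedia REST API serves article PDFs at this path.
PDF_ENDPOINT = "https://{lang}.wikipedia.org/api/rest_v1/page/pdf/{title}"


def wikipedia_pdf_url(title: str, lang: str = "en") -> str:
    """Build the PDF download URL for a Wikipedia article title (hypothetical helper)."""
    # Wikipedia titles use underscores in place of spaces.
    normalized = title.strip().replace(" ", "_")
    # Percent-encode everything else, including "/" inside titles.
    return PDF_ENDPOINT.format(lang=lang, title=quote(normalized, safe=""))


if __name__ == "__main__":
    print(wikipedia_pdf_url("Reference management software"))
```

    The downloaded file would still have to be attached to the Zotero item by hand, as described above.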
  • edited 16 days ago
    Enabling Reader View in Firefox, by selecting the page icon at the right of the address bar, displays a minimalistic version of the webpage that works well for printing to PDF. For some reason, Firefox shows the Reader View icon only on some websites. Setting “reader.parse-on-load.force-enabled” to “true” in about:config makes Reader View available even on the websites where the icon is otherwise missing.
  • edited 12 days ago
    I recently ran into the same hiccup - wanted to save some pages as PDF (mainly because the current beta version of Zotero allows for in-app PDF reading).

    After trying a few extensions, I settled on "PDF Mage" because it actually opens the converted page as a PDF in the browser itself (without downloading), and Zotero even recognizes it as a PDF, so I can save it directly without having to clutter the downloads folder and manually drag-drop or use Zutilo.

    In Microsoft Edge, the "Download PDFs" option should be disabled, else the file is auto-downloaded instead of loading in the new tab.

    I have tested it in Vivaldi too.

    For the following link, Zotero could even extract metadata from the saved PDF!

    And what if the site is overloaded with ads and fluff and you are simply interested in the text?

    Well, for that I tried the "Read Pro" Chrome extension (in Vivaldi), so the workflow is:
    1. Visit the webpage
    2. Click the Read Pro icon to sanitize it
    3. Click the PDF Mage icon to load it as a PDF in an adjacent tab
    4. Save to Zotero

    Read Pro also has an option to download the sanitized page as a PDF (but that again is the longer route).

    Both these extensions are free to use and do not require you to create any accounts.

    1. PDF Mage
    Edge: https://microsoftedge.microsoft.com/addons/detail/pdf-mage/jncoibmpdjfaccecklaooocaenaaibni
    Vivaldi: https://chrome.google.com/webstore/detail/pdf-mage/gknphemhpcknkhegndlihchfonpdcben

    2. Read Pro
    Vivaldi: https://chrome.google.com/webstore/detail/read-pro/ckjogkiieodbdmkeabpnhdaagilainco
    Edge: Unavailable
  • edited 12 days ago
    Other Options for extracting text:

    You can try other extensions for sanitization, or the Firefox “Reader View” mentioned above, but you will have to test whether your tool of choice works well with PDF Mage, i.e. whether the result loads as an actual PDF (URL ends with .pdf) so the Zotero Connector recognizes it as one.
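
    As a rough illustration of that check, the sketch below tests whether a URL's path ends with .pdf. This is only the heuristic described in this post, not how the Zotero Connector actually detects PDFs, and the function name `looks_like_pdf_url` is made up for the example.

```python
from urllib.parse import urlparse


def looks_like_pdf_url(url: str) -> bool:
    """Return True if the URL path ends with .pdf (heuristic only)."""
    # Look at the path alone so query strings and fragments do not interfere.
    path = urlparse(url).path
    return path.lower().endswith(".pdf")


if __name__ == "__main__":
    print(looks_like_pdf_url("https://example.org/converted/page.pdf?src=tab"))
```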

    When in a hurry, the bookmarklet I posted in this thread might help. It extracts and auto-copies text to the clipboard, saving time and clicks. Also preserves the line-breaks and indentations (https://forums.zotero.org/discussion/85569/indexing-web-pages-without-snapshots).

    Or you can use the bookmarklet from Textise: https://www.textise.net/Bookmarklet.aspx

    Disclaimer: I am not affiliated with any of these services/extensions in any way. I simply chanced upon them during my hunt.
  • edited 5 days ago
    I'm not offering this as a solution, but I do keep PDFs as snapshots of webpages for accessibility (other devices) and economy (they take up less space). PrintFriendly is good, but it's slow (and sometimes it fetches a webpage incompletely). Printing from the browser is more likely to put what you currently see into the PDF.

    I do support a built-in function w/ Zotero to save snapshots as PDF.

    My current solution is to:
    1) NOT use the snapshot function; just save an entry into Zotero (via the Connector).
    2) Print the webpage to a PDF and then attach it in Zotero.

    Notes for speeding up:
    a) I'm on a Mac, so I can speed up the process by setting up an App Shortcut for the save command (Export as PDF, or whatever it's called in your browser; I'm on Safari). My way: cmd+shift+P, then Enter (choose the download directory).
    b) I usually use this when I'm reading from RSS, so I work in batches without having to jump back and forth between different Zotero collections and the download directory (a). I just save loads of entries into Zotero and print the pages to PDF into a directory.
    c) Return to Zotero. Use Attach New File (with Zutilo for hotkey access) for every new entry. Because you'd be working in order (newest to oldest), you can just go through the list attaching the files from your download directory.
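
    To make the newest-to-oldest matching in (c) easier, here is a minimal sketch that lists the PDFs in a download directory by modification time, newest first. The function name `pdfs_newest_first` and the example path are assumptions for illustration; it only lists files and does not attach anything in Zotero.

```python
from pathlib import Path


def pdfs_newest_first(directory: str) -> list[Path]:
    """List PDF files in a directory, most recently modified first."""
    pdfs = [
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    ]
    # Sort by modification time so the order mirrors a newest-first Zotero list.
    return sorted(pdfs, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    # Hypothetical download directory; adjust to your own setup.
    for pdf in pdfs_newest_first(str(Path.home() / "Downloads")):
        print(pdf.name)
```

    The printed order should then line up with the entries in Zotero when they are sorted by date added, assuming each page was printed right after its entry was saved.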
