Delete all snapshots in one go

Hi, I've found that backing up my Zotero library takes a long time. I suspect the snapshots, which I seldom use, account for a large number of files. How can I delete all of these snapshots at once? I need to keep the PDFs.
  • edited May 9, 2016
    If you can match them with a search (e.g., for "snapshot" in "All Fields & Tags" mode), you can press Ctrl-A/Cmd-A to select all the matches (the black, as opposed to gray, matches) and then press Delete/Backspace to delete them. You can also match them with an advanced search (e.g., "Attachment File Type" "is" "Web Page"), save it as a saved search, and do the same from there.
  • Dan, using the search function helps a lot. Thanks!!
  • Snapshots often download tens or even hundreds of the small files that make up modern websites (JavaScript, SVG, and GIF files, among others) in addition to the HTML and CSS files that form the 'core' of the page. If you want to keep the HTML and CSS without the rest, you can go to Zotero's storage folder, search through ALL its subfolders for the unnecessary file types (.js, .svg, .gif, .jpg, etc.), and delete them in one go. This usually leaves a crippled page with little more than the text, but that's usually all you need. It streamlines your library without *completely* losing the snapshots, in case you need the text later.
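    The file hunt above can be scripted. The following is a minimal Python sketch, not a tested tool. It assumes Zotero keeps each attachment in its own subfolder of the storage folder (by default ~/Zotero/storage in recent versions; yours may differ), and it only touches subfolders that also contain an .html/.htm file, so a standalone image attachment such as a lone .jpg is left alone. The extension list is an assumption; adjust it to taste. It defaults to a dry run that only lists what it would delete.

```python
from pathlib import Path

# ASSUMPTION: resource types you are willing to lose. .html/.htm and .css
# are deliberately left out so the page text and basic styling survive.
PRUNE_EXTENSIONS = {".js", ".svg", ".gif", ".jpg", ".jpeg", ".png"}
HTML_SUFFIXES = {".html", ".htm"}


def prune_snapshot_resources(storage_dir, extensions=PRUNE_EXTENSIONS,
                             dry_run=True):
    """List (and, if dry_run is False, delete) resource files inside
    snapshot folders, i.e. storage subfolders that contain an HTML page."""
    victims = []
    for folder in Path(storage_dir).iterdir():
        if not folder.is_dir():
            continue
        files = [p for p in folder.rglob("*") if p.is_file()]
        # Skip folders without an HTML page, e.g. a standalone .jpg attachment.
        if not any(p.suffix.lower() in HTML_SUFFIXES for p in files):
            continue
        victims.extend(p for p in files if p.suffix.lower() in extensions)
    victims.sort()
    if not dry_run:
        for p in victims:
            p.unlink()
    return victims
```

    Run it once with the default dry_run=True to review the list, then call prune_snapshot_resources(path, dry_run=False) to actually delete the files.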
  • Hi bob.builder77,
    Thanks for the suggestion. It's indeed a good approach if you want to view the text later, especially for items without a PDF full text, or when working offline.

    I deleted all the snapshots using the search Dan suggested. My Zotero folder now takes up ~3.5 GB, down from 10+ GB, and backing up the library is much faster.
  • edited May 23, 2016
    > If you can match them with a search (e.g., for "snapshot" in "All Fields & Tags" mode),

    NOTE: This only works if you have NOT renamed the snapshots. It turns out I had renamed one of mine, so the search missed it.
  • Does anyone know how to efficiently delete all snapshots for items that also have PDFs? That is, I'd like to keep the snapshot when there is no PDF.
  • Nothing super simple, but you can create an advanced search for
    Attachment File Type is PDF
    Attachment File Type is Web Page
    with the option to include parent and child items of matching items checked, and save it as a saved search. Tag all the top-level items in the saved search with an otherwise unused tag (a colored one is easiest, since you can batch-assign it with its number key), then filter by that tag and search for "snapshot" as described above.
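    If you'd rather check the result outside the Zotero interface, a read-only query against a copy of the Zotero database can list the snapshots whose parent item also has a PDF. This is a sketch under assumptions: it relies on Zotero's SQLite file having an itemAttachments table with itemID, parentItemID and contentType columns and an items table with a key column. That is not a documented interface, so treat the table and column names as assumptions to verify against your own database. Run it only on a copy of zotero.sqlite, never on the live file while Zotero is open. The returned keys are the snapshots' attachment keys, which are also the names of their folders in the storage directory.

```python
import sqlite3

# ASSUMED schema: items(itemID, key, ...) and
# itemAttachments(itemID, parentItemID, contentType, ...).
QUERY = """
SELECT i.key
FROM itemAttachments AS html
JOIN items AS i ON i.itemID = html.itemID
WHERE html.contentType = 'text/html'
  AND html.parentItemID IS NOT NULL
  AND EXISTS (
      SELECT 1
      FROM itemAttachments AS pdf
      WHERE pdf.parentItemID = html.parentItemID
        AND pdf.contentType = 'application/pdf'
  )
ORDER BY i.key
"""


def snapshots_with_pdf_sibling(conn):
    """Return attachment keys of HTML attachments whose parent item
    also has a PDF attachment."""
    return [row[0] for row in conn.execute(QUERY)]
```

    For example, sqlite3.connect("file:zotero-copy.sqlite?mode=ro", uri=True) opens a copy read-only (the file name is hypothetical); pass that connection to snapshots_with_pdf_sibling. The query only reads, so the actual deletion still happens in Zotero.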
  • Dan's advanced search also captures single-file HTMLs. What if I only want to capture Zotero-produced snapshots in my search, including those I've renamed so that the word "snapshot" is no longer in their title?

    At some deeper level, Zotero evidently knows the difference between these files, since it shows a different icon for each. (Since I'm trying to convert all old snapshots to single-file HTMLs, it would be nice if there were some search method that could tap into that deeper metadata.)
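    One rough proxy for that distinction lives on disk rather than in Zotero's metadata: a Zotero snapshot usually saves the page together with its resource files into the attachment's storage folder, while a single-file HTML is, by definition, one file. The Python sketch below flags storage subfolders that hold an .html/.htm file plus at least one other file. It is a heuristic, not something Zotero exposes: the ignored full-text cache names (.zotero-ft-cache, .zotero-ft-info) are assumptions, and a snapshot of a very simple page could consist of a single file and be missed. The folder names it returns are attachment keys, not titles, so renamed snapshots are still caught.

```python
from pathlib import Path

# ASSUMPTION: Zotero bookkeeping files that are not part of the saved page.
IGNORED = {".zotero-ft-cache", ".zotero-ft-info"}
HTML_SUFFIXES = {".html", ".htm"}


def multi_file_html_folders(storage_dir):
    """Return names of storage subfolders holding an HTML page plus at
    least one other file (a heuristic for multi-file snapshots)."""
    hits = []
    for folder in sorted(Path(storage_dir).iterdir()):
        if not folder.is_dir():
            continue
        files = [p for p in folder.rglob("*")
                 if p.is_file() and p.name not in IGNORED]
        has_html = any(p.suffix.lower() in HTML_SUFFIXES for p in files)
        if has_html and len(files) > 1:
            hits.append(folder.name)
    return hits
```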
  • Well, in the meantime, I'll offer a hack.

    [NB: This is for anyone interested in converting multi-file webpages (or otherwise isolating only the snapshots for deletion, even if you've renamed them).
    It will only be useful to you if, like me, you have a substantial mix of newer, content-rich single-file HTMLs (which you want to keep) and older snapshots or multi-file webpages (which you want to convert or discard). The main reason for attempting this at all is to free up disk space and cut down on clutter. If none of this applies, these suggestions are not for you. If you have better suggestions, please offer them. And finally, all the usual caveats: back up your data, attempt at your own risk, and proceed with caution.]

    1. Using a script or text-editing app of your choice, hunt down a metatag that's unique to your single-file HTMLs. It will depend on which browser add-on you used to make them, so you'll have to open a few and have a look. Find-and-replace all instances of that tag in all .html files so that you insert a hidden div right alongside it in every file. The div should contain some identifying text, e.g., "single_file_HTML". This tricks Zotero into indexing that text as real content, but in 99 percent of cases it won't affect how the page displays, because the div is hidden. (Browsers don't actually ignore a div placed outside the body; they move it into the body, but a hidden div still won't show.) Like I said, this is a hack.

    2. Rebuild your full-text index in Zotero. If full-text indexing is turned off, turn it on first.

    3. Create and save an advanced search for Attachment Content containing your hidden text, limited to attachments whose file type is Web Page. For simplicity, create a new tag such as #single_file_HTMLs and apply it to all items in the search results. Now you can create an advanced search for all web pages that are NOT tagged this way, i.e., your snapshots. Delete them, or do whatever you want with them.
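    Step 1 can also be scripted. Here is a minimal Python sketch that works on raw bytes, so nothing in a file changes except the inserted marker (no re-encoding, no line-ending conversion). Both strings are hypothetical: ANCHOR stands in for whatever tag you've confirmed is unique to your single-file HTMLs, and MARKER_DIV is the hidden div with your identifying text. It inserts the marker after the first occurrence of the anchor only (one copy per file is enough for the index), skips files that are already marked, and defaults to a dry run.

```python
from pathlib import Path

# HYPOTHETICAL values: replace ANCHOR with a tag you have confirmed is
# unique to your single-file HTMLs, and keep MARKER_DIV's text distinctive.
ANCHOR = '<meta name="example-single-file-tag" content="1">'
MARKER_DIV = '<div style="display:none">single_file_HTML</div>'


def add_marker(storage_dir, anchor=ANCHOR, marker=MARKER_DIV, dry_run=True):
    """Insert `marker` after the first occurrence of `anchor` in every .html
    file under storage_dir. Returns the files that were (or would be) changed."""
    a = anchor.encode("utf-8")
    m = marker.encode("utf-8")
    touched = []
    for path in sorted(Path(storage_dir).rglob("*.html")):
        if not path.is_file():
            continue
        data = path.read_bytes()
        # Skip pages without the anchor (not single-file HTMLs) and pages
        # that already carry the marker (so re-running is harmless).
        if a not in data or m in data:
            continue
        touched.append(path)
        if not dry_run:
            path.write_bytes(data.replace(a, a + m, 1))
    return touched
```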

    When you're done, you can undo the find-and-replace from step 1, which is probably a good idea.
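    The undo can be sketched the same way in Python: strip every occurrence of the marker from the .html files under the storage folder, again working on raw bytes. MARKER_DIV is hypothetical and must match, character for character, the text you inserted in step 1; anything that doesn't match exactly is left untouched. It also defaults to a dry run.

```python
from pathlib import Path

# HYPOTHETICAL: must be byte-for-byte identical to the marker from step 1.
MARKER_DIV = '<div style="display:none">single_file_HTML</div>'


def remove_marker(storage_dir, marker=MARKER_DIV, dry_run=True):
    """Remove every occurrence of `marker` from .html files under
    storage_dir. Returns the files that were (or would be) changed."""
    m = marker.encode("utf-8")
    touched = []
    for path in sorted(Path(storage_dir).rglob("*.html")):
        if not path.is_file():
            continue
        data = path.read_bytes()
        if m in data:
            touched.append(path)
            if not dry_run:
                path.write_bytes(data.replace(m, b""))
    return touched
```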

    Using these steps, I was able to find all the snapshots and multi-file HTMLs.

    Converting them in one batch was, as expected, a bit more complicated, but it reduced my storage footprint by about 40 percent, so it was worth it. I may post about that process one day, though I hope it will become largely irrelevant once Zotero switches to single-file HTML snapshots by default, which I believe is planned.