Automatically add webpages to archive.org

Zotero can take a snapshot when you save a webpage. The problem with those snapshots is that they can be quite heavy and are not 100% reliable. Also, most of the time the website is still up, so taking the snapshot is unnecessary.

Worse, snapshots are useless to the reader of your work.

It would be cool to be sure that any page you save as a reference will still be accessible in the years to come.

I think the best approach would be to use https://archive.org to save the page and keep the archive.org link in the Zotero reference. It might be even better to cite both the original link and the archive.org link (a bit like what Wikipedia does now).

That would also be good for archive.org, as it would provide some "curated" content to archive.

It can definitely be done in a plugin, but I think it would be a great addition to Zotero core.

To save a page on archive.org, you simply have to visit https://web.archive.org/save/URL, where URL is the address of the page.
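As a rough illustration of the step above, here is a minimal Python sketch that builds the save URL and visits it. The function names, the User-Agent string, and the timeout are my own assumptions, and I have not checked what the endpoint returns or how it rate-limits, so this only reports the HTTP status.

```python
import urllib.request

# Endpoint taken from the post above; everything else here is illustrative.
SAVE_ENDPOINT = "https://web.archive.org/save/"


def build_save_url(page_url: str) -> str:
    """Return the address that asks archive.org to save page_url."""
    page_url = page_url.strip()
    if not page_url.startswith(("http://", "https://")):
        raise ValueError(f"expected an http(s) URL, got {page_url!r}")
    return SAVE_ENDPOINT + page_url


def request_archive(page_url: str, timeout: float = 60.0) -> int:
    """Visit the save URL and return the HTTP status code.

    Assumption: a plain GET is enough, as the post describes. The
    User-Agent value is a made-up placeholder.
    """
    request = urllib.request.Request(
        build_save_url(page_url),
        headers={"User-Agent": "zotero-archive-sketch/0.1"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A plugin could call something like request_archive("https://example.com/article") right after the item is saved and then store the resulting archive link alongside the original URL.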
  • I think in general this is very attractive and should indeed be done (there actually used to be an experimental add-on: https://github.com/hiberlink/zotero_hiberlink , and I agree there is a good case to be made that this shouldn't just be an add-on). However, the Internet Archive cannot _replace_ Zotero snapshots, only complement them, for (at least) two reasons:
    1.) Snapshots are full-text searchable from within Zotero. I know this is crucial for a number of users.
    2.) The Internet Archive honors robots.txt instructions not to archive, and these cover substantial parts of the research-relevant internet, including e.g. the New York Times and Quora.
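To make the robots.txt point concrete, here is a small sketch using Python's standard urllib.robotparser to check whether a site's robots.txt would exclude a given crawler from a page. The agent name "ia_archiver" is an assumption on my part about how the Internet Archive identifies itself, and may_archive is a hypothetical helper, not anything Zotero or archive.org provides.

```python
from urllib.robotparser import RobotFileParser

# Assumption: the user-agent token the Internet Archive's crawler matches on.
ARCHIVE_AGENT = "ia_archiver"


def may_archive(robots_txt: str, page_url: str, agent: str = ARCHIVE_AGENT) -> bool:
    """Return True if the given robots.txt text lets `agent` fetch page_url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, page_url)


if __name__ == "__main__":
    blocked = "User-agent: ia_archiver\nDisallow: /\n"
    print(may_archive(blocked, "https://example.com/article"))
```

Zotero could run a check like this before offering the archive.org option, and fall back to a local snapshot when the page is excluded.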
  • I do not think we should remove Zotero snapshots either. Both features have value on their own and could be used side by side.

    I think the biggest value of using archive.org is for citations and long-term references.

    If we have a way to use archive.org, then we could, for example, add an option to the Zotero snapshot to save only the rendered HTML of the page (no images, no JS, no CSS), a bit like the "read mode" in Firefox. The two features can be complementary.
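A very rough sketch of what such a lightweight snapshot could look like, using only Python's standard html.parser. It is a simplification of the idea above: it keeps the visible text and drops scripts, styles, and images rather than producing cleaned-up HTML, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TextOnlyExtractor(HTMLParser):
    """Collect visible text, ignoring anything inside <script> or <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self._chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Images and other tags carry no text data, so they drop out naturally.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return "\n".join(self._chunks)


def readable_text(html: str) -> str:
    """Return the text content of an HTML page, one chunk per line."""
    extractor = TextOnlyExtractor()
    extractor.feed(html)
    extractor.close()
    return extractor.text()
```

The result stays full-text searchable inside Zotero while being much smaller than a full snapshot, with archive.org holding the complete page.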