snapshot network traffic?
When I view an HTML snapshot, it creates network traffic, but when I view a PDF, it does not. I don't use sync or have any settings that obviously explain this. Could you tell me what the network traffic is doing, please, and how I can configure it?
I'm not aware of a generic way to cut network traffic for all file:// URLs.
You can use "offline mode" if your browser supports it or may set a temporary proxy, but either may impact other use of your browser and/or need to be switched manually.
https://stackoverflow.com/questions/6291916/disabling-loading-specific-javascript-files-with-firefox
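For what it's worth, the temporary-proxy idea can be captured as a user.js fragment. This is only a sketch: the pref names are Firefox's, the unreachable address is arbitrary, and you would have to set network.proxy.type back to 0 by hand to browse normally again.

```
// user.js sketch: point Firefox at a proxy that doesn't exist, so a
// locally opened snapshot cannot fetch anything over the network.
user_pref("network.proxy.type", 1);            // 1 = manual proxy
user_pref("network.proxy.http", "127.0.0.1");  // nothing listens here
user_pref("network.proxy.http_port", 9);       // port 9 = discard
user_pref("network.proxy.ssl", "127.0.0.1");
user_pref("network.proxy.ssl_port", 9);
```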
Which brings up a related point: some webpages don't work without JavaScript. The Typekit scripts, for instance, blank all the page text if you block them. This is obviously a problem for the web designer, not Zotero. However, it means that when I have no network connection, even digging into the HTML source may not retrieve the content I have saved to Zotero. I'm guessing most web designers prefer to work in areas with cheap, reliable internet connections.
Thanks for your help. These are somebody else's problems.
So when I set NoScript to run no scripts at all, opening a saved page still loaded the scripts from assorted domains.
Firefox used to have a feature where you could tell it never to load scripts requested by certain domains (http://kb.mozillazine.org/Security_Policies). This would make browsing with a script blocker much less antisocial, costly, and slow on a limited-bandwidth connection. Apparently that functionality is gone (https://support.mozilla.org/en-US/questions/1000843).
Working offline stops the scripts from downloading, but it also stops html and everything else from downloading.
Your saving mechanisms sound interesting. Some web designers do pay attention to how Zotero works on their pages, so just giving a warning when saving a page with problematic scripts might give some of them an incentive to make their sites render script-free. A reader-mode-like feature would be nice, but I suspect the problematic websites are exactly the ones that would not work in reader mode.
I have resorted to a manual version of option 2, copy-pasting text. I would be very happy with a version of option 2 that I could preview and approve manually. I would also appreciate a way to tell Zotero to strip all the JavaScript out of every saved page (ideally with the ability to make individual exceptions, but useful interactive content is quite rare).
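Not a Zotero feature, but in the meantime the stripping can be done by hand over the saved snapshots. A rough sketch: it assumes the default data directory and the beautifulsoup4 package, and you should back up the storage folder first.

```python
# Sketch: remove all <script> tags from HTML snapshots already saved
# in Zotero's storage directory. Back up the folder before running.
from pathlib import Path
from bs4 import BeautifulSoup

storage = Path.home() / "Zotero" / "storage"  # default data directory (assumption)
for page in storage.rglob("*.html"):
    html = page.read_text(encoding="utf-8", errors="ignore")
    soup = BeautifulSoup(html, "html.parser")
    for script in soup.find_all("script"):
        script.decompose()                    # delete the tag and its contents
    page.write_text(str(soup), encoding="utf-8")
```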
Saving video is often awkward because someone has decided to use HTML5 but split the video into dozens of second-long clips, so saving it does not work. On a slow connection, this also means the video can freeze repeatedly, often enough to make speech incomprehensible. Audio sometimes has similar problems.
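When the individual clips can be downloaded, they can sometimes be reassembled by hand. A sketch, assuming the pieces are plain MPEG-TS segments (which can simply be byte-concatenated; fragmented MP4 needs a real muxer) and hypothetical file names:

```python
# Sketch: stitch downloaded MPEG-TS segments back into one playable file.
from pathlib import Path

segments = sorted(Path("clips").glob("seg*.ts"))  # hypothetical names
with open("full_video.ts", "wb") as out:
    for seg in segments:
        out.write(seg.read_bytes())               # TS chunks concatenate cleanly
```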
My primary interest is in having an accurate, readable copy of the page content that I can quickly access.
My biggest problem with the current full snapshot is that it can take forever to load (and may not load at all).
I understand that this is because of JavaScript execution, and that there may be ways of working around it, but those seem complicated, browser-dependent, and not guaranteed to work. What happens if the page has been moved, or was originally (legally) viewed behind a paywall?
Reducing the number (and total size) of the files saved in the library is also a plus (as noted by several users).
Of the two approaches Dan outlined in this thread, the "as rendered" version is appealing because it guarantees that everything I read on the page is present in the snapshot.
On the flip side, an "as rendered" version does capture a lot of irrelevant material (e.g. "You may also be interested in...."). Does an easier-to-implement, "simplified" version in fact capture all of the "relevant" page content? Or is "relevant" page content in the eye of the beholder?
https://w3c.github.io/scholarly-html/
Some websites, such as PMC, already have scrapeable fulltexts. Zotero may have to take a site-by-site approach to fulltext scrapers, too.
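For illustration, here is roughly what pulling a machine-readable fulltext from PMC looks like via NCBI's E-utilities. The ID below is a placeholder, and only open-access articles return body text:

```python
# Sketch: fetch JATS XML fulltext for an open-access PMC article.
import urllib.request

pmcid = "PMC3539452"  # placeholder ID
url = ("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
       f"?db=pmc&id={pmcid}&retmode=xml")
with urllib.request.urlopen(url) as resp:
    xml = resp.read().decode("utf-8")
print(xml[:500])  # title, abstract, and body sections as structured XML
```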
Plan S also has machine-readability requirements. I hope the Zotero team has discussed or will discuss them with the requirement-writers (whom one can write to at coalition-sscienceeurope.org, with a circle-at sign between the two successive esses).
With reference to this discussion:
https://forums.zotero.org/discussion/36151/wikified-copyleft-bibliographic-database
There is also a Wikimedia fulltext database, human-edited into machine-readable form. It can only cover public-domain and openly licensed materials (CC-BY-SA etc.). It currently holds many books and few articles, but that should change.
TiddlyWiki, which has BibTeX integration, is sometimes handy for storing fulltexts, as are assorted static website tools.
https://forums.zotero.org/discussion/comment/363634