Snapshots on Typophile site are mangled

Why is every snapshot I create, on any page of this site, cut off at the top in a strange way?

For instance, compare this page:

With this snapshot I took:

And this from the wiki section:

And my snapshot:

Creating items demonstrates the same problem.
  • Well, it has something to do with the Flash masthead. It actually works more or less fine for me on OS X—there's a little funkiness, but mostly the Flash just doesn't display and the rest of the page looks fine. I can reproduce what you're experiencing in Windows.

    I've created a ticket, but our snapshot code is from WebPageDump (which itself is based on Scrapbook). You might try saving the page using those two extensions and see if they exhibit the same behavior. I'm not sure how actively maintained WPD is, so we may need to fix the code on our end. We'd appreciate any help anyone has to offer.

    (Unfortunately, saving web pages well is a much more difficult problem than it seems like it should be. For comparison, and to appreciate how much WebPageDump improves matters, try using Firefox's "Save As" option.)
  • Thanks, I will investigate further. I appreciate the effort, and the result isn't that bad, except that on this particular site the title of the thread/topic is lost, which is annoying but not a show stopper :)

    I'll let you know if I figure anything else out.
  • edited October 4, 2007
    I just checked with WebPageDump and ScrapBook and the pages they save do not exhibit this issue with the same site.

    I did notice that if I click and drag as if highlighting text I can reveal the text that is covered up in the Zotero snapshot/item... so that helps a little bit, but if your code is using code from those plugin(s), a bit of tweaking might be in order :)

    Interestingly, if I capture in one of those FIRST, then snapshot that capture, it also works!
  • So, this comes down to JavaScript saving being disabled in WebPageDump and enabled in Zotero. For some reason, the Flash masthead in the saved page isn't working well with JS enabled.

    WPD may have a good reason for disabling JS by default, but I enabled it when I integrated the code mainly because I was testing the NYTimes site and the Flash movie section on the front page is lost with JS disabled. Of course, the movies there are streamed from the server when played, so saving them is a little silly, but if you're archiving the front page of the Times, you probably at least want to see the video stills and descriptions when you view the archived page instead of the large black box you get with JS saving disabled.

    And you can always turn off JS temporarily. On a snapshot of the Typophile site, that fixes the problem for me on OS X.

    So, I'll probably keep JS saving enabled unless people find more egregious problems with snapshots. There may be something WPD is doing (or not doing) during the save that results in the problem with the Flash header on that page, but I'm not going to delve into the code at the moment to try to fix it.

    (Also note that, at least on my Intel Mac, viewing a NYTimes front page snapshot crashes Firefox unless I exclude the Zotero storage directory in the Flash security settings. Hopefully this will be fixed in Firefox 3 and/or a later version of the Flash plugin.)
  • Hi,

    Yes, we had a good reason for disabling and removing JavaScript by default in WebPageDump: we save the DOM tree, and the DOM tree already reflects any dynamic changes made by JavaScript.

    Consider a JavaScript function that simply adds some text to the web page dynamically. That text is then saved by WebPageDump because it is in the DOM tree. Reopening the saved page afterwards would duplicate the text if the JavaScript code were not removed (the old text from the DOM tree plus the newly generated text). For more information about these problems, see the SOFSEM paper on the WebPageDump homepage.

    What would be needed is a kind of decision between "good" and "evil" ;-) javascript code but I think this is almost impossible.

    Currently I don't have much time to work on WebPageDump, but I won't forget it...
  • Thanks for the explanation, Bernhard—makes sense.
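
The duplication problem Bernhard describes can be sketched with a toy model. This is only an illustration, not WebPageDump or Zotero code: the `Page` class, the `add_greeting` script, and the `snapshot` function are all hypothetical names, and a "page" here is just a list of text nodes plus a list of on-load scripts.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Page:
    """Toy stand-in for a web page: text nodes plus on-load scripts (hypothetical)."""
    body: List[str] = field(default_factory=list)
    scripts: List[Callable[["Page"], None]] = field(default_factory=list)

    def load(self) -> "Page":
        # Run every on-load script, roughly as a browser does when opening a page.
        for script in self.scripts:
            script(self)
        return self


def add_greeting(page: Page) -> None:
    # Hypothetical script that adds text to the page dynamically.
    page.body.append("Hello from script")


def snapshot(page: Page, keep_scripts: bool) -> Page:
    # Save the current DOM state, which already contains the scripts' output.
    saved_scripts = list(page.scripts) if keep_scripts else []
    return Page(body=list(page.body), scripts=saved_scripts)


# The live page has been loaded once, so the script has already run.
original = Page(body=["Static text"], scripts=[add_greeting]).load()

# Reopening a snapshot that kept its scripts runs them a second time.
with_js = snapshot(original, keep_scripts=True).load()
# Reopening a snapshot with scripts stripped leaves the saved DOM as it was.
without_js = snapshot(original, keep_scripts=False).load()

print(with_js.body)
print(without_js.body)
```

In this sketch the snapshot that keeps its scripts ends up with the dynamic text twice, while the stripped snapshot matches the live page. The trade-off discussed above is the other side of this: stripping scripts also drops content that only appears when a script runs on the saved copy.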