Good way to save Comments from website articles?

FluidMindOrg · January 17, 2013

Anyone have any good advice on how to save all the comments from an article on a news website or blog?

There are several problems with doing this:

(1) Since many of the sites have moved to loading the comments with XMLHttpRequest calls, when you add the page itself to your Zotero library, NONE of the comments are saved with it.

(2) I have never found a site that has a link that allows you to view all the comments on one page, which would at least allow you to attach that page separately.

So what options are we left with? The only way I've found is to copy and paste the comments into a Note attached to my Zotero entry for that article. It is, however, very time-consuming if you want to get all of the comments, especially if there are hundreds of them and the damn website will only allow you to view 25 or 50 at a time.

Anyone else have other techniques you've used to save comments?

Cheers
--Dan

ajlyon · January 18, 2013

If there are particular sites you are targeting (or platforms), we may be able to write site translators that crawl and attach the comments. But as a general problem it's a pretty gnarly one.

aurimas · January 18, 2013

Saving a snapshot with Zotero for Firefox should capture the page as you see it. Currently, saving snapshots from Chrome or Safari forces Zotero to reload the page, in which case it might not capture comments loaded after the page is done loading. For completeness' sake, could you provide a link to a sample page whose comments are not being captured?

FluidMindOrg · January 18, 2013

Thanks Aurimas. While the snapshot does save the HTML as it was received by Firefox (I'm using Zotero in Firefox, not the standalone), many of these news sites and blog apps are starting to load the comments separately using the JavaScript XMLHttpRequest method after the initial HTML has been sent. So when Zotero saves the page, it only gets the original HTML that was sent, not the response to the later HTTP GET request that was sent via JavaScript to fill the div where the comments are placed.
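To make the distinction concrete, here is a minimal sketch of the behaviour described above. Everything in it is made up for illustration: the HTML, the comment payload, and the helper names are assumptions, and it does not reflect how Zotero or any particular comment service is actually implemented. It only shows that text extracted from the initial response contains no comments, while the same extraction on the page after a script has filled the div does.

```python
from html.parser import HTMLParser

# Hypothetical initial HTML: the comment container is present but empty.
INITIAL_HTML = '<html><body><p>Article text</p><div id="comments"></div></body></html>'

# Hypothetical payload that the page's script would fetch in a second request.
COMMENTS_PAYLOAD = ["First comment", "Second comment"]


class CommentExtractor(HTMLParser):
    """Collects the text found inside <div id="comments">."""

    def __init__(self):
        super().__init__()
        self.depth = 0  # greater than 0 while inside the comments div
        self.text = []

    def handle_starttag(self, tag, attrs):
        if self.depth > 0:
            self.depth += 1
        elif tag == "div" and dict(attrs).get("id") == "comments":
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.text.append(data.strip())


def comments_in(html):
    """Return the list of text fragments inside the comments div."""
    parser = CommentExtractor()
    parser.feed(html)
    parser.close()
    return parser.text


def render_with_comments(html, comments):
    """Simulate what the script does in the browser: fill the empty div."""
    filled = "".join("<p>" + c + "</p>" for c in comments)
    return html.replace('<div id="comments"></div>',
                        '<div id="comments">' + filled + "</div>")


# Saving the HTML as first received: no comments.
saved_from_response = comments_in(INITIAL_HTML)
# Saving the page after the script has run: comments are present.
saved_from_live_dom = comments_in(render_with_comments(INITIAL_HTML, COMMENTS_PAYLOAD))
```

In this toy version, `saved_from_response` is empty and `saved_from_live_dom` holds both comments, which is the gap between saving the HTML that was sent and saving the page as displayed.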

Here are some examples:

The NPR.org site uses the DISQUS comments service, which loads with Javascript after the article page is loaded:
http://www.npr.org/blogs/thetwo-way/2013/01/18/169646736/as-social-issues-drive-young-from-church-leaders-try-to-keep-them
Go to that page and save it into Zotero, then view the snapshot it took. You'll see the page, but the area where comments are loaded is blank. I'm noticing more and more news sites using DISQUS. Here's another example, though with this one you can't tell it's using DISQUS until you go to log in:
http://www.kentucky.com/2012/11/30/2426599/asbury-professor-writes-of-democrats.html

Many other news sites are starting to use the Facebook comments API, which works in a similar way. They may load the first 50 or so comments with the initial HTML that's sent. But if there are more and you keep clicking the "View previous comments" link to eventually show all of them on the page, then save the page to Zotero, it doesn't save all the comments in the snapshot, only the ones that initially showed on the page. So in that case, the only way to save all of them is to show them all, then copy and paste them into a note. But then you run into the issue that if there are a lot (say a thousand), Zotero won't allow you to save that much into a note.
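The repeated "View previous comments" clicking amounts to paging through the thread one batch at a time. A rough sketch of that loop, assuming a made-up `fetch_page(offset, limit)` function that stands in for one such request (it is not the Facebook or DISQUS API, and the 120-comment thread is invented), might look like this:

```python
def fetch_page(offset, limit):
    """Hypothetical stand-in for one 'View previous comments' request."""
    # Pretend the thread has 120 comments in total.
    all_comments = ["comment %d" % i for i in range(1, 121)]
    return all_comments[offset:offset + limit]


def fetch_all_comments(page_size=50):
    """Keep requesting pages until a short or empty page signals the end."""
    collected = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        collected.extend(page)
        if len(page) < page_size:
            break  # fewer results than requested: nothing left to load
        offset += page_size
    return collected


everything = fetch_all_comments(page_size=50)
```

With a page size of 50, the loop here makes three requests (50, 50, and 20 comments) and stops. A real site would need its own request format and possibly authentication, which is why a per-site approach is hard to generalize.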

The NYTimes.com site, however, looks like it works the same way, yet Zotero seems to be able to save the comments that are shown after clicking the "READ MORE COMMENTS" link at the bottom of the comments. Here's an example:
http://takingnote.blogs.nytimes.com/2013/01/17/no-comment-necessary-conspiracy-nation/?ref=opinion

--Dan

aurimas · January 18, 2013

You're right. We'll see what we can do about that.

FluidMindOrg · January 18, 2013

Thanks. I'll keep trying to figure out a good way to save just selected sections of web pages. The only thing I've found so far that actually does a decent job of it is Evernote Clipper. Unfortunately, though, it only saves it to your Evernote account and doesn't allow you to simply save it to your local drive.

aurimas · January 18, 2013

It's not very streamlined, but you can save pages as they appear using the "Mozilla Archive Format, with MHT and Faithful Save" extension. The nice thing is that it will save the entire page (as you see it; I tested it on the NPR site) in a single file. The not-so-nice thing is that you need to have the Mozilla Archive Format extension installed to be able to view that file. If you're using Firefox, that's not a problem though.

I'm looking into how we can adapt the code from that extension for Zotero Snapshots actually.

Credit goes to Rintze for discovering this btw.

FluidMindOrg · January 18, 2013

Good to know. I tried doing that with the UnMHT extension and it didn't work so well.

Now if the Mozilla Archive Format extension could save only selected parts of a page (like Evernote Web Clipper) it would be perfect :-)

Thanks

antikorpo · February 10, 2013

+1 from over here ;)

http://forums.zotero.org/discussion/27739?page=1#Item_1

On the Wikipedia page about MAFF[1] I found a Chrome extension "which uses the data URI scheme to package everything in a single .html file"[2]

[1]: http://en.wikipedia.org/wiki/Mozilla_Archive_Format
[2]: https://chrome.google.com/webstore/detail/mpiodijhokgodhhofbcjdecpffjipkle
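For anyone unfamiliar with the data URI approach quoted above, the idea is that each external resource is base64-encoded and written directly into the HTML in place of its URL. The following is only an illustrative sketch with invented names and placeholder bytes; it is not the code of that extension, and a real tool would also need to handle stylesheets, scripts, fonts, and so on.

```python
import base64


def to_data_uri(data, mime_type):
    """Encode raw bytes as a base64 data URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return "data:" + mime_type + ";base64," + encoded


def inline_resource(html, url, data, mime_type):
    """Replace a src="url" reference with the resource's data URI."""
    return html.replace('src="' + url + '"',
                        'src="' + to_data_uri(data, mime_type) + '"')


# Hypothetical page and image bytes, for illustration only.
PAGE = '<html><body><img src="logo.png"></body></html>'
LOGO_BYTES = b"not a real PNG, just placeholder bytes"

single_file = inline_resource(PAGE, "logo.png", LOGO_BYTES, "image/png")
```

After this step `single_file` no longer refers to `logo.png` at all, so the page can be stored as one self-contained .html file.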

FluidMindOrg · November 13, 2013

Just wanted to resurrect this issue, as more and more sites are using either Facebook or Disqus for their commenting, and Zotero still doesn't save any of the comments in its snapshots. (Often, it's the comments that contain the qualitative data I need :-)

aurimas, did you ever make any progress in figuring out how Evernote's Clipper captures pages, or perhaps how the Mozilla Archive Format captures the entire page, comments and all?

Thanks
--Dan

aurimas · November 13, 2013

No, no progress there.

viceversa · May 11, 2025

Any updates about this issue?

tim820 · May 11, 2025

Regarding techniques for capturing long forum or news threads with multiple pages of comments (not all of which are initially visible) as a single snapshot, here are some approaches:
https://forums.zotero.org/discussion/120138/methods-for-saving-snapshots-of-multi-page-online-web-forum-threads