Zotero connector not saving NYTimes home pages - error report id 949848809
For years I have been saving the "Today's Paper" page as a website with snapshot to both Zotero online and Zotero Standalone. After the Nov. 18 update, the Save to Zotero with webpage snapshot no longer works on the current versions of Firefox and Chrome.
A sample webpage:
"http://www.nytimes.com/indexes/2017/11/17/todayspaper/index.html"
When I save this directly to my online account, the item is saved without a snapshot, which makes it useless for my research purposes. This happens no matter what kind of "save as" option I choose.
When I save this to Zotero Standalone from a dedicated browser profile, the snapshot is created, but the layout and much of the formatting of the page are destroyed: it no longer looks anything like the NYTimes page and looks much more like a crude HTML document.
This is especially vexing because if I go to a specific article in the Times, I can save the page perfectly both to my online account and to Zotero Standalone.
What can I do to get the translator to properly save the "....todayspaper/index.html" pages as attachments as was done with Zotero 4 and the old addon?
The current recommended method here is to have Zotero open when you try to save.
Zotero 4.0 used a very old, unmaintained third-party tool to save snapshots, and while that worked for this, it had to be replaced. Zotero 5.0 uses Firefox's built-in page-saving feature (since Zotero is still based on Firefox), and unfortunately that does appear to miss styles for this particular page — you can see that yourself if you save as "Web Page, Complete" in Firefox. So the first step would be for this to be fixed in Firefox.
Saving to your online account should work the same for either page, but snapshots do appear to be broken on pages without translators — we'll look into that. It does work on pages with translators, as you say. The catch, though, is that the snapshots we're saving directly to the online library are only the HTML pages themselves, with all assets (images, scripts, etc.) still coming from the original site. That makes it appear to work when you view it online, but if you sync that snapshot and then go offline, you'll find that it doesn't actually work. That's inconsistent with the snapshots we (usually) save directly to Zotero, and I think we'll need to unify that one way or another.
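To make the point about assets concrete, here is a minimal sketch (not anything Zotero itself does) of how you might check whether a saved snapshot still depends on the original site. It scans the HTML for images, scripts, stylesheets, and similar tags whose URLs point to a remote host. The function name `find_remote_assets` and the tag list are assumptions chosen for illustration, and the list is not exhaustive (it ignores CSS `url(...)` references, for example).

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Tag -> attribute pairs that commonly pull in page assets.
# Illustrative only; not an exhaustive list.
ASSET_ATTRS = {
    "img": "src",
    "script": "src",
    "link": "href",
    "iframe": "src",
    "source": "src",
}


def is_remote(url):
    # A URL with a network location (http://host/..., //host/...) is
    # fetched from a server rather than from a file next to the snapshot.
    return bool(urlparse(url).netloc)


class RemoteAssetFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.remote = []

    def handle_starttag(self, tag, attrs):
        wanted = ASSET_ATTRS.get(tag)
        if wanted is None:
            return
        attr_map = dict(attrs)
        # Only stylesheet links count as assets; skip canonical links etc.
        if tag == "link" and "stylesheet" not in (attr_map.get("rel") or "").lower():
            return
        value = attr_map.get(wanted)
        if value and is_remote(value):
            self.remote.append((tag, value))


def find_remote_assets(html_text):
    """Return (tag, url) pairs for assets loaded from a remote host."""
    finder = RemoteAssetFinder()
    finder.feed(html_text)
    finder.close()
    return finder.remote
```

If you read a snapshot's HTML file into a string and pass it to `find_remote_assets`, a non-empty result suggests the snapshot will not display properly offline, since those assets are still fetched from the original site.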
What I suspect we'll do in general is just start saving simplified snapshots by default, and those will work the same whether you save to the online library or Zotero and will just never include the page style. For more complex preservation, we could plausibly offer a way to generate a PDF or PNG from a webpage, or we might just say that you should use one of many available tools to do that and then add that file to Zotero.
We could also look for a well-maintained, open-source browser extension that does a good job saving HTML snapshots, but that may be a tall order. The modern web is really too complex to try to make accurate HTML copies of pages — much of the time when you think you're preserving a page, some of it doesn't actually work offline or if some of the original media goes away, so it just creates a false sense of security, and if you include JavaScript the local snapshots can be broken in all sorts of other ways as well. (On the other hand, browsers are still trying to do it.)
Anyhow, for your case in the meantime, I'd recommend that you save the page some other way — e.g., in Chrome, which seems to get this page right — and then view the created snapshot, save it to Zotero, and then move the attachment to the existing NYT item and delete the other webpage item. (The attachment URL will be wrong, but it should otherwise work.)
Even tools like CutePDF Writer, which typically produce searchable PDFs, fail at this task.
Bummer!
@dstillman
Tested your hypothesis on Firefox 57.0 and Chrome.
Re Firefox - Using the File -> Save option saves the page only as an HTML document, which loses the nice formatting and working links but does keep the page's text. This looks much like the type of save now produced by a save to Zotero Standalone.
Re Chrome - Printing to PDF automatically produces a searchable PDF.
Sadly, all this takes many more keystrokes than was needed to get a perfect save in Zotero 4 before the update.
To get back to basics, the single greatest virtue of Zotero vs other such software was its speed and utter simplicity. One click and everything was saved perfectly either to the online storage or to a Standalone file.
Now the elegant speed and simplicity are gone, perhaps forever, judging from the software complications you outlined in your initial responses to this discussion. I have my fingers crossed that your engineers will be able to find a way to restore Zotero's original elegant functioning.
Anyhow, thanks to you and all others who replied.