Zotero connector not saving NYTimes home pages - error report id 949848809

PersonaWork · November 20, 2017

For years I have been saving the "Today's Paper" page as a webpage item with a snapshot to both Zotero online and Zotero Standalone. Since the Nov. 18 update, saving to Zotero with a webpage snapshot no longer works in the current versions of Firefox and Chrome.

A sample webpage:
"http://www.nytimes.com/indexes/2017/11/17/todayspaper/index.html"

When I save this directly to my online account, the item is saved without a snapshot, which makes it useless for my research purposes. This happens no matter what kind of "save as" option I choose.

When I save this to Zotero Standalone from a dedicated browser profile, the snapshot is created but the layout and much of the formatting of the page are destroyed - it no longer looks anything like the NYTimes page and looks much more like a crude HTML document.

This is especially vexing because if I go to a specific article in the Times, I can save the page perfectly both to my online account and to Zotero Standalone.

What can I do to get the translator to properly save the "....todayspaper/index.html" pages as attachments as was done with Zotero 4 and the old addon?

dstillman · November 20, 2017

So this one's a bit complicated.

The current recommended method here is to have Zotero open when you try to save.

Zotero 4.0 used a very old, unmaintained third-party tool to save snapshots, and while that worked for this, it had to be replaced. Zotero 5.0 uses Firefox's built-in page-saving feature (since Zotero is still based on Firefox), and unfortunately that does appear to miss styles for this particular page — you can see that yourself if you save as "Web Page, Complete" in Firefox. So the first step would be for this to be fixed in Firefox.

Saving to your online account should work the same for either page, but snapshots do appear to be broken on pages without translators — we'll look into that. It does work on pages with translators, as you say. The catch, though, is that the snapshots we're saving directly to the online library are only the HTML pages themselves, with all assets (images, scripts, etc.) still coming from the original site. That makes it appear to work when you view it online, but if you sync that snapshot and then go offline, you'll find that it doesn't actually work. That's inconsistent with the snapshots we (usually) save directly to Zotero, and I think we'll need to unify that one way or another.
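One way to see whether a given snapshot depends on the original site is to list the assets it still loads remotely. The following is only a rough sketch, not anything Zotero itself does: the `find_remote_assets` helper and the tag list are assumptions made for illustration, and it ignores assets referenced from inside CSS or loaded by JavaScript.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Tag/attribute pairs that commonly pull in page assets (illustrative, not exhaustive).
ASSET_ATTRS = {
    ("img", "src"),
    ("script", "src"),
    ("link", "href"),
    ("iframe", "src"),
    ("source", "src"),
}


def is_remote(url):
    """True for http(s) and protocol-relative URLs, False for local paths."""
    parsed = urlparse(url)
    if parsed.scheme in ("http", "https"):
        return True
    return not parsed.scheme and bool(parsed.netloc)


class RemoteAssetFinder(HTMLParser):
    """Collects (tag, url) pairs for assets that point at a remote host."""

    def __init__(self):
        super().__init__()
        self.remote_assets = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value and (tag, name) in ASSET_ATTRS and is_remote(value):
                self.remote_assets.append((tag, value))


def find_remote_assets(html_text):
    """Hypothetical helper: return remote asset references found in html_text."""
    finder = RemoteAssetFinder()
    finder.feed(html_text)
    return finder.remote_assets


if __name__ == "__main__":
    sample = """
    <html><head>
      <link rel="stylesheet" href="https://example.com/site.css">
      <script src="//example.com/app.js"></script>
    </head><body>
      <img src="index_files/logo.png">
      <a href="https://example.com/article.html">link</a>
    </body></html>
    """
    for tag, url in find_remote_assets(sample):
        print(tag, url)
```

If a snapshot file produces any output here, it will probably render incompletely once you are offline or once the original site removes those files. Ordinary links (`<a href>`) are deliberately ignored, since they do not affect how the saved page itself displays.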

What I suspect we'll do in general is just start saving simplified snapshots by default, and those will work the same whether you save to the online library or Zotero and will just never include the page style. For more complex preservation, we could plausibly offer a way to generate a PDF or PNG from a webpage, or we might just say that you should use one of many available tools to do that and then add that file to Zotero.

We could also look for a well-maintained, open-source browser extension that does a good job saving HTML snapshots, but that may be a tall order. The modern web is really too complex to try to make accurate HTML copies of pages — much of the time when you think you're preserving a page, some of it doesn't actually work offline or if some of the original media goes away, so it just creates a false sense of security, and if you include JavaScript the local snapshots can be broken in all sorts of other ways as well. (On the other hand, browsers are still trying to do it.)

Anyhow, for your case in the meantime, I'd recommend that you save the page some other way — e.g., in Chrome, which seems to get this page right — and then view the created snapshot, save it to Zotero, and then move the attachment to the existing NYT item and delete the other webpage item. (The attachment URL will be wrong, but it should otherwise work.)

dstillman · November 20, 2017

Oh, actually, you can't currently save local webpages in the Zotero Connector. So you might need to use some single-file format (e.g., print to PDF) that you just add to Zotero as a file.

PersonaWork · November 20, 2017

I greatly appreciate your quick response to this problem and will try some of your suggestions. However, any solution that relies on PDFs in connection with Zotero or Firefox is not very satisfactory. I've tried saving PDFs from Firefox and other browsers and from various add-ons like Print Edit and its replacement. Invariably, the PDFs are not searchable, even though they produce a nicely formatted image of the pages.

Even things like CutePDF Writer, which typically produce searchable PDFs, fail at this task.

Bummer!

dstillman · November 20, 2017

I can't speak to Windows solutions, but the built-in printing-to-PDF on macOS certainly produces searchable PDFs. There's no good reason that shouldn't be the case.

LiborA · November 20, 2017

On MS Windows, the built-in printing-to-PDF certainly produces searchable PDFs too.

dstillman · November 20, 2017

Yeah, it would really be quite strange for any PDF generated from a document to not be searchable.

bwiernik · November 20, 2017

There is an option in Adobe and some other PDF printers to save an image of the document as a PDF, rather than the pages themselves. It's possible you have such an option checked? Otherwise, have you tried to use a different PDF viewer (e.g., if you open the PDF in Firefox, is the text selectable/searchable)?

PersonaWork · November 20, 2017

Actually, several different kinds of PDF creators for Windows will create searchable and non-searchable PDFs. This is especially true of scanning software (e.g., HP software for scanners) - in such cases, one must set an option to make the PDFs searchable.

@dstillman
Tested your hypothesis on Firefox 57.0 and Chrome.

Re Firefox - Using the File -> Save option only saves the page as an HTML document that has lost its nice formatting and working links, though it does save the page's text. This looks much like the type of save now produced by a save to Zotero Standalone.

Re Chrome - Printing to PDF automatically produces a searchable PDF.

Sadly, all this takes many more keystrokes than were needed to get a perfect save in Zotero 4 before the update.

To get back to basics, the single greatest virtue of Zotero vs other such software was its speed and utter simplicity. One click and everything was saved perfectly either to the online storage or to a Standalone file.

Now the elegant speed and simplicity are gone, perhaps forever, judging from the software complications you outlined in your initial responses to this discussion. I have my fingers crossed that your engineers will be able to find a way to restore Zotero's original elegant functioning.

Anyhow, thanks to you and all others who replied.