document.querySelector fails in detectWeb

Morags · March 14, 2023

I'm trying to redo the Haaretz translator with the following:

function detectWeb(doc, url) {
    if (doc.querySelector('meta[name="title"]') !== null) {
        return 'newspaperArticle';
    }
}

In the browser the query works well, but in Scaffold it raises a "TypeError: document.querySelector(...) is null".

Possibly related: https://forums.zotero.org/discussion/comment/407582#Comment_407582

Any ideas?

AbeJellinek · March 14, 2023

I don't think that error could be caused by the code block you posted. Could you upload the full translator code as a GitHub Gist (or, for that matter, open a zotero/translators PR, which I hope you're planning to do eventually!) and post the link here?

Morags · March 14, 2023

I'm definitely planning to do that, once I get it to work.

You're right. This time the error is at the beginning of doWeb:

item.title = doc.querySelector('meta[name="title"]').content;

Again, the query works in the browser, for example with this page: https://www.haaretz.co.il/tmr/wallstreet/2023-03-14/ty-article/.premium/00000186-e057-d8aa-a996-f7ff690a0000

The complete code is in this gist:
https://gist.github.com/morags/d49230c7350083f670de4e51a9a0f228

AbeJellinek · March 14, 2023

The error is on line 36. "property" is misspelled as "propery".

In general, it's best to use the attr/text utility functions, which will return an empty string instead of throwing an error if the selector doesn't match. That line could be replaced with:

item.date = attr(doc, 'meta[property="publishDate"]', 'content');
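To make the failure mode concrete, here is a minimal sketch of why the misspelled selector throws and how a guarded helper avoids it. The names fakeDoc and safeAttr and the sample date are made up for illustration; this is not Zotero's own attr() implementation, only an approximation of the behavior described above.

```javascript
// Hypothetical stand-in for a parsed page: it only knows one selector.
// In a real translator, `doc` is the DOM document that Zotero passes in.
const fakeDoc = {
	querySelector(selector) {
		const known = {
			'meta[property="publishDate"]': {
				// Made-up sample value, for illustration only
				getAttribute: name => (name === 'content' ? '2023-03-14' : null)
			}
		};
		return known[selector] || null; // like the DOM: null when nothing matches
	}
};

// Illustrative helper (an assumption, not Zotero's code): read an attribute,
// falling back to '' when the selector matches nothing.
function safeAttr(doc, selector, name) {
	const elem = doc.querySelector(selector);
	return elem ? (elem.getAttribute(name) || '') : '';
}

// Correct selector: the attribute value comes back.
const date = safeAttr(fakeDoc, 'meta[property="publishDate"]', 'content');

// Misspelled selector ("propery"): no match, so we get '' instead of an error.
const missing = safeAttr(fakeDoc, 'meta[propery="publishDate"]', 'content');

// The unguarded form throws, because null has no properties to read.
let threw = false;
try {
	fakeDoc.querySelector('meta[propery="publishDate"]').content;
}
catch (e) {
	threw = e instanceof TypeError;
}

console.log(date, JSON.stringify(missing), threw); // 2023-03-14 "" true
```

Note that a misspelled attribute name is still a valid selector, which is why nothing complains until the null result is dereferenced.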

adamsmith · March 14, 2023

Two things:
1) You should be using text(doc, selector), which is more robust (e.g., evaluating to null instead of triggering an error when the selector isn't on a page) if all you want is the content of a node. It's also more in line with other Zotero translators and thus easier to maintain.

2) Are you able to load Haaretz pages in Scaffold? Because I'm not -- and without an active page, this is of course the exact error you'd be getting since, as it says, doc.querySelector('meta[name="title"]') is null.

AbeJellinek · March 14, 2023

(Agreed, but in this case we want the content attribute of a <meta> tag, not the content of a node, so we need attr().)
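The difference between the two can be sketched with a stand-in element. Everything here (metaElem, textOf, attrOf, the sample headline) is hypothetical and only mimics the relevant part of the DOM; the point is that a <meta> tag carries its value in the content attribute and has no text of its own.

```javascript
// Hypothetical stand-in for <meta name="title" content="Example headline">.
// A <meta> element has no child text, so its textContent is empty.
const metaAttrs = { name: 'title', content: 'Example headline' };
const metaElem = {
	textContent: '',
	getAttribute: name => (name in metaAttrs ? metaAttrs[name] : null)
};

// Minimal fake document that matches exactly one selector.
const doc = {
	querySelector: selector => (selector === 'meta[name="title"]' ? metaElem : null)
};

// Illustrative helpers (assumptions, not Zotero's own text()/attr()).
function textOf(doc, selector) {
	const elem = doc.querySelector(selector);
	return elem ? elem.textContent : '';
}

function attrOf(doc, selector, name) {
	const elem = doc.querySelector(selector);
	return elem ? (elem.getAttribute(name) || '') : '';
}

const viaText = textOf(doc, 'meta[name="title"]');
const viaAttr = attrOf(doc, 'meta[name="title"]', 'content');

console.log(JSON.stringify(viaText), viaAttr); // "" Example headline
```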

Morags · March 14, 2023

True. I was assuming that all of the selectors would resolve once the page loads, or fail silently if they don't. Any idea if/when Zotero will transition to ECMAScript 2020 (optional chaining and the nullish coalescing operator)?

I'm not sure. I've tried three different translators (Haaretz, WaPo and NYT) with arbitrary articles from the respective websites (under the "Browser" tab), and I only get errors.

adamsmith · March 14, 2023

Zotero supports whatever the underlying Firefox engine supports. That's currently quite a bit back (Firefox ESR somewhere in the 60s, I think?) and will go up to somewhere in the 100s for Zotero 7.
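For comparison, here is a rough sketch of the ES2020 shorthand next to an equivalent written in older syntax. The function names and the two stand-in documents are hypothetical. An engine without ES2020 support would reject titleModern at parse time, so on such an engine only something like titleLegacy would be usable.

```javascript
// Hypothetical stand-in for a document in which the selector matches nothing.
const emptyDoc = { querySelector: () => null };
// And one in which it matches an element with a `content` property.
const fullDoc = { querySelector: () => ({ content: 'Example headline' }) };

// ES2020 form: ?. short-circuits to undefined on null,
// and ?? supplies a default for null or undefined.
function titleModern(doc) {
	return doc.querySelector('meta[name="title"]')?.content ?? '';
}

// Equivalent form using only older syntax.
function titleLegacy(doc) {
	var elem = doc.querySelector('meta[name="title"]');
	return (elem && elem.content != null) ? elem.content : '';
}

console.log(titleModern(fullDoc) === titleLegacy(fullDoc));   // true
console.log(titleModern(emptyDoc) === titleLegacy(emptyDoc)); // true
```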

I'm not sure I understand the second paragraph and what that refers to. You should actually see the page load in the browser tab as you would in a regular browser. If you don't, something is wrong.

Morags · March 15, 2023

1. In Browser -> URL, paste this (https://www.haaretz.co.il/tmr/wallstreet/2023-03-14/ty-article/.premium/00000186-e057-d8aa-a996-f7ff690a0000) and click "Create Web Test".
2. The console returns "Error: No title specified for item". No page loads, yet the same code (see updated gist) returns the title in the browser.

adamsmith · March 15, 2023

As I said, the browser behaves like a regular browser. If you don't see the page loaded, the page is, in fact, not loaded (with the expected errors resulting -- a page that isn't loaded doesn't have a title).

I do see that problem specifically for Haaretz (both .com and .co.il) in Scaffold, but NYTimes (e.g., https://www.nytimes.com/2018/01/11/opinion/social-media-dumber-steven-pinker.html) and WaPo (e.g., https://www.washingtonpost.com/us-policy/2023/03/14/72-hour-scramble-save-united-states-banking-crisis/) work just fine. Note that, just like in any current browser, you have to press Return after pasting in a URL.

@AbeJellinek -- any thoughts on what's going on with Haaretz in the Scaffold browser?

Morags · March 21, 2023

Okay! That was not self-evident...

ATM I'm getting a "no title" error for web tests, and "detection failed" for the built-in tests. Any idea why?

dstillman · March 21, 2023

adamsmith wrote: "any thoughts on what's going on with Haaretz in the Scaffold browser?"

@adamsmith: Not sure what's going on there, but it works in a Zotero 7 build, so not going to worry about it for Zotero 6.

@Morags: I don't know what you're asking at this point. As adamsmith says, Haaretz isn't going to work in Scaffold, since the page won't load.

dstillman · March 21, 2023

If you want to work on the Haaretz translator in the meantime, you can still edit in Scaffold, update translators in the Zotero Connector preferences, and test in the browser, or you can test with translation-server.

Morags · March 21, 2023

I'll do that. Thanks!