document.querySelector fails in detectWeb

edited March 14, 2023
I'm trying to redo the Haaretz translator with the following:

function detectWeb(doc, url) {
    if (doc.querySelector('meta[name="title"]') !== null) {
        return 'newspaperArticle';
    }
}

In the browser the query works well, but in Scaffold it raises a "TypeError: document.querySelector(...) is null".

Possibly related: https://forums.zotero.org/discussion/comment/407582#Comment_407582

Any ideas?
  • I don't think that that error could be caused by the code block that you posted. Could you upload the full translator code as a GitHub Gist (or, for that matter, open a zotero/translators PR, which I hope you're planning to do eventually!) and post the link here?
  • I'm definitely planning to do that, once I get it to work.

    You're right. This time the error is at the beginning of doWeb:

    item.title = doc.querySelector('meta[name="title"]').content;

    Again, the query works in the browser, for example with this page: https://www.haaretz.co.il/tmr/wallstreet/2023-03-14/ty-article/.premium/00000186-e057-d8aa-a996-f7ff690a0000

    The complete code is in this gist:
    https://gist.github.com/morags/d49230c7350083f670de4e51a9a0f228

  • The error is on line 36. "property" is misspelled as "propery".

    In general, it's best to use the attr/text utility functions, which will return an empty string instead of throwing an error if the selector doesn't match. That line could be replaced with:

    item.date = attr(doc, 'meta[property="publishDate"]', 'content');
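To see why a one-letter typo in a selector ends in this TypeError, here is a minimal sketch that runs outside Zotero. Everything in it is an assumption made for illustration: `makeDoc` and `fakeMeta` fake just enough of a DOM to look up an exact selector string, `attrOrEmpty` is a hypothetical stand-in that imitates the non-throwing behaviour described above (it is not the Zotero built-in), and the date is a made-up sample value.

```javascript
// Hypothetical stand-in for a DOM document: maps exact selector strings to
// fake elements. A real document would parse and match the selector instead.
function makeDoc(elementsBySelector) {
	return {
		querySelector(selector) {
			// Like the real DOM API, return null when nothing matches.
			return Object.prototype.hasOwnProperty.call(elementsBySelector, selector)
				? elementsBySelector[selector]
				: null;
		}
	};
}

// Hypothetical fake <meta> element exposing its content attribute.
function fakeMeta(content) {
	return {
		content,
		getAttribute(name) {
			return name === 'content' ? content : null;
		}
	};
}

// Hypothetical helper imitating a non-throwing lookup:
// returns an empty string when the selector matches nothing.
function attrOrEmpty(doc, selector, name) {
	const elem = doc.querySelector(selector);
	return elem ? elem.getAttribute(name) : '';
}

const doc = makeDoc({
	'meta[property="publishDate"]': fakeMeta('2023-03-14') // sample value
});

let errorName = null;
try {
	// Misspelled attribute name ("propery"): no match, so querySelector
	// returns null, and reading .content on null throws.
	doc.querySelector('meta[propery="publishDate"]').content;
}
catch (e) {
	errorName = e.name;
}

// The same typo through the helper fails quietly instead of throwing.
const typoResult = attrOrEmpty(doc, 'meta[propery="publishDate"]', 'content');
const goodResult = attrOrEmpty(doc, 'meta[property="publishDate"]', 'content');
```

The trade-off is that a quiet failure hides the typo: the item simply ends up without a date, so it is still worth checking the saved item's fields in the test output.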
  • Two things:
    1) You should be using text(doc, selector) which is more robust (e.g., evaluating to null instead of triggering an error when the selector isn't on a page) if all you want is the content of a node. It's also more in line with other Zotero translators and thus easier to maintain.

    2) Are you able to load Haaretz pages in Scaffold? I'm not, and without a loaded page this is exactly the error you'd get, since, as it says, doc.querySelector('meta[name="title"]') is null.



  • (Agreed, but in this case we want the content attribute of a <meta> tag, not the content of a node, so we need attr().)
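The attribute-versus-text distinction can be sketched as follows. This is an illustration under stated assumptions, not the Zotero implementation: `attrOrEmpty` and `textOrEmpty` are hypothetical stand-ins for the helpers discussed above, `fakeDoc` is a hand-written fake document, and "Sample headline" is a made-up value.

```javascript
// Hypothetical non-throwing helpers (stand-ins, not the Zotero built-ins).
function attrOrEmpty(doc, selector, name) {
	const elem = doc.querySelector(selector);
	return elem ? elem.getAttribute(name) : '';
}

function textOrEmpty(doc, selector) {
	const elem = doc.querySelector(selector);
	return elem ? elem.textContent : '';
}

// Fake <meta name="title" content="Sample headline">:
// the value lives in an attribute, and the element has no text of its own.
const metaTitle = {
	textContent: '',
	getAttribute(name) {
		return name === 'content' ? 'Sample headline' : null;
	}
};

// Fake <h1>Sample headline</h1>: the value is the element's text.
const heading = {
	textContent: 'Sample headline',
	getAttribute() {
		return null;
	}
};

// Hypothetical fake document that only knows these two selectors.
const fakeDoc = {
	querySelector(selector) {
		if (selector === 'meta[name="title"]') return metaTitle;
		if (selector === 'h1') return heading;
		return null;
	}
};

// Reading the text of a <meta> tag finds the element but yields nothing.
const fromText = textOrEmpty(fakeDoc, 'meta[name="title"]');
// Reading its content attribute yields the headline.
const fromAttr = attrOrEmpty(fakeDoc, 'meta[name="title"]', 'content');
// For an ordinary element, the text helper is the right tool.
const fromHeading = textOrEmpty(fakeDoc, 'h1');
```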
  • edited March 14, 2023
    True. I was assuming all of the selectors resolve once the page loads, or a silent failure if they don't. Any idea if/when Zotero will transition to ECMAScript 2020 (optional chaining and the nullish coalescing operator)?

    I'm not sure. I've tried three different translators (Haaretz, WaPo and NYT) with arbitrary articles from the respective websites (under the "Browser" tab), and I only get errors.
  • Zotero supports whatever the underlying Firefox engine supports. That's currently quite a bit back (Firefox ESR somewhere in the 60s, I think?), and will go up to somewhere in the 100s for Zotero 7.

    I'm not sure I understand the second paragraph and what that refers to. You should actually see the page load in the browser tab as you would in a regular browser. If you don't, something is wrong.
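For the ES2020 question above, a short sketch of what optional chaining and nullish coalescing would buy, next to an equivalent that also parses on older engines. The function names and the two fake documents are assumptions made for illustration; which form a given Zotero version accepts depends on its Firefox engine, as noted above.

```javascript
// Hypothetical fake documents: one with no matching element, one with a
// <meta name="title"> whose content is a made-up sample value.
const emptyDoc = {
	querySelector() {
		return null;
	}
};
const loadedDoc = {
	querySelector(selector) {
		return selector === 'meta[name="title"]'
			? { content: 'Sample headline' }
			: null;
	}
};

// ES2020 form: ?. short-circuits to undefined when querySelector returns
// null, and ?? substitutes '' for null or undefined.
function titleModern(doc) {
	return doc.querySelector('meta[name="title"]')?.content ?? '';
}

// Equivalent without ES2020 syntax. `== null` is true for both null and
// undefined, which matches what ?. and ?? test for.
function titleLegacy(doc) {
	var elem = doc.querySelector('meta[name="title"]');
	var content = elem == null ? undefined : elem.content;
	return content == null ? '' : content;
}

const modernMissing = titleModern(emptyDoc);
const legacyMissing = titleLegacy(emptyDoc);
const modernFound = titleModern(loadedDoc);
const legacyFound = titleLegacy(loadedDoc);
```

Either form fails silently when the selector does not match, which is the behaviour asked about. An engine that predates ES2020 rejects the first function at parse time, so the whole file fails to load rather than just that line.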
  • 1. In Browser -> URL, paste this (https://www.haaretz.co.il/tmr/wallstreet/2023-03-14/ty-article/.premium/00000186-e057-d8aa-a996-f7ff690a0000) and click "Create Web Test".
    2. The console returns "Error: No title specified for item". No page loads, and the same code (see updated gist) returns the title in the browser.
  • As I said, the browser behaves like a regular browser. If you don't see the page loaded, the page is, in fact, not loaded (with the expected errors resulting -- a page that isn't loaded doesn't have a title).

    I do see that problem specifically for Haaretz (both .com and .co.il) in Scaffold, but NYTimes (e.g., https://www.nytimes.com/2018/01/11/opinion/social-media-dumber-steven-pinker.html) and WaPo (e.g., https://www.washingtonpost.com/us-policy/2023/03/14/72-hour-scramble-save-united-states-banking-crisis/) work just fine. Note that, just like in any current browser, you have to press Return after pasting in a URL.

    @AbeJellinek -- any thoughts on what's going on with Haaretz in the Scaffold browser?

  • edited March 21, 2023
    Okay! That was not self-evident...

    ATM I'm getting a "no title" error for web tests, and "detection failed" for the built-in tests. Any idea why?
  • edited March 21, 2023
    "any thoughts on what's going on with Haaretz in the Scaffold browser?"
    @adamsmith: Not sure what's going on there, but it works in a Zotero 7 build, so not going to worry about it for Zotero 6.

    @Morags: I don't know what you're asking at this point. As adamsmith says, Haaretz isn't going to work in Scaffold, since the page won't load.
  • If you want to work on the Haaretz translator in the meantime, you can still edit in Scaffold, update translators in the Zotero Connector preferences, and test in the browser, or you can test with translation-server.
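For the translation-server route, a minimal sketch of sending a page URL to a locally running instance. It assumes the server listens at http://127.0.0.1:1969 and accepts a plain-text URL POSTed to its /web endpoint, as its README describes, and that it has been set up to load the edited translator. `buildWebRequest` and `translateUrl` are hypothetical helper names, and the global `fetch` assumes a recent Node.js or a browser.

```javascript
// Build the request for translation-server's /web endpoint.
// serverBase is an assumption: adjust it to wherever your instance runs.
function buildWebRequest(articleUrl, serverBase = 'http://127.0.0.1:1969') {
	return {
		// Strip trailing slashes so the path is not doubled up.
		endpoint: serverBase.replace(/\/+$/, '') + '/web',
		options: {
			method: 'POST',
			headers: { 'Content-Type': 'text/plain' },
			body: articleUrl
		}
	};
}

// Send the request and parse the JSON response.
// Not called here, because it needs a running server.
async function translateUrl(articleUrl, serverBase) {
	const { endpoint, options } = buildWebRequest(articleUrl, serverBase);
	const response = await fetch(endpoint, options);
	if (!response.ok) {
		throw new Error('translation-server returned HTTP ' + response.status);
	}
	return response.json();
}

// Sample URL used only to show the shape of the request.
const request = buildWebRequest('https://example.com/article', 'http://127.0.0.1:1969/');
```

With a server running, `translateUrl(url).then(items => console.log(items))` would print whatever the translator produced, which makes a missing title or date easy to spot without Scaffold.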
  • I'll do that. Thanks!