document.querySelector fails in detectWeb
I'm trying to redo the Haaretz translator with the following:
function detectWeb(doc, url) {
    if (doc.querySelector('meta[name="title"]') !== null) {
        return 'newspaperArticle';
    }
}
In the browser the query works well, but in Scaffold it raises a "TypeError: document.querySelector(...) is null".
Possibly related: https://forums.zotero.org/discussion/comment/407582#Comment_407582
Any ideas?
You're right. This time the error is at the beginning of doWeb:
item.title = doc.querySelector('meta[name="title"]').content;
Again, the query works in the browser, for example with this page: https://www.haaretz.co.il/tmr/wallstreet/2023-03-14/ty-article/.premium/00000186-e057-d8aa-a996-f7ff690a0000
The complete code is in this gist:
https://gist.github.com/morags/d49230c7350083f670de4e51a9a0f228
In general, it's best to use the attr/text utility functions, which will return an empty string instead of throwing an error if the selector doesn't match. That line could be replaced with:
item.date = attr(doc, 'meta[property="publishDate"]', 'content');
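To make the difference concrete, here is a minimal sketch of null-safe helpers in the spirit of attr/text. The helper bodies below are illustrative re-implementations written for this example, not Zotero's actual source, and the empty-string fallback is an assumption based on the description above; in a real translator the built-in attr and text should be used. The emptyDoc and loadedDoc objects are hypothetical stand-ins for a page that failed to load and a page that loaded.

```javascript
// Illustrative re-implementations (assumptions, not Zotero's source):
// return '' when the selector matches nothing instead of throwing.
function attr(docOrElem, selector, attribute) {
    const elem = docOrElem.querySelector(selector);
    return (elem && elem.getAttribute(attribute)) || '';
}

function text(docOrElem, selector) {
    const elem = docOrElem.querySelector(selector);
    return elem ? elem.textContent : '';
}

// Hypothetical stand-in for a page that never loaded: nothing matches.
const emptyDoc = { querySelector: () => null };

// Hypothetical stand-in for a loaded page with one <meta name="title"> tag.
const loadedDoc = {
    querySelector: (selector) => selector === 'meta[name="title"]'
        ? {
            getAttribute: (name) => (name === 'content' ? 'Example headline' : null),
            textContent: ''
        }
        : null
};

// The helper copes with both documents.
const missingTitle = attr(emptyDoc, 'meta[name="title"]', 'content');
const foundTitle = attr(loadedDoc, 'meta[name="title"]', 'content');

// Direct property access on the unloaded page throws a TypeError,
// because querySelector returned null.
let directAccessThrew = false;
try {
    emptyDoc.querySelector('meta[name="title"]').content;
} catch (e) {
    directAccessThrew = e instanceof TypeError;
}
```

Note that the helper only hides the exception; if the page itself did not load, the returned value is still empty.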
1) You should be using
text(doc, selector)
which is more robust (e.g., evaluating to null instead of triggering an error when the selector isn't on a page) if all you want is the content of a node. It's also more in line with other Zotero translators and thus easier to maintain.
2) Are you able to load Haaretz pages in Scaffold? Because I'm not -- and without an active page, this is of course the exact error you'd be getting since, as it says, doc.querySelector('meta[name="title"]') is null.
(Here we want the content attribute of a <meta> tag, not the content of a node, so we need attr().)
I'm not sure. I've tried three different translators (Haaretz, WaPo and NYT) with arbitrary articles from the respective websites (under the "Browser" tab), and I only get errors.
I'm not sure I understand the second paragraph and what that refers to. You should actually see the page load in the browser tab as you would in a regular browser. If you don't, something is wrong.
2. The console returns "Error: No title specified for item". No page loads, and the same code (see updated gist) returns the title in the browser.
I do see that problem specifically for Haaretz (both .com and .co.il) in Scaffold, but NYTimes (e.g., https://www.nytimes.com/2018/01/11/opinion/social-media-dumber-steven-pinker.html) and WaPo (e.g., https://www.washingtonpost.com/us-policy/2023/03/14/72-hour-scramble-save-united-states-banking-crisis/) work just fine. Note that, just like in any current browser, you have to press return after pasting in a URL.
@AbeJellinek -- any thoughts on what's going on with Haaretz in the Scaffold browser?
ATM I'm getting a "no title" error for web tests, and "detection failed" for the built-in tests. Any idea why?
@Morags: I don't know what you're asking at this point. As adamsmith says, Haaretz isn't going to work in Scaffold, since the page won't load.
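For anyone who lands here with the same symptoms: a "no title" error together with "detection failed" is consistent with the translator running against a document that never loaded, since every lookup then comes back empty. A minimal sketch of a detectWeb that reports nothing instead of throwing in that situation follows. It assumes an attr-style helper that yields an empty string when nothing matches; the local attr definition, hasUsableTitle, and the two stand-in documents are hypothetical and exist only for this example.

```javascript
// Assumed helper behaviour: '' when the selector matches nothing.
// (Local illustration only; a real translator would use the built-in attr.)
function attr(doc, selector, attribute) {
    const elem = doc.querySelector(selector);
    return (elem && elem.getAttribute(attribute)) || '';
}

// Hypothetical helper: is there a non-empty <meta name="title"> value?
function hasUsableTitle(doc) {
    return attr(doc, 'meta[name="title"]', 'content').trim() !== '';
}

// Returns the item type when the metadata is present, false otherwise.
function detectWeb(doc, url) {
    return hasUsableTitle(doc) ? 'newspaperArticle' : false;
}

// Hypothetical stand-ins: a page that failed to load and one that loaded.
const unloadedDoc = { querySelector: () => null };
const articleDoc = {
    querySelector: (selector) => selector === 'meta[name="title"]'
        ? { getAttribute: (name) => (name === 'content' ? 'Example headline' : null) }
        : null
};

const detectedOnUnloaded = detectWeb(unloadedDoc, 'https://example.com/');
const detectedOnArticle = detectWeb(articleDoc, 'https://example.com/');
```

This does not make an unloadable page work in Scaffold; it only turns the failure into a clean "nothing detected" result.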