Comparing Xpath Objects

guitarzxt · August 10, 2015

How do I compare two Xpath objects? For instance, I want the scraper to scrape different things on different pages, and I cannot differentiate by URL, only by the title given in the website header. So I want to compare the thing the Xpath is pointing at with a string or another Xpath.

At wikisource everything is at en.wikisource.org/wiki/TheSpecificEntry. The only thing that differentiates the resources (which I want to cite with different elements) is the header or something of that nature. So I want to make a couple of scrapers inside conditional statements. However, simply putting
if (function.Xpath('thispath').text() == function.Xpath('thatpath').text()) {}
would not execute, even though putting either side of the conditional into the scraper as encyclopediaTitle, for instance, gives me the exact same thing for both.

What must I do to compare the content of the two Xpaths?

aurimas · August 10, 2015

Are you talking about the translator framework? It would be helpful if you posted the entire code that you are working with on gist.github.com. Right now, I'm a bit confused as to what you're doing. (E.g. function.Xpath should not even be possible syntactically, since function is a reserved keyword, not an object.)

adamsmith · August 10, 2015

yeah, this is likely beyond what you can reasonably do in the translator framework, which is limited to simple scraping.

If you do need that level of complexity, you'll need to rewrite this as a regular translator, where you can then simply do
if (ZU.xpathText(doc, xpath1) == ZU.xpathText(doc, xpath2)){}
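One caveat with that comparison: ZU.xpathText returns null when an XPath matches nothing, so two missing nodes would compare as equal, and stray whitespace can make otherwise identical headers compare as different. A minimal sketch of a more defensive check follows. The name sameText is a hypothetical helper, not part of Zotero, and the two arguments stand in for the results of ZU.xpathText(doc, xpath1) and ZU.xpathText(doc, xpath2).

```javascript
// Hypothetical helper, not part of Zotero's utilities.
function sameText(a, b) {
  // Treat a missing node (null or undefined) as "no match" rather than equal.
  if (a == null || b == null) return false;
  // Collapse runs of whitespace and trim both ends before comparing.
  const norm = s => String(s).replace(/\s+/g, ' ').trim();
  return norm(a) === norm(b);
}
```

In a translator this would be called as sameText(ZU.xpathText(doc, xpath1), ZU.xpathText(doc, xpath2)).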

guitarzxt · August 10, 2015

*sigh* Okay, are there any instructions for getting me started on that?

adamsmith · August 10, 2015

https://www.zotero.org/support/dev/translators/coding
has most of what you need.
Some of the German newspaper translators -- SpiegelOnline, Sueddeutsche, Welt -- may be good templates for a simple scrape implemented as a regular, non-framework translator.

guitarzxt · August 10, 2015

Okay, so can't I just use the xpathText to save a variable and then use a scraper below? Is that what you were suggesting before?

All I want to do is check what the header says. If it says one thing, then scrape this stuff. If it says another thing, scrape that stuff. If neither, then scrape this stuff.

My latest try at this is below. Like I said in a different post, I'm new to programming translators and new to Zotero in general and it has been about 10 years since I've done any programming, but I am trying this because it will ultimately save me a ton of time on my project. However I am on the verge of giving up and just entering all the information into Zotero manually.

Here is the link to the code. Any suggestions?

https://gist.github.com/anonymous/c4eaec3af8864a46f0b0

adamsmith · August 11, 2015

no, mixing FW translators and regular translators really doesn't work well. The Framework relies on its own syntax which makes a lot of things a lot easier, but if you want more, it's really not worth it.

What I don't understand, though, is why you need to _compare_ two xpaths. Don't you just want to establish whether an Xpath contains a certain string? That you can do with a single xpath.
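As a rough sketch of what such a single XPath can look like, the snippet below builds one by appending a contains() predicate to an element path. The helper name xpathContains is hypothetical, and the element path is a placeholder: which element actually holds the header text on a given page is an assumption to check against the page source. Using "." instead of text() tests the element's full string value, so text inside nested elements is included.

```javascript
// Hypothetical helper: append a contains() predicate to an element path.
function xpathContains(elementPath, needle) {
  // XPath 1.0 string literals have no escape sequences, so pick a quote
  // character that does not occur in the needle.
  let literal;
  if (!needle.includes('"')) {
    literal = '"' + needle + '"';
  } else if (!needle.includes("'")) {
    literal = "'" + needle + "'";
  } else {
    throw new Error('needle contains both quote types');
  }
  // "." is the element's full string value, including nested elements.
  return elementPath + '[contains(., ' + literal + ')]';
}

// Placeholder path; substitute the element that holds the header.
const xpath = xpathContains('//h1[@id="firstHeading"]', 'Dictionary of National Biography');
```

The resulting expression matches a node only when the string is present, so it matches nothing otherwise.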

guitarzxt · August 11, 2015

Yes, I do want to establish whether an Xpath contains a certain string, and then scrape certain elements and give set values based on whether or not the string matches the header in the Xpath. How do I do that? I have looked on a number of sites over the last few days and have not gotten it to work.

guitarzxt · August 11, 2015

I just have to put the Xpath with the Contains part in the conditional?

guitarzxt · August 11, 2015

I have tried

if (FW.Xpath('//html/body/div[3]/div[3]/div[4]/div[1]/div/div/div[2]/b/span/a[contains(text(),"Dictionary of National Biography")]')) {
FW.Scraper({
itemType : 'encyclopediaArticle',
encyclopediaTitle : "Dictionary of National Biography",
detect : FW.Xpath('//h1[@id="firstHeading"]'),
title : FW.Xpath('//h1[@id="firstHeading"]').text().trim().remove(/\n.+/g),

and so on, but that doesn't work. Is there a way to return a boolean or something? If the specified path does not contain the specific string, what does Xpath return?

aurimas · August 11, 2015

FW methods do not return the values that you are expecting. They all return FW objects. Also, the order of events with FW is not straightforward. It mostly just queues up the actions when you call them and executes them later (when called from doWeb/detectWeb). In short, as adamsmith says, FW is only good when you need simple logic and you can follow the constructs that it provides. For everything else, you have to drop the framework and go with plain JavaScript.
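For illustration, here is a minimal sketch of the branching logic in plain JavaScript, as it might look in a regular translator once the header text is available as an ordinary string (for example from ZU.xpathText). The function name fieldsForHeader, the 'Some Other Work' title, and the webpage fallback are all placeholders, not anything taken from the actual site or from Zotero.

```javascript
// Hypothetical helper: map the header text to the fields to set on the item.
function fieldsForHeader(headerText) {
  // A missing header (null) is treated as an empty string.
  const text = headerText || '';
  if (text.includes('Dictionary of National Biography')) {
    return {
      itemType: 'encyclopediaArticle',
      encyclopediaTitle: 'Dictionary of National Biography'
    };
  }
  if (text.includes('Some Other Work')) { // placeholder title
    return {
      itemType: 'encyclopediaArticle',
      encyclopediaTitle: 'Some Other Work'
    };
  }
  // Neither header matched: fall back to a generic item (an assumption).
  return { itemType: 'webpage' };
}
```

Because the input here is a plain string rather than an FW object, ordinary if statements behave as expected.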

adamsmith · August 11, 2015

what aurimas says, but you can take advantage of the fact that the detect statement in FW essentially works like an if, so:

FW.Scraper({
itemType : 'encyclopediaArticle',
encyclopediaTitle : "Dictionary of National Biography",
detect : FW.Xpath('//html/body/div[3]/div[3]/div[4]/div[1]/div/div/div[2]/b/span/a[contains(text(),"Dictionary of National Biography")]'),
title : FW.Xpath('//h1[@id="firstHeading"]').text().trim().remove(/\n.+/g),

should give you what you want. But you'll need a full FW.Scraper for every page type you're looking at.

guitarzxt · August 11, 2015

So if the detect doesn't find the string contained in the Xpath it will disregard everything in that scraper?

adamsmith · August 11, 2015

correct. and move on to try the next one.
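A toy model of that fall-through, for intuition only: this is not how FW is implemented internally, and it assumes scrapers are tried in the order they are defined. Each entry stands in for one FW.Scraper, with detect reduced to a plain function on the header text; the names are hypothetical.

```javascript
// Toy model only: each entry stands in for one FW.Scraper definition.
const scrapers = [
  // Specific case first: only matches pages whose header names the DNB.
  { name: 'DNB', detect: header => header.includes('Dictionary of National Biography') },
  // General case last: stands in for a detect XPath that matches every page.
  { name: 'fallback', detect: () => true }
];

function pickScraper(header) {
  // Return the first scraper whose detect matches, or null if none do.
  return scrapers.find(s => s.detect(header)) || null;
}
```

Under that assumption, the more specific scrapers belong before the general one, since a catch-all detect placed first would match every page.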