Comparing Xpath Objects
How do I compare two Xpath objects? For instance, I want the scraper to scrape different things on different pages and I cannot differentiate by URL, only by the title given on the website header. So I want to compare the thing Xpath is pointing at with a string or another Xpath.
At Wikisource everything is at en.wikisource.org/wiki/TheSpecificEntry. The only way to differentiate between the different resources (which I want to cite with different elements) is by the header or something of that nature. So I want to put a couple of scrapers inside conditional statements. However, simply putting
if (function.Xpath('thispath').text() == function.Xpath('thatpath').text()) {}
does not execute, even though putting either side of the condition into the scraper (as encyclopediaTitle, for instance) gives me exactly the same text for both.
What must I do to compare the content of the two Xpaths?
If you do need that level of complexity, you'll need to rewrite this as a regular translator, where you can then simply do
if (ZU.xpathText(doc, xpath1) == ZU.xpathText(doc, xpath2)){}
has most of what you need.
Some of the German newspaper translators -- SpiegelOnline, Sueddeutsche, Welt -- may be good templates for a simple scrape implemented as a regular, non-framework translator.
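To make the branching in a regular translator concrete, here is a minimal sketch in plain JavaScript. The function name classifyHeader, the labels it returns, and the second title are hypothetical and only for illustration. In a translator the input would presumably come from something like ZU.xpathText(doc, headerXPath), which, as far as I know, gives null when nothing matches, so the sketch treats an empty value as the fallback case.

```javascript
// Hypothetical helper: decide which set of fields to scrape
// based on the text found in the page header.
function classifyHeader(headerText) {
  // No header found (null or empty string): use the fallback scrape.
  if (!headerText) {
    return 'default';
  }
  // Title taken from the thread.
  if (headerText.includes('Dictionary of National Biography')) {
    return 'dnb';
  }
  // Placeholder for a second source; the title is made up for this sketch.
  if (headerText.includes('Another Reference Work')) {
    return 'other';
  }
  // Header present but unrecognised: fallback scrape.
  return 'default';
}

// In a real translator the argument would be the header text read from the page.
const label = classifyHeader('Dictionary of National Biography, 1885-1900');
```

Each label would then select which item fields get filled in, which is the "if it says this, scrape that" logic in one place.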
All I want to do is check what the header says. If it says one thing, then scrape this stuff. If it says another thing, scrape that stuff. If neither, then scrape this stuff.
My latest try at this is below. Like I said in a different post, I'm new to programming translators and new to Zotero in general and it has been about 10 years since I've done any programming, but I am trying this because it will ultimately save me a ton of time on my project. However I am on the verge of giving up and just entering all the information into Zotero manually.
Here is the link to the code. Any suggestions?
https://gist.github.com/anonymous/c4eaec3af8864a46f0b0
What I don't understand, though, is why you need to _compare_ two xpaths. Don't you just want to establish whether an Xpath contains a certain string? That you can do with a single xpath.
if (FW.Xpath('//html/body/div[3]/div[3]/div[4]/div[1]/div/div/div[2]/b/span/a[contains(text(),"Dictionary of National Biography")]')) {
FW.Scraper({
itemType : 'encyclopediaArticle',
encyclopediaTitle : "Dictionary of National Biography",
detect : FW.Xpath('//h1[@id="firstHeading"]'),
title : FW.Xpath('//h1[@id="firstHeading"]').text().trim().remove(/\n.+/g),
and so on, but that doesn't work. Is there a way to return a boolean or something? If the specified path does not contain the specific string, what does Xpath return?
FW.Scraper({
itemType : 'encyclopediaArticle',
encyclopediaTitle : "Dictionary of National Biography",
detect : FW.Xpath('//html/body/div[3]/div[3]/div[4]/div[1]/div/div/div[2]/b/span/a[contains(text(),"Dictionary of National Biography")]'),
title : FW.Xpath('//h1[@id="firstHeading"]').text().trim().remove(/\n.+/g),
should give you what you want. But you'll need a full FW.Scraper for every page type you're looking at.
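On the earlier question of what FW.Xpath returns: as far as I can tell, it does not return a string or a boolean but a chainable object that is only evaluated later, when the framework runs it against a page. If that is right, it would explain why both the == comparison and the bare if test misbehave. The sketch below is not the framework's code; LazyXpath, lookup, and the sample page are hypothetical stand-ins that only mimic this "build now, evaluate later" pattern.

```javascript
// Hypothetical stand-in for a deferred, chainable XPath wrapper.
class LazyXpath {
  constructor(path) {
    this.path = path;
    this.filters = [];
  }

  // Queue a filter and return the wrapper itself, not a string.
  text() {
    this.filters.push((value) => String(value));
    return this;
  }

  // Only here is an actual value produced.
  // `lookup` stands in for a real query against the document.
  evaluate(lookup) {
    const raw = lookup(this.path);
    if (raw === undefined) {
      return null; // nothing matched
    }
    return this.filters.reduce((value, filter) => filter(value), raw);
  }
}

// Pretend page: two different paths that hold the same text.
const page = {
  '//h1': 'Dictionary of National Biography',
  '//b/span/a': 'Dictionary of National Biography',
};
const lookup = (path) => page[path];

const left = new LazyXpath('//h1').text();
const right = new LazyXpath('//b/span/a').text();

// Two distinct objects never compare equal, whatever text they would yield.
const sameObject = left == right;

// Any object is truthy, even one whose path matches nothing.
const alwaysTruthy = Boolean(new LazyXpath('//no/such/node'));

// Comparing the evaluated strings is what actually answers the question.
const sameText = left.evaluate(lookup) === right.evaluate(lookup);
```

Here sameObject is false and alwaysTruthy is true, while sameText is true. That is why the check belongs either in detect, where the framework does the evaluating, or in a regular translator, where ZU.xpathText hands back plain text that can be compared directly.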