Translator coding: how to remove HTML elements from the saved snapshot?

hawisher · September 12, 2017

Hi all. I am coding a translator for personal use. The website it scrapes has a header and some other HTML elements that get mangled by the snapshot process, which makes the snapshot very irritating to view. What I'm hoping to do is remove elements that I select using XPath, but my attempts so far have not been successful. I've tried using the removeChild function on the doc object, but it gives me an error message to the effect that

"parentNode.removeChild(...)" is not a function.

Is it possible to do this in Zotero translators? If so, should I be using the removeChild function? I've been warned that distributing my translator might get me sued, so I'm not looking for troubleshooting help. I just want to know whether my goal is possible.
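A minimal sketch of the goal described above: selecting elements with XPath and removing them. It uses the standard DOM `document.evaluate()` API rather than any Zotero-specific helper, and the name `removeByXPath` is hypothetical. Note that `removeChild` is called on the node's *parent*, with the node itself as the argument. This only changes the live document. As later posts in this thread report, that does not by itself change what ends up in the saved snapshot.

```javascript
// Hypothetical helper: remove every node matching an XPath expression.
// Uses the standard DOM document.evaluate() API (no Zotero helpers).
// 7 is the value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE. A snapshot is
// requested because, unlike the iterator result types, it is not
// invalidated when the document is modified during the loop.
const ORDERED_NODE_SNAPSHOT_TYPE = 7;

function removeByXPath(doc, expression) {
  const result = doc.evaluate(expression, doc, null, ORDERED_NODE_SNAPSHOT_TYPE, null);
  let removed = 0;
  for (let i = 0; i < result.snapshotLength; i++) {
    const node = result.snapshotItem(i);
    // removeChild is a method of the parent and takes the node to remove
    // as its argument. Nodes without a parent (e.g. the document itself)
    // are skipped.
    if (node && node.parentNode) {
      node.parentNode.removeChild(node);
      removed++;
    }
  }
  // Number of matched nodes that were detached from their parents.
  return removed;
}
```

For example, calling `removeByXPath(doc, '//header')` inside `doWeb` would strip `<header>` elements from the page that was passed in.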

adamsmith · September 12, 2017

No, sorry. As far as I'm aware, you can't customize the snapshots in translators at all. If there's a problem with the snapshot code, that might(!) be fixable globally.

hawisher · September 12, 2017

Thanks for the response!

hawisher · September 12, 2017

@adamsmith I think I may have been unclear earlier. I'm not looking to edit the snapshots after they've been saved. What I'm looking to do is take the document object (the web page) that is passed to the doWeb function and edit it, then attach that edited object instead of the original document. I'm not trying to mess with the snapshot code. Does that change anything?

@fbennett because I'm using Juris-M, and it may work differently.

adamsmith · September 12, 2017

Ah yes, good point. I'm not actually sure; I know we're not doing this anywhere, but doc (as passed to doWeb) should be the whole page, and you should be able to modify it (and then save it as an attachment). Could you post the whole code snippet you're trying to use for this part? Post it here if it's short, or to gist.github.com if it's longer.

hawisher · September 18, 2017

So I've fixed the problem I was initially having, but now I've got a different one.* I have a function I called html_cleaner. It takes a document object, removes several irritating elements, and returns a document object. I know for a fact that this function works, because when I run the translator, those elements disappear from the page in my browser window. The problem is that those elements remain in the snapshot. The relevant code is this:

doc = html_cleaner(doc);
NewItem.attachments.push({
    document: doc,
    title: 'Page'
});

This gives me a snapshot of the page, but that snapshot still contains the annoying elements, even though html_cleaner(doc) definitely removes them.

I have also tried creating a new variable (doc2) and attaching that, but the problem persists. Any advice?

* For anyone else who stumbles across this thread: the problem I was having was that doc.getElementbyID was not defined, so I couldn't get hold of an element to call the removeChild method on. Instead, I had to use Zotero.Utilities.Xpath to get the element I needed.
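A minimal sketch of an html_cleaner-style function built on the XPath approach from the footnote. Both `cleanDocument` and the third parameter `selectNodes` are hypothetical names. The sketch assumes the Zotero helper is called as `ZU.xpath(doc, expr)` and returns an array of matching nodes. Like the code in this thread, it changes the live page and does not by itself guarantee a cleaned snapshot. As an aside, the standard DOM method is spelled `getElementById`, and a differently capitalized name will not be defined.

```javascript
// Hypothetical clean-up helper in the spirit of html_cleaner. It removes
// every node matched by each XPath expression and returns the same
// (now modified) document.
// Inside a translator, nodes are selected with ZU.xpath(doc, expr), which
// is assumed here to return an array of matching nodes. A custom selector
// can be passed as the third argument, e.g. to run the function outside
// Zotero.
function cleanDocument(doc, expressions, selectNodes) {
  // ZU is only referenced if no selector was supplied.
  const select = selectNodes || ((d, expr) => ZU.xpath(d, expr));
  for (const expr of expressions) {
    for (const node of select(doc, expr)) {
      // Call removeChild on the parent, passing the node itself.
      if (node.parentNode) {
        node.parentNode.removeChild(node);
      }
    }
  }
  return doc;
}
```

In a translator this might be called as `doc = cleanDocument(doc, ['//header', '//nav']);`. The two expressions are only illustrative.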