DOI support for www.eupvsec-proceedings.com

stefaneidelltoh · September 12, 2012

How can I get support/a translator for www.eupvsec-proceedings.com ?

The DOI 10.4229/25thEUPVSEC2010-2DV.1.19 should give for example the
data on http://www.eupvsec-proceedings.com/proceedings?fulltext=open+source+graphical+user+interface&paper=8720

As assumed here (http://www.zotero.org/support/known_translator_issues)

Crossref doesn't (yet) have any item metadata available for the DOI.

I contacted crossref.org and asked to add it.

I guess this will take some time... Should I try to write my own translator, as described here:

http://www.zotero.org/support/dev/how_to_write_a_zotero_translator_plusplus

Or does it only make sense for pages that do not contain a DOI?

adamsmith · September 12, 2012

it absolutely makes sense to write translators for sites with DOI - you can add more info (e.g. CrossRef doesn't provide keywords and abstracts), you can attach a link to the article PDF (for people with access) and you can import search results.

Have a look at the documentation and see if this looks like something you'd like to try to do - there are a number of translators for sites with very similar structure, so I could point you to an example.

It looks like you might also just be able to do this with the translator framework:
http://www.zotero.org/support/dev/translators/framework

using mainly //following-sibling:: type xpaths
http://www.velocityreviews.com/forums/t685244-xpath-to-get-the-first-sibling.html

aurimas · September 12, 2012

FYI, the Registration Agency for that DOI is DataCite. ~~Generally, CrossRef handles North America and DataCite is for European publications.~~

Even so, DataCite does not have metadata registered for that particular DOI. Unfortunately it's up to the content provider to ensure that metadata is supplied during registration and it looks like none of the proceedings on that website had this done.

While there is currently no translator for DataCite, we are working on adding support for it.

EDIT: I misspoke. The doi.org page provides better descriptions of what registration agency registers what.

stefaneidelltoh · September 15, 2012

Thank you for your replies. I did not know that there are different "agencies" for DOIs. The answer that I got from CrossRef is
"The DOI you have reported is registered with some other agency. This is not a CrossRef DOI." ... as aurimas said.

So I would like to create my own translator. Do you know an up-to-date tutorial that works with Firefox 15.0.1?

The following tutorial does not work:
http://www.zotero.org/support/dev/how_to_write_a_zotero_translator_plusplus
(XPather not available for Firefox 15.0.1)

((((Also, if I hit the button "Test Regex" in Scaffold, I get no result at the TestFrame.))))
Edit: Works with scaffold 3:
https://github.com/downloads/zotero/scaffold/scaffold-3.0.xpi

@adamsmith: Thank you for the hint to the translator framework. If you know an example for a conference paper or a similar page, please tell me.
(The example at http://www.zotero.org/support/dev/translators/framework
does not seem to work with Target = http://www.eurasianet.org/
What is the right target for the eurasianet example? Does the example work with an existing web page at all?)

adamsmith · September 15, 2012

You can use Firebug https://addons.mozilla.org/en-us/firefox/addon/firebug/ or even Firefox's built-in "Inspect Element" function for XPaths.
I'm not sure what you mean by "the right target" - for your example above, you can start with a very simple target Regex, like
http://www.eupvsec-proceedings.com

The framework doesn't get you out of finding and describing the Xpaths - it just means you don't have to write any of the javascript code around them.
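
As a rough illustration of how a target regex is matched against page URLs, here is a small standalone JavaScript sketch. It does not call Zotero at all: `matchesTarget` is a hypothetical helper, and escaping the dots and anchoring with `^` are optional refinements of the simple pattern suggested above, not something the thread requires.

```javascript
// Hypothetical helper, not part of Zotero: checks whether a page URL
// matches a translator's target pattern (a regular expression source string).
function matchesTarget(target, url) {
  return new RegExp(target).test(url);
}

// The simple target from above, with dots escaped so they match literally.
var target = "^http://www\\.eupvsec-proceedings\\.com";

// The example paper page from the first post, and an unrelated site.
var paperUrl = "http://www.eupvsec-proceedings.com/proceedings?fulltext=open+source+graphical+user+interface&paper=8720";
var otherUrl = "http://www.eurasianet.org/";

var paperMatches = matchesTarget(target, paperUrl);
var otherMatches = matchesTarget(target, otherUrl);
```

A broad target like this is usually enough to get started; it can be narrowed later if the translator should only trigger on paper or search pages.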

stefaneidelltoh · September 15, 2012

Ok, the current code of my translator is here:

http://matameko.wikispaces.com/file/view/EU+PVSEC+Proceedings.js

It seems to work in Scaffold 3. I still have some questions:

- How can I use the translator in zotero? (I am missing a button "Use custom translator for current web page" or so)

- How can I add several text objects to combine them in one entry?
The following line does not work:
extra: "info1: " + FW.Xpath('blabla_1').text() + "\n" + "info2: " FW.Xpath('blabla_2').text()
(I did not manage to use the commands "append" or "prepend" for this purpose)

- How can I remove the string "Abstract/Summary:" from my abstractNote? The following line does not work:
abstractNote : FW.Xpath('/html/body/div/div[8]/table/tbody/tr/td[3]').text().remove("Abstract/Summary:")

adamsmith · September 15, 2012

a couple of things:

1. If you use Scaffold and save the translator it will just start working with Zotero on reloading the page - make sure you change the translator ID.

2. adding together items is a bit tricky in FW - this thread has sample code:
https://groups.google.com/forum/?fromgroups=#!searchin/zotero-dev/concat/zotero-dev/edE60haSYTw/3LS8nN6cWgYJ

3. .remove should work. Alternatively, you should be able to use .replace(/Abstract\/Summary:/, "")

Some general notes
- you should use the "use Framework" button in Scaffold and not paste the Framework code.
- your translator will be much more stable if you find a way to go without the long, numbered xpaths - so instead of

pages: FW.Xpath('/html/body/div/div[8]/table/tbody/tr[8]/td[2]').text(),

something like FW.Xpath('/table/tbody/tr[contains(text(), "Pages:")]/following-sibling).text()

(I haven't tested this and I always make mistakes with following-sibling - but you get the idea?)

stefaneidelltoh · September 16, 2012

1. How to use translator: -------------------------------------------
Reloading the page did the trick. The icon in the Firefox address bar changed from a blank white paper (DOI) to the conferencePaper icon.

2. Adding strings: --------------------------------------------------
The example on the Google group uses metadata. The following line works:
extra : FW.Xpath('concat("author: ", string(//meta[@name="author"]/@content))'),

However, the strings that I want to concat are not available as meta data. Following code works for me:
extra : FW.Xpath('concat("topic: ", string(//html/body/div/div[8]/table/tbody/tr[4]/td[2]), "\n",' +
' "subtopic: ", string(//html/body/div/div[8]/table/tbody/tr[5]/td[2]), "\n",' +
' "price: ", string(//html/body/div/div[8]/table/tbody/tr[11]/td[2])' +
' )'
),

3. Remove substring: ------------------------------------------------
I had to delimit the pattern with forward slashes (/) instead of quotation marks (" or '), as shown in your example. The \/ escapes the / in Abstract/Summary. I use a . to match the space after the colon. Finally it is:

.remove(/Abstract\/Summary:./)

( instead of .remove("Abstract/Summary: ") )
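
To make the quoting issue concrete, here is a minimal standalone sketch in plain JavaScript. It uses the standard `String.prototype.replace` rather than the framework's `.remove`, `stripLabel` is a hypothetical name, and the sample abstract text is made up. It only illustrates that a regex literal is written between forward slashes and that the slash inside the label has to be escaped as `\/`.

```javascript
// Hypothetical helper: strips the "Abstract/Summary:" label from the
// start of a string using a regex literal.
function stripLabel(text) {
  // \/ matches the literal slash in the label;
  // \s* matches any whitespace before the label and after the colon.
  return text.replace(/^\s*Abstract\/Summary:\s*/, "");
}

// Made-up sample text, for illustration only.
var cleaned = stripLabel("Abstract/Summary: An open source GUI is presented.");
```

Using `\s*` instead of a single `.` also covers the case where the label is followed by several spaces or a newline.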

4. No extra code for translator framework: --------------------------
Works great. The "Uses translator framework" checkbox is located at the Metadata tab in Scaffold 3

5. More stable Xpaths: ----------------------------------------------
I did not manage to improve the XPaths so far. Here are two snippets from the source code of the EU PVSEC example page:
=====================
<tr>
<td>DOI:</td>
<td>10.4229/25thEUPVSEC2010-2DV.1.19</td>
</tr>
...
======================
<tr>
<td> Title:</td>
<td> Open Source Graphical User Interface
...
=======================

Following lines did not work for me:

... tr[contains(text(), "DOI:")]/following-sibling).text()
... td[contains(text(), "DOI:")]/following-sibling).text()
... td[contains(text(), "DOI:")]/following-sibling).text()

I guess the problem is that there are additional tags inside the cells. What should be returned by the text() command? Maybe it is an empty string.

6. How to access a single item of a string list?: -------------------
I would like to extract the place "Valencia, Spain" from following string:
"25th European Photovoltaic Solar Energy Conference and Exhibition / 5th World Conference on Photovoltaic Energy Conversion, 6-10 September 2010, Valencia, Spain"

(I might use a regular expression to remove the preceding text. However, splitting and using the last item (or the last two items) would be more comfortable.)

I am able to split the string with the split command (like for the tags). How can I access a single item (or several items) of the resulting array?

The following lines don't work:

place: FW.Xpath('/html/body/div/div[8]/table/tbody/tr[6]/td[2]').text().split(',',3)
place: FW.Xpath('/html/body/div/div[8]/table/tbody/tr[6]/td[2]').text().split(',')[3]
place: FW.Xpath('/html/body/div/div[8]/table/tbody/tr[6]/td[2]').text().split(',').pop()
place : FW.Xpath('concat("", string(//html/body/div/div[8]/table/tbody/tr[6]/td[2]).split(",")[3] )'),

=====================================================================
Here is my current translator code:
http://matameko.wikispaces.com/file/view/Zotero_translator_EU_PVSEC.txt

adamsmith · September 16, 2012

I'll see if I can have a look at 5 - or maybe Aurimas knows off the top of his head. For 6, .match with a regex is the way to go.

aurimas · September 16, 2012

//div[@id="detailsTable"]//td[./b[text()="DOI:"]]/following-sibling::td
That should do the trick
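
The same pattern could be reused for the other fields instead of the long numbered paths. A minimal sketch, assuming every label sits in a <b> element inside its <td> (as the expression above implies) and contains no double quotes; `valueXPath` is a hypothetical helper that only builds the XPath string:

```javascript
// Hypothetical helper: builds the XPath for the cell that follows a
// label cell in the details table. Assumes the label text is wrapped
// in <b> and contains no double quotes.
function valueXPath(label) {
  return '//div[@id="detailsTable"]//td[./b[text()="' + label + '"]]/following-sibling::td';
}

var doiXPath = valueXPath("DOI:");
var pagesXPath = valueXPath("Pages:");
```

Each resulting string would then be passed to FW.Xpath(...) in place of a numbered path. If a label has surrounding whitespace in the page source, an exact text()= comparison may not match, and contains(text(), ...) or normalize-space() might be needed instead.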

aurimas · September 16, 2012

For abstract, you may also want to do .replace(/\s+/g,' ') at the end of the chain. While the browser ignores the newlines and extra spaces in the HTML, the abstract field in Zotero does not. For some reason EU PVSEC put a bunch of newlines inside the abstract.
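
What that replace does can be checked outside Zotero with a short plain-JavaScript sketch. `collapseWhitespace` is a hypothetical name, the sample string is made up, and the trailing `.trim()` is an extra step not mentioned above:

```javascript
// Hypothetical helper: collapses every run of whitespace (spaces, tabs,
// newlines) into a single space, then trims both ends.
function collapseWhitespace(s) {
  return s.replace(/\s+/g, " ").trim();
}

// Made-up sample with newlines and indentation, like text copied from HTML.
var raw = "Abstract line one\n   continues here\n\n and ends.";
var tidy = collapseWhitespace(raw);
```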

Also, if something isn't working exactly the way you want, you can add a custom filter:
.addFilter(function (s) { return s.split(/,/).pop() })
That returns the last item. Since you want last two, it's a bit more involved. IMO, it would be simpler with .match as adamsmith suggests.
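
Both approaches for getting the last two items can be sketched in plain JavaScript. This is only an illustration under the assumption that the place is always the last two comma-separated parts of the conference string; `lastTwoItems` and `placeByMatch` are hypothetical names. Either function body could serve as the function passed to .addFilter.

```javascript
// Variant 1: split on commas, keep the last two parts, trim and rejoin.
function lastTwoItems(s) {
  var tail = s.split(",").slice(-2);
  for (var i = 0; i < tail.length; i++) {
    tail[i] = tail[i].trim();
  }
  return tail.join(", ");
}

// Variant 2: one regex that captures the last two comma-separated parts.
// Returns null if the string has fewer than three parts.
function placeByMatch(s) {
  var m = s.match(/,\s*([^,]+),\s*([^,]+?)\s*$/);
  return m ? m[1].trim() + ", " + m[2] : null;
}

var conference = "25th European Photovoltaic Solar Energy Conference and Exhibition / 5th World Conference on Photovoltaic Energy Conversion, 6-10 September 2010, Valencia, Spain";

var placeFromSplit = lastTwoItems(conference);
var placeFromMatch = placeByMatch(conference);
```

The split variant is shorter, while the regex variant fails visibly (null) when the string does not have the expected shape.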