Possible Solution for Primo Bug

zuphilip · August 8, 2013

The Primo translator (https://github.com/zotero/translators/blob/master/Primo.js) does not work for articles because the PNX is not accessible for PrimoCentral. Mehmet Celik from KU Leuven shows how libraries can access the PNX for PrimoCentral data as well:

http://www.exlibrisgroup.org/display/PrimoCC/showPNX+Revisited

This solution uses a JSP file on the Primo server and a small bookmarklet. I think it should now also be possible to use this JSP file in the Primo translator for Zotero. What do you think?

adamsmith · August 8, 2013

For libraries that have the showPNX.jsp on their server, yes, we should be able to take advantage of that.

zuphilip · August 12, 2013

I tried to adapt the translator, see:

https://dl.dropboxusercontent.com/u/59474281/Primo-with-jsp.js

This is the "normal" Primo.js with some additional code for Primo installations that have the JSP file. The idea is to provide *one* translator for all Primo sites: if the translator runs on a Primo site with the JSP file, the new code is used; if it runs on a Primo site without the JSP file, it works as before.
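The single-translator idea can be sketched as a small dispatch step: use the JSP output when it is present and looks like XML, otherwise keep the old path. This is a minimal sketch, not the code in Primo.js. `looksLikePnx` and `choosePnx` are hypothetical names, and checking for a leading XML declaration is an assumed heuristic.

```javascript
// Hypothetical sketch of "one translator for all Primo sites".
// Not taken from Primo.js; the check below is an assumed heuristic.

// Treat the JSP response as usable only if it begins with an XML declaration.
// An empty or missing response, an HTML error page, or output with stray
// text before the declaration all fail this check.
function looksLikePnx(text) {
  return typeof text === "string" && text.trim().startsWith("<?xml");
}

// Prefer the JSP output when it looks usable; otherwise keep the old path.
function choosePnx(jspText, fallbackText) {
  if (looksLikePnx(jspText)) {
    return { source: "jsp", xml: jspText.trim() };
  }
  return { source: "fallback", xml: fallbackText };
}
```

With this design, a site without the JSP (or with a broken one) degrades to the previous behaviour instead of failing outright.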

How does this look to you?

adamsmith · August 17, 2013

just getting back from vacation - it will take me a while to get to this, but it should be possible to wrap this into one translator. Do you have a couple of sample Primos with the JSP file installed?

zuphilip · August 18, 2013

Primos with the JSP file installed that I found:
(1) http://purdue-primo-prod.hosted.exlibrisgroup.com/primo_library/libweb/action/search.do?vid=PURDUE
(2) http://primo.bib.uni-mannheim.de/primo_library/libweb/action/search.do?vid=MAN_UB
(3) http://limo.libis.be/primo_library/libweb/action/search.do?vid=LIBISnet&fromLogin=true
(4.a) http://virtuose.uqam.ca/primo_library/libweb/action/search.do?vid=UQAM
(4.b) http://eudoxe.bib.uqam.ca:1701/primo_library/libweb/action/search.do?vid=UQAM&institution=UQAM

The first two are especially interesting because they contain articles from the PrimoCentral database. The adapted Primo translator has worked in all my tests so far (Primos with and without the JSP file). Please let me know if I can help in any way.

adamsmith · September 1, 2013

looks great to me. Did some clean-up, but took your code verbatim. Pull request is here:
https://github.com/zotero/translators/pull/617
might take a bit until it gets accepted; the guy who does most of the reviewing just went on vacation.

adamsmith · September 1, 2013

oh - and let me know if you want credit, i.e. to be added to the translator creators or mentioned in the code; happy to do that.

zuphilip · September 2, 2013

Thank you very much for the support! It is good to hear that it works. I can wait a while to see the feature in the official version. If possible, I would like the credit, although I didn't do much. Maybe I can help improve the translator further in the future.

adamsmith · September 2, 2013

Happy to give credit: what name should I use?

zuphilip · September 2, 2013

[Thank you!]

adamsmith · September 5, 2013

There are still some problems.
Go to the Oxford Primo at http://solo.bodleian.ox.ac.uk/
and search for
Test Valley street map 2006/2007
That item won't import with your modification: lines 150-152
https://github.com/zotero/translators/pull/617/files#L0R150
break the XML.

I'll want to keep the CDATA and clean this up further down during import (the current approach is just too fragile), but we probably do want to remove the prim: part. Is there anything other than prim: that you're removing with that?

zuphilip · September 5, 2013

My guess is that the showPNX.jsp in the Oxford Primo has some errors, because the XML it creates begins with:

i<?xml version="1.0" encoding="UTF-8"?>

The XML breaks down because it begins with an "i" and not with the XML declaration.
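Until a site fixes its JSP, a translator could defensively drop anything before the XML declaration. This is a hypothetical workaround sketch (`stripLeadingJunk` is not part of Primo.js); it only handles stray leading characters like the "i" above, not errors further down in the file.

```javascript
// Hypothetical defensive cleanup: drop stray characters before "<?xml".
function stripLeadingJunk(text) {
  const start = text.indexOf("<?xml");
  // Declaration missing (-1) or already at the start (0): return unchanged.
  return start > 0 ? text.slice(start) : text;
}
```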

Here is an example of the XML produced by the JSP file (without errors):
https://dl.dropboxusercontent.com/u/59474281/000871081.xml

Possible namespaces are "prim" and maybe "sear" (I can't remember if there were more).

The cleanup in lines 148, 149, 151, 152 and 153 is essential for the translator to do anything at all. If you feel that the general regular expression is too risky, we could replace it with more specific one(s).

zuphilip · September 5, 2013

Update: Oxford corrected two errors in their JSP file ("i" on line 1 and "%" on line 3). Still, there seem to be some problems. Maybe I can look at them a little later...

adamsmith · September 5, 2013

> The cleanup in lines 148, 149, 151, 152 and 153 is essential for the translator to do anything at all. If you feel that the general regular expression is too risky, we could replace it with more specific one(s).

that's what I mean. I'm particularly concerned about 151 and 152. I realize we need those, but I'd like them to be a lot stricter. Restricting them to just prim and sear would work. Alternatively, we could leave them out and include prim and sear as a namespace declaration.
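The namespace-declaration alternative could look roughly like the following sketch. It assumes the JSP output uses the `prim:`/`sear:` prefixes without declaring them, and it adds `xmlns:prim` and `xmlns:sear` to the root element so that a namespace-aware parser accepts the prefixed tags. The namespace URIs are placeholders, not the real Ex Libris ones. `declarePrimoNamespaces` is a hypothetical name, and the regex-based tag matching is a rough approximation, not a full XML parser.

```javascript
// Hypothetical: declare prim/sear on the root element instead of stripping them.
// The URIs are placeholders (assumption); substitute the real ones if known.
const PLACEHOLDER_NS = {
  prim: "urn:example:primo:prim",
  sear: "urn:example:primo:sear",
};

function declarePrimoNamespaces(xml) {
  // First start tag that is not "<?...", "<!..." or a closing tag.
  const match = /<(?![?!])([^\s>\/]+)([^>]*?)(\/?)>/.exec(xml);
  if (!match) return xml;
  const [whole, name, attrs, selfClose] = match;
  let extra = "";
  for (const [prefix, uri] of Object.entries(PLACEHOLDER_NS)) {
    // Only add declarations that are not already present.
    if (!attrs.includes(`xmlns:${prefix}=`)) extra += ` xmlns:${prefix}="${uri}"`;
  }
  const rebuilt = `<${name}${attrs}${extra}${selfClose}>`;
  return xml.slice(0, match.index) + rebuilt + xml.slice(match.index + whole.length);
}
```

XPath expressions would then have to use the same prefixes (with a matching namespace map) instead of bare element names.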

zuphilip · September 5, 2013

The more restricted version you described above is also fine with me!

I found out what the problem with Oxford is. For example, the following line in the PNX causes problems:

<prim:lln03><![CDATA[$$Uhttp://books.google.com/books?lr=&as_drrb_is=q&as_minm_is=0&as_miny_is=&as_maxm_is=0&as_maxy_is=&q=intitle:Thetis+&as_brr=0&as_pt=ALLTYPES&sa=N&start=0$$DSearch for this title in Google Book Search]]></prim:lln03>

Inside the CDATA there are some characters (e.g. the raw "&" in the URL) that conflict with the (strict?) XML specification, and this is also why the translator is not able to extract anything. One would have to escape those characters. Suggestions?

adamsmith · September 5, 2013

So, for lines 151/152, could we simply use

text = text.replace(/\<(prim|sear):([^\>]*)/g, "<$2"); 
text = text.replace(/\<\/(prim|sear):([^\>]*)/g, "</$2");
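You can check the two proposed expressions by running them on the Oxford `lln03` line quoted earlier. Only the `prim:` prefixes on the tags change, and the `intitle:` inside the Google Books URL stays intact. The expressions are wrapped in a function here only for testing; `stripPrimoPrefixes` is a hypothetical name.

```javascript
// The two replacements proposed above, wrapped for testing.
function stripPrimoPrefixes(text) {
  text = text.replace(/\<(prim|sear):([^\>]*)/g, "<$2");    // opening tags
  text = text.replace(/\<\/(prim|sear):([^\>]*)/g, "</$2"); // closing tags
  return text;
}

const line =
  "<prim:lln03><![CDATA[$$Uhttp://books.google.com/books?lr=&as_drrb_is=q" +
  "&as_minm_is=0&as_miny_is=&as_maxm_is=0&as_maxy_is=&q=intitle:Thetis+" +
  "&as_brr=0&as_pt=ALLTYPES&sa=N&start=0" +
  "$$DSearch for this title in Google Book Search]]></prim:lln03>";
console.log(stripPrimoPrefixes(line));
```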

As for the CDATA - do we actually need to remove it for the XML to parse? I don't think so, right? If not, we could just remove this further down, something along the lines of
item.title = item.title.replace(....)
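If extracted values still carry Primo's `$$`-coded subfields (as in the `lln03` example, with `$$U` for the link and `$$D` for its label), the cleanup further down could split them rather than strip text by hand. Below is a hypothetical helper. It assumes every subfield is `$$` followed by a one-letter code, and it ignores any text before the first `$$`.

```javascript
// Hypothetical helper: "$$Ufoo$$Dbar" -> { U: "foo", D: "bar" }.
// Assumes one-letter codes; text before the first "$$" is ignored.
function parseSubfields(value) {
  const fields = {};
  for (const part of value.split("$$").slice(1)) {
    if (part.length === 0) continue;  // skip empty "$$$$" runs
    fields[part[0]] = part.slice(1);  // first char is the code, rest the value
  }
  return fields;
}
```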

aurimas · September 5, 2013

All characters are allowed inside CDATA (apart from the "]]>" sequence that closes it). We shouldn't be stripping off the CDATA part; XPath is smart enough to work around it.
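The `lln03` line illustrates the point about CDATA. Its URL contains raw `&` characters, which are legal inside the CDATA section but make the document ill-formed once the CDATA wrapper is stripped. The checker below is rough and regex-based (a hypothetical helper, not an XML parser). It looks for an `&` outside CDATA that does not begin an entity or character reference.

```javascript
// Rough check, not a conforming XML parser: is there an "&" outside CDATA
// that does not begin a reference such as &amp;, &#38; or &#x26;?
function hasBareAmpersand(xml) {
  // Drop CDATA sections; their content is plain character data.
  const outside = xml.replace(/<!\[CDATA\[[\s\S]*?\]\]>/g, "");
  return /&(?!(?:[A-Za-z_][\w.-]*|#\d+|#x[0-9A-Fa-f]+);)/.test(outside);
}
```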

aurimas · September 5, 2013

textContent should return the text inside CDATA

Edit: sorry, I meant text() in xpath

zuphilip · September 6, 2013

@adamsmith: Yes, I think we should use your suggestion for lines 151/152. The Google Books URL from the Oxford Primo (see 4 posts above) is actually an example where my general approach went wrong.

The CDATA is also present in the PNX of library records, which one currently gets by adding '&showPnx=true'. The translator can already deal with that. So you are right: we don't have to clean this up before applying XPath expressions.
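Adding `showPnx=true` to a record URL is safest with the standard `URL`/`URLSearchParams` API, which avoids duplicate `?` or `&` separators. In this sketch `withShowPnx` is a hypothetical helper. The record URL used in the example is invented for illustration (an assumption, not a real Primo record).

```javascript
// Hypothetical helper: add (or overwrite) showPnx=true on a Primo record URL.
function withShowPnx(recordUrl) {
  const url = new URL(recordUrl);
  url.searchParams.set("showPnx", "true");
  return url.toString();
}

// Invented example URL (assumption):
console.log(withShowPnx(
  "http://primo.example.org/primo_library/libweb/action/display.do?doc=ABC123&vid=TEST"
));
```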

adamsmith · September 6, 2013

OK, we'll try to get to this over the weekend

adamsmith · September 14, 2013

The new translator is now up. Your version of Zotero will automatically update within 24 hours, or you can update manually using the "Update Now" button in the "General" tab of the Zotero preferences.

This works with Primo catalogs that have the showPNX.jsp file installed, fixes some other issues (mostly with Primo 4 installations), and frequently provides more/better data for articles.

zuphilip · September 15, 2013

Thank you very much! This looks really nice now!

Now, in my tests the journal title is not transferred to Zotero by the translator. Line 301:

item.publication = ZU.xpathText(doc, '//addata/jtitle');
is responsible for that. The XPath looks good, but should the field be item.publication or item.publicationTitle?
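A typo like `publication` is easier to spot when the PNX-to-Zotero field mapping lives in one table. The following is a minimal sketch that assumes a small, hand-picked set of `addata` paths and Zotero field names; the actual translator may map more fields or map them differently. `getText` stands in for `ZU.xpathText(doc, path)` so the code runs outside Zotero, and `ADDATA_FIELDS`/`applyAddata` are hypothetical names.

```javascript
// Hypothetical mapping from PNX addata paths to Zotero item fields.
const ADDATA_FIELDS = {
  "//addata/jtitle": "publicationTitle", // not "publication"
  "//addata/volume": "volume",
  "//addata/issue": "issue",
  "//addata/issn": "ISSN",
};

// getText(path) stands in for ZU.xpathText(doc, path): a string or null.
function applyAddata(item, getText) {
  for (const [path, field] of Object.entries(ADDATA_FIELDS)) {
    const value = getText(path);
    if (value) item[field] = value; // skip missing or empty values
  }
  return item;
}
```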

adamsmith · September 15, 2013

good catch, it should be publicationTitle. I'll fix that later.

aurimas · September 15, 2013

Fixed. Same update instructions as above.

c21 · November 15, 2013

Hi,
any news about this issue? I still can't import Primo records via Firefox. However, the Chrome/Safari connectors import Primo records pretty well into my Firefox Zotero library. (The standalone version also works.)

zuphilip · November 15, 2013

This issue is fixed, and the Primo translator works very well in Firefox for me. Are you using special settings or other add-ons in Firefox? Can you give a concrete example (URL of the Primo instance) and a search term where you encountered problems?

c21 · November 15, 2013

Primo installation:
http://search.obvsg.at/ACC

Search for e.g. 'zotero' - take a record you like.

zuphilip · November 15, 2013

I can save all 7 records by one click with firefox ;-)

c21 · November 15, 2013

I've downloaded Zotero again, but it does not work (I've tested several Primo installations): 'translator error' with a link to:
http://www.zotero.org/support/known_translator_issues - 'Primo: Most Primo catalogs will fail when trying to import articles.'
I've talked to my colleagues - they have the same problem.

zuphilip · November 15, 2013

Please search for a record in the OBVSG catalog and then, in the same browser window, open http://search.obvsg.at/primo_library/libweb/showPNX.jsp?id=0 . Do you see the data?
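The test URL follows the pattern host + `/primo_library/libweb/showPNX.jsp?id=...`. The hypothetical helper below derives it from any page on the same Primo host, including hosts with a port. This thread does not explain what `id` means, so the helper passes it through unchanged. The instruction to use the same browser window suggests the JSP also depends on the current search session, which a URL alone cannot capture.

```javascript
// Hypothetical helper: build the showPNX.jsp URL on the same host as a Primo page.
// The meaning of "id" is not documented here; it is passed through as-is.
function showPnxJspUrl(pageUrl, id) {
  const url = new URL("/primo_library/libweb/showPNX.jsp", pageUrl);
  url.searchParams.set("id", String(id));
  return url.toString();
}

console.log(showPnxJspUrl("http://search.obvsg.at/ACC", 0));
```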

c21 · November 15, 2013

Yes, I see the PNX record (XML).