ISBN import with Umlaut
There seems to be a problem with umlauts (ö, ä, ü; maybe other characters too?) in the ISBN import. The following should be reproducible:
1. Import by ISBN = 9783531149950
2. Quick search for the author "Dörre" -> no result
3. If one retypes the author by hand, the search works again.
There might be a problem with the character encoding? Can someone check?
http://xisbn.worldcat.org/webservices/xid/isbn/9783531149950?method=getMetadata&format=xml&fl=*
The output then is
<?xml version="1.0" encoding="UTF-8"?>
<rsp xmlns="http://worldcat.org/xid/isbn/" stat="ok">
<isbn oclcnum="180899327 470849617 77077455" lccn="2007385274" form="BC" year="2006" lang="ger" ed="1. Aufl." title="Im Schatten der Globalisierung : Strukturpolitik, Netzwerke und Gewerkschaften in altindustriellen Regionen" author="Klaus Dörre ; Bernd Röttger. Unter Mitarb. von Birgit Beese." publisher="VS, Verl. für Sozialwiss." city="Wiesbaden" url="http://www.worldcat.org/oclc/180899327?referer=xid">9783531149950</isbn>
</rsp>
This means that WorldCat encodes umlauts as HTML/XML numeric character references, e.g. &#246; for ö. Could we replace them with the actual UTF-8 encoded characters?
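To illustrate the replacement being asked for, here is a minimal Python sketch that turns numeric character references back into the characters they stand for. The helper name decode_char_refs is made up for this example, and the input string is a shortened version of the author field quoted above; this is not how Zotero's translator does it.

```python
import html


def decode_char_refs(value: str) -> str:
    """Replace character references such as &#246; with the characters they stand for."""
    return html.unescape(value)


# Shortened author field, written with the references WorldCat is reported to send.
raw_author = "Klaus D&#246;rre ; Bernd R&#246;ttger."
print(decode_char_refs(raw_author))
```

Note that a real XML parser already resolves these references when it reads the attribute, so an explicit step like this only matters if the response is handled as plain text.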
http://www.worldcat.org/search?q=isbn:9783531149950&=Search&qt=results_page&client=worldcat.org-detailed_record&page=endnotealt
In any case: isn't this just an issue of precomposed vs. decomposed characters:
https://en.wikipedia.org/wiki/Precomposed_character#Comparing_precomposed_and_decomposed_characters
I find pre/decomposed characters a bit of a mystery, but Zotero does indeed not handle the decomposed ones (I believe) elegantly. This is not the first time this has come up.
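For anyone else puzzled by the distinction, a small Python sketch of why a search for a typed "Dörre" could miss an imported "Dörre". It assumes the imported name uses the decomposed form, which has not been confirmed in this thread; same_text is a hypothetical helper.

```python
import unicodedata

precomposed = "D\u00f6rre"   # ö as the single code point U+00F6
decomposed = "Do\u0308rre"   # plain o followed by combining diaeresis U+0308


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Both strings render identically, but a naive comparison treats them as different.
print(precomposed == decomposed)
# After normalization they compare equal.
print(same_text(precomposed, decomposed))
```

If this is the cause, normalizing either the stored value or the search term to one form would make the two match.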
We could look at the API. That might work better, but it would need some testing. I thought it was restricted, but that's clearly not the case.
I think switching is the right thing to do as long as data quality doesn't suffer. Among other things, it should also be faster and use less bandwidth.
Another thing that I found odd with xISBN is that the results are at times different - as in a different copy of the book - from the ISBN search on WorldCat.
Edit: or, since we're only supplementing, we can just go with OCLC. Or vice versa... ugh. I'll play around with this. If you happen to stumble on some ISBNs with inconsistencies between API and their web page, or where the authors are messed up, post them here.
http://xisbn.worldcat.org/webservices/xid/isbn/0596002815?method=getMetadata&format=xml&fl=*
It lists 19 oclcnum values, and the metadata shown are from the smallest (?) oclcnum, cf. the url attribute.
A search for the ISBN 0596002815 in OpenWorldCat returns 3 entries, consisting of 2, 104, and 1 editions/formats respectively; thus there are a total of 107 editions/formats in OpenWorldCat:
http://www.worldcat.org/search?q=bn:0596002815&qt=advanced&dblist=638
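The "smallest (?)" guess can be checked against a response. Below is a rough Python sketch that does this for the response quoted earlier in this thread for 9783531149950, trimmed to the two relevant attributes; the function names are made up for illustration. In that sample the url attribute matches the first listed oclcnum rather than the numerically smallest one. Whether the same holds for 0596002815 is untested here.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlparse

NS = "{http://worldcat.org/xid/isbn/}"

# Response for 9783531149950 as quoted in this thread, reduced to two attributes.
sample = b'''<?xml version="1.0" encoding="UTF-8"?>
<rsp xmlns="http://worldcat.org/xid/isbn/" stat="ok">
<isbn oclcnum="180899327 470849617 77077455" url="http://www.worldcat.org/oclc/180899327?referer=xid">9783531149950</isbn>
</rsp>'''


def oclc_numbers(xml_bytes: bytes) -> List[int]:
    """Return the OCLC numbers in the order they appear in the oclcnum attribute."""
    isbn = ET.fromstring(xml_bytes).find(NS + "isbn")
    return [int(n) for n in isbn.get("oclcnum", "").split()]


def url_oclc_number(xml_bytes: bytes) -> int:
    """Return the OCLC number embedded in the url attribute."""
    isbn = ET.fromstring(xml_bytes).find(NS + "isbn")
    return int(urlparse(isbn.get("url")).path.rsplit("/", 1)[-1])


numbers = oclc_numbers(sample)
print("listed:  ", numbers)
print("smallest:", min(numbers))
print("in url:  ", url_oclc_number(sample))
```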
http://xisbn.worldcat.org/webservices/xid/isbn/978-3-939352-01-3?method=getMetadata&format=xml&fl=*
The author statement here is "von Birgit Grüb. [Universität Mannheim]". Is this actually proper UTF-8? I wouldn't expect something like &#252; in the raw XML:
author="von Birgit Gr&#252;b. [Universit&#228;t Mannheim]"
Thus, the XML will look fine in a browser but not in a text editor. But maybe it is not a big problem...
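A quick way to see why this is probably harmless: numeric character references are valid XML regardless of the declared encoding, and a parser resolves them when reading the attribute. A minimal Python sketch, using a hand-trimmed fragment that keeps only the author attribute quoted above (the rest of the real response is omitted):

```python
import xml.etree.ElementTree as ET

NS = "{http://worldcat.org/xid/isbn/}"

# Hand-trimmed fragment: only the author attribute from the post above is kept.
raw_xml = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<rsp xmlns="http://worldcat.org/xid/isbn/" stat="ok">'
    b'<isbn author="von Birgit Gr&#252;b. [Universit&#228;t Mannheim]"/>'
    b'</rsp>'
)

root = ET.fromstring(raw_xml)
# The parser hands back real characters, not the &#...; references.
author = root.find(NS + "isbn").get("author")
print(author)
```

So the references only stay visible when the response is read as raw text, as in a text editor.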
http://experiment.worldcat.org/oclc/177669176.rdf
http://experiment.worldcat.org/oclc/162462118.rdf
http://worldcat.org/oclc/866798552.rdf
We need the OCLC number, but this might be useful... How about a new search translator that takes an OCLC number (maybe as a first step)?
The xID API does not seem to work so well with OCLC numbers. http://xisbn.worldcat.org/webservices/xid/oclcnum/154684429?method=getMetadata&format=xml&fl=* returns essentially no metadata and http://xisbn.worldcat.org/webservices/xid/oclcnum/864085353?method=getMetadata&format=xml&fl=* (the article we discussed on GitHub) returns a server error.
I'm giving up on this for now. Will notify WorldCat about the errors though.