scraping does not reproduce certain details of source

arlogriffiths · August 28, 2013

I have observed that several items scraped from WorldCat lose certain details in transmission to Zotero. Most notably, diacritics on foreign words seem to disappear, even if they are correctly represented in the WorldCat entry; likewise, I've noticed that the series titles (and numbers) for books do not always get transmitted to Zotero.

Is there any way to ensure more faithful reproduction of the source being scraped?

One example where both these things happened in scraping is here: <http://www.worldcat.org/oclc/765821302>.

aurimas · August 28, 2013

Fixed the non-Latin character issue.

Unfortunately, the series title is not provided by WorldCat in their RIS/EndNote export (top-left Cite/Export link). You could email them about this and request that they add this data (possibly linking to this thread). I don't know how likely they are to implement this, since it's a bit messy with RIS.

The book you link to above is a conference publication, which is exported as a conference proceeding. In this case the series title should be under the T3 tag.

For a regular book, it should be under the T2 tag.

All EndNote mappings (which Zotero follows quite closely) can be found here
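
For anyone who wants to check an export themselves, here is a minimal Python sketch of the tag placement described above. It assumes the usual RIS line layout (two-character tag, two spaces, a hyphen, then the value) and the RIS type codes `BOOK` and `CONF`. The helper names and the sample record are made up for illustration and are not taken from Zotero's translator code or from an actual WorldCat export.

```python
import re

# RIS lines look like "TY  - BOOK": a two-character tag, two spaces,
# a hyphen, then the value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")

# Tag expected to carry the series title for each item type,
# following the mapping described in this thread.
SERIES_TAG = {"CONF": "T3", "BOOK": "T2"}


def parse_ris(text):
    """Collect the fields of a single RIS record as tag -> list of values."""
    fields = {}
    for line in text.splitlines():
        match = RIS_LINE.match(line.strip())
        if match:
            tag, value = match.groups()
            fields.setdefault(tag, []).append(value.strip())
    return fields


def series_title(fields):
    """Return the series title if the export includes one, else None."""
    item_type = fields.get("TY", [""])[0]
    tag = SERIES_TAG.get(item_type)
    values = fields.get(tag, []) if tag else []
    return values[0] if values else None


# Hypothetical record for illustration only.
SAMPLE = """TY  - BOOK
TI  - An Example Monograph
T2  - Example Series
PY  - 2013
ER  - 
"""

print(series_title(parse_ris(SAMPLE)))
```

If `series_title` returns `None` for an export you saved from the Cite/Export link, the series information was not in the file to begin with, so Zotero has nothing to import for that field.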

arlogriffiths · August 29, 2013

Thanks a lot. I have submitted this as feedback to WorldCat.

arlogriffiths · August 29, 2013

Sorry, about the non-Latin characters: I have just scraped another item (http://www.worldcat.org/oclc/423293708) and see that ā still does not get transmitted to Zotero.

adamsmith · August 29, 2013

Works for me. Are you sure you have updated your translators?

arlogriffiths · August 29, 2013

How would I do that?

adamsmith · August 29, 2013

There's an "update now" button in the General tab of the Zotero preferences. Click it, then make sure you reload the page on WorldCat before trying again.