scraping does not reproduce certain details of source

I have observed that several items scraped from WorldCat lose certain details in transmission to Zotero. Most notably, diacritics on foreign words seem to disappear, even if they are correctly represented in the WorldCat entry; likewise, I've noticed that the series titles (and numbers) for books do not always get transmitted to Zotero.

Is there any way to ensure more faithful reproduction of the source being scraped?

One example where both these things happened in scraping is here: <http://www.worldcat.org/oclc/765821302>.
  • Fixed the non-Latin character issue.

    Unfortunately, the series title is not provided by WorldCat in their RIS/EndNote export (top-left Cite/Export link). You could email them about this and request that they add this data (possibly link to this thread). I don't know how likely they are to implement this, since it's a bit messy with RIS.

    The book you link to above is a conference publication, which is exported as a conference proceeding. In this case the series title should be under the T3 tag.

    For a regular book, it should be under the T2 tag.

    All EndNote mappings (which Zotero follows quite closely) can be found here
  • Thanks a lot. I have submitted this as feedback to WorldCat.
  • Sorry, back to the non-Latin characters: I have just scraped another item (http://www.worldcat.org/oclc/423293708) and see that ā still does not get transmitted to Zotero.
  • Works for me. Are you sure you have updated your translators?
  • How would I do that?
  • There's an "update now" button in the general tab of the Zotero preferences. Click it, and make sure you reload the page on WorldCat before trying again.
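
For anyone who wants to check whether a given RIS export contains a series title at all, here is a minimal Python sketch. It assumes the tag placement described in this thread (T3 for conference proceedings, T2 for regular books) and the usual RIS line layout of a two-character tag, two spaces, a hyphen, and a space. The sample record and the names `parse_ris`, `series_title`, and `SERIES_TAG` are made up for illustration; this is not Zotero's translator code.

```python
import re

# A RIS line looks like "T3  - Some series": tag, two spaces, hyphen, space, value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")

# Where the series title is expected per item type, as stated in this thread
# (an assumption for illustration, not a full RIS mapping).
SERIES_TAG = {"CONF": "T3", "BOOK": "T2"}


def parse_ris(text):
    """Return a dict mapping each RIS tag to the list of its values."""
    fields = {}
    for line in text.splitlines():
        match = RIS_LINE.match(line.rstrip())
        if match:
            tag, value = match.groups()
            fields.setdefault(tag, []).append(value.strip())
    return fields


def series_title(fields):
    """Return the series title for a parsed record, or None if it is absent."""
    item_type = fields.get("TY", [""])[0]
    tag = SERIES_TAG.get(item_type)
    values = fields.get(tag, []) if tag else []
    return values[0] if values else None


# Hypothetical record, not taken from WorldCat.
SAMPLE = """TY  - CONF
TI  - Hypothetical proceedings title with ā
T3  - Hypothetical series title
ER  - 
"""

record = parse_ris(SAMPLE)
```

If `series_title(record)` returns `None` for a real export, the series data was never in the file, so the fix has to come from the exporting site rather than from the translator. The same parsed record can be used to check whether a character such as ā is already missing in the export itself.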
