Japanese characters disappear from bibliographic record

Hi there,

I was trying to import records from WorldCat that include Japanese characters, like the one below; this would save me the trouble of typing them.

http://www.worldcat.org/oclc/29758457&referer=brief_results

However, Zotero just ignores the characters altogether and truncates the record. Is there a way to work around this? Am I doing something wrong?

wondering in Kyoto,

Chris
  • This is a perfect example of what I was earlier mentioning about storing names and titles in multiple languages. Since Zotero isn't (yet) designed to accommodate this, perhaps the translator simply ignores that content (which is surrounded by an HTML element with appropriate language attribute).
  • Bruce is right that Zotero ultimately needs to be able to handle multiple titles / names per item, but the problem with WorldCat is somewhat more complicated: they're not embedding any Japanese characters in the structured metadata associated with this item (here a COinS tag), only the Romanized names.
  • Ah, right. But given the limitations of COinS, can they even do anything about that?
  • COinS can accommodate UTF-8, but you wouldn't be able to separate alternate encoding for names in one span.

    You can either merge both the kanji and romanized fields together (as the page visually does) or can have two separate COinS (one for the romanization & one for the original character set).
  • I would think that Bruce's suggestion fits my bill: I would need to have both types of names in my records (and for titles, I sometimes even add a translation). Currently I am adding them anyway, although Zotero then treats them as additional authors, which is not ideal but at least gives me the information I need. I would definitely like to see improvement here as soon as possible -- not allowing this would be a major hindrance to the adoption of Zotero in this part of the world.

    Chris from snowy Kyoto
  • Right...but just because a future version of Zotero might support something like that doesn't mean that all site translators will easily be able to support it too. COinS won't work that way & (assuming no help from WorldCat), one would have to go back to page scraping. This wouldn't be THAT bad to do on WorldCat--everything is in separate spans & the kanji is differentiated from the rōmaji by being in a separate "vernacular" span. But that doesn't necessarily mean that it overcomes the defects of page scraping (having to update when the site changes layout, having to write tons of these one-site scrapers, etc.)

    Is it common to use BOTH kanji/kana AND rōmaji in the same citation? If not, perhaps one can store two separate citations & then use the future feature of semantic relationships to relate the records--this sure would simplify the interface from having three (or more?) copies of every field (and supporting the ability to select the language/writing system that each field stores). Just brainstorming--if it is common to mix writing systems, perhaps someone has other ideas...
  • Is it common to use BOTH kanji/kana AND rōmaji in the same citation?
    Yes. In fields like Japanese history, it's the norm I believe.
  • There's a ticket for having multiple versions of fields in an item, by the way, though the problems raised by sean and noksagt would still apply.
  • I have only just begun looking at Zotero, and I too am concerned about getting CJK data into my bibliographic records. (I work with East Asian Buddhist materials, hence there is plenty of Chinese and Japanese, plus lesser amounts of Korean and even Vietnamese sprinkling my bibliographies.)

    Here are two records of the same book, the first downloaded and pasted into a Google Docs document as is, and the following as edited by me by adding data from the WorldCat listing and formatting a bit:

    # Satoshi Kawano and 河野訓, Shoki Kanʹyaku Butten No Kenkyū: Jikuhōgo O Chūshin to Shite (Ise: Kōgakkan Daigaku Shuppanbu, 2006).

    # Kawano Satoshi 河野訓, Shoki kanʹyaku butten no kenkyū: Jiku Hōgo o chūshin to shite 初期漢訳仏典の研究 : 竺法護を中心として (Ise: Kōgakkan Daigaku shuppanbu 皇學館大學出版部, 2006)

    The problems with the initial listing are that the author's romanized and Japanese script versions are identified as different authors and, less significantly, the capitalization in the title does not suit my fancy. In the second I also removed the italics from the Japanese title in Google Docs; when I pasted this into the title field in the Zotero entry and then pasted into Google Docs it italicized the entire field...

    Hmm, all this may not help anyone immediately, but it should give some impression of what refinements would be necessary to satisfy a picky-picky kind of guy like me!

    -- John McRae
  • How did you get the original citation? Some of these issues are site-specific & it doesn't appear that you got it from:
    http://www.worldcat.org/oclc/70233388&referer=one_hit
    As this produces:

    1. Satoshi Kawano, Shoki kanʼyaku butten no kenkyū : jikuhōgo o chūshin to shite. (Ise: Kōgakkandaigakushuppanbu, 2006).

    i.e., it contains no Japanese script & has different capitalization.
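
To make the page-scraping idea from the thread concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that each author block holds the romanized name in a plain span and the original script in a span with class "vernacular". The sample markup, the "author" class, and the names NameScraper and scrape_creator are hypothetical; this is not WorldCat's actual HTML and not how any Zotero translator is implemented. The point is only that the two forms can be kept as one creator instead of being read as two authors.

```python
from html.parser import HTMLParser

# Hypothetical markup: the class names and nesting are assumptions for
# illustration only, not WorldCat's real page structure.
SAMPLE_HTML = """
<div class="author">
  <span>Kawano, Satoshi</span>
  <span class="vernacular" lang="ja">河野訓</span>
</div>
"""


class NameScraper(HTMLParser):
    """Collects text from <span> elements, split by the 'vernacular' class."""

    def __init__(self):
        super().__init__()
        self.romanized = []
        self.vernacular = []
        self._stack = []  # one bool per open <span>: is it a vernacular span?

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            classes = (dict(attrs).get("class") or "").split()
            self._stack.append("vernacular" in classes)

    def handle_endtag(self, tag):
        if tag == "span" and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or not self._stack:
            return  # ignore whitespace and text outside any <span>
        target = self.vernacular if self._stack[-1] else self.romanized
        target.append(text)


def scrape_creator(html):
    """Return one creator record holding both forms of the name."""
    parser = NameScraper()
    parser.feed(html)
    parser.close()
    return {
        "name": " ".join(parser.romanized),
        "original": "".join(parser.vernacular),
    }


creator = scrape_creator(SAMPLE_HTML)
print(creator)
```

As noted in the thread, a scraper like this is tied to one site's layout and would need updating whenever that layout changes.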