Submissions/Open Linked Language Data: Building a Collaborative Database for Hanzi and Kanji

From Wikimania 2010 • Gdańsk, Poland • July 9-11, 2010


This is an open submission for Wikimania 2010.

Title of the submission

Open Linked Language Data: Building a Collaborative Database for Hanzi and Kanji

Type of submission (workshop, tutorial, panel, presentation)


Author of the submission

Christoph Burgmer, Hans-Jörg Happel, Jens Wissmann

E-mail address or username

User:Christoph Burgmer, happel (at) fzi.de, wissmann (at) fzi.de

Country of origin


Affiliation, if any (organization, company etc.)

FZI Research Center for Information Technologies, Karlsruhe, Germany

Personal homepage or blog




Free knowledge of various kinds has been acknowledged as a basis for innovation and for empowering cultural evolution. Besides popular examples such as the Wikipedias or free media metadata, language data is an important foundation. While data collection and use cases are relatively simple for Western languages, the CJK (Chinese, Japanese, Korean) languages, which employ Chinese characters (termed Hanzi or Kanji), form a more complex domain that needs special care for depiction and pronunciation. Although scattered "islands" of free data are available in this area, we argue for establishing a comprehensive set of interlinked information.

This endeavor, however, requires a technical platform that offers a) collaborative editing features, b) easy and usable maintenance of structured data, and c) powerful means to query free data and offer it to the public. Based on these requirements, we chose Semantic MediaWiki (SMW) as the platform for building CharacterDB, an open and free database on the structure of Chinese characters.

In our talk, we give a short introduction to the modeling challenges of the Chinese language. Afterwards, we describe the design and implementation challenges of realizing CharacterDB with SMW and present its core functionality. CharacterDB currently focuses on capturing information about visual representations (glyphs), their decompositions, and the stroke orders for writing characters. Finally, we discuss application scenarios for these data. In particular, we elaborate on the emerging concept of Linked Open Data and on how CharacterDB provides information to, and consumes information from, the Linked Open Data cloud.
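As an illustration of the kind of structured data mentioned above (glyphs, decompositions, stroke orders), the following is a minimal Python sketch. It is not CharacterDB's actual schema or an SMW data model: the `Glyph` class, its field names, and the romanized stroke names are assumptions made up for this example. It uses the well-known decomposition of 明 into 日 and 月 in a left-to-right layout, written with the Unicode ideographic description character ⿰.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Glyph:
    """Hypothetical record for one glyph; not CharacterDB's real schema."""
    char: str
    layout: str = ""                                      # e.g. "⿰" (left-to-right); empty if atomic
    components: list[Glyph] = field(default_factory=list)
    strokes: list[str] = field(default_factory=list)      # stroke order, atomic glyphs only

    def ids(self) -> str:
        """Return an ideographic description sequence for this glyph."""
        if not self.components:
            return self.char
        return self.layout + "".join(c.ids() for c in self.components)

    def stroke_order(self) -> list[str]:
        """Concatenate component stroke orders.

        Simplifying assumption: components are written one after another.
        This holds for 明 but not for every layout (enclosures, for instance,
        may close only after the inner component is written).
        """
        if not self.components:
            return list(self.strokes)
        order: list[str] = []
        for component in self.components:
            order.extend(component.stroke_order())
        return order


# Stroke names are romanized labels chosen for this sketch.
ri = Glyph("日", strokes=["shu", "hengzhe", "heng", "heng"])
yue = Glyph("月", strokes=["pie", "hengzhegou", "heng", "heng"])
ming = Glyph("明", layout="⿰", components=[ri, yue])

print(ming.ids())
print(len(ming.stroke_order()))
```

Storing strokes only on atomic glyphs and deriving the rest keeps each fact in one place, which is the same motivation as interlinking data instead of copying it; a real model would need per-character overrides where the concatenation assumption fails.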

Track (People and Community/Knowledge and Collaboration/Infrastructure)

Knowledge and Collaboration Track

Will you attend Wikimania if your submission is not accepted?


Slides or further information (optional)


Interested attendees

If you are interested in attending this session, please sign with your username below. This will help reviewers decide which sessions are of high interest. Sign with four tildes (~~~~).

  1. Kocio 00:49, 4 June 2010 (UTC)