Submissions/Open Linked Language Data: Building a Collaborative Database for Hanzi and Kanji

This is an open submission for Wikimania 2010.

Title of the submission

Open Linked Language Data: Building a Collaborative Database for Hanzi and Kanji

Type of submission (workshop, tutorial, panel, presentation)

Presentation

Author of the submission

Christoph Burgmer, Hans-Jörg Happel, Jens Wissmann

E-mail address or username

User:Christoph Burgmer, happel (at) fzi.de, wissmann (at) fzi.de

Country of origin

Germany

Affiliation, if any (organization, company etc.)

FZI Research Center for Information Technologies, Karlsruhe, Germany

Personal homepage or blog

http://cburgmer.nfshost.com/

Abstract

Free knowledge of various kinds is widely acknowledged as a basis for innovation and for empowering cultural evolution. Besides popular examples such as the Wikipedias or free media metadata, language data is an important foundation. While collecting and using such data is relatively simple for Western languages, the CJK (Chinese, Japanese, Korean) languages, which employ Chinese characters (termed Hanzi or Kanji), form a more complex domain that needs special care with regard to depiction and pronunciation. Scattered "islands" of free data are already available in this area; we argue for establishing a comprehensive set of interlinked information.

This endeavor, however, requires a technical platform that offers a) collaborative editing features, b) easy and usable maintenance of structured data, and c) powerful means to query the data and offer it freely to the public. Based on these requirements, we chose Semantic MediaWiki (SMW) as the platform for building CharacterDB, an open and free database on the structure of Chinese characters.

In our talk, we give a short introduction to the challenges of modeling the Chinese language. We then describe the design and implementation challenges of realizing CharacterDB with SMW and present its core functionality. CharacterDB currently focuses on capturing information about visual representations (glyphs), their decompositions, and the stroke orders used for writing characters. Finally, we discuss application scenarios for these data. In particular, we elaborate on the emerging concept of Linked Open Data and on how CharacterDB provides information to, and consumes information from, the Linked Open Data cloud.
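To make the kind of data described above more concrete, the following is a minimal sketch of how a glyph, its decomposition, and its stroke order might be represented and turned into subject-predicate-object statements of the sort used in Linked Open Data. It is an illustration only and not CharacterDB's actual schema: the `Glyph` class, the `to_triples` function, the `BASE` namespace, and the property names `hasComponent` and `strokeCount` are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical namespace, chosen for illustration; not CharacterDB's real URIs.
BASE = "http://example.org/characterdb/"


@dataclass
class Glyph:
    """One visual form of a character (assumed structure)."""
    char: str
    components: List[str] = field(default_factory=list)    # decomposition
    stroke_order: List[str] = field(default_factory=list)  # strokes in writing order


def to_triples(glyph: Glyph) -> List[Tuple[str, str, str]]:
    """Express a glyph as (subject, predicate, object) statements."""
    subject = BASE + "glyph/" + glyph.char
    triples = []
    # One statement per component, linking to that component's own glyph entry.
    for component in glyph.components:
        triples.append(
            (subject, BASE + "prop/hasComponent", BASE + "glyph/" + component)
        )
    # Record the stroke count only when a stroke order is known.
    if glyph.stroke_order:
        triples.append(
            (subject, BASE + "prop/strokeCount", str(len(glyph.stroke_order)))
        )
    return triples


# 明 ("bright") is composed of 日 ("sun") and 月 ("moon").
ming = Glyph("明", components=["日", "月"])
# 十 ("ten") is written as a horizontal stroke followed by a vertical stroke.
shi = Glyph("十", stroke_order=["horizontal", "vertical"])

for triple in to_triples(ming) + to_triples(shi):
    print(triple)
```

Because each component is referenced by its own identifier rather than by a plain string, a decomposition can be followed from one glyph to the next, which is the interlinking the abstract argues for.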

Track (People and Community/Knowledge and Collaboration/Infrastructure)

Knowledge and Collaboration Track

Will you attend Wikimania if your submission is not accepted?

sure

Slides or further information (optional)

http://characterdb.cjklib.org/wiki/Main_Page

Interested attendees

If you are interested in attending this session, please sign with your username below. This will help reviewers decide which sessions are of high interest. Sign with four tildes (~~~~).

Kocio 00:49, 4 June 2010 (UTC)