Submissions/Interlanguage links in Wikipedia: current problems and future development

From Wikimania 2010 • Gdańsk, Poland • July 9-11, 2010


Information

This is an open submission for Wikimania 2010.


Title of the submission

Interlanguage links in Wikipedia: current problems and future development

Type of submission (workshop, tutorial, panel, presentation)

presentation

Author of the submission

Innocenti V. Maresin / Иннокентий В. Маресин, a.k.a. Incnis Mrsi

E-mail address or username (if username, please confirm email address in Special:Preferences)

Incnis Mrsi

Country of origin

Russia

Affiliation, if any (organization, company etc.)

IrcCity (non-profit)

Personal homepage or blog

none

Abstract (please use no less than 300 words to describe your proposal)
A more complete draft is available at /draft

In modern Wikipedia, interlanguage links (hereafter "interwiki", although that term has a broader scope) are the second most important metadata structure after categories. With several million articles, this structure faces many challenges. Current problems can be roughly classified as:

  1. Poor traceability
  2. Lack of editors’ attention
    This includes, but is not limited to, omitting interwikis in newly created articles, mistaken links (human and bot errors), and moving a page and then tampering with its original name.
  3. Technical and organizational flaws
    This includes bad algorithms in bots, lack of management, etc.
  4. Conceptual flaws.

The traditional approach is to build clusters of articles in different languages, that is, to establish an equivalence relation between articles. But the current technical implementation is based on links from one page in one language to another page in another language. Links (i.e. a directed graph) have more capabilities than merely grouping pages into clusters, and carry more information than is needed to define such an equivalence. There are many cases where the current system behaves faultily, due to any of the three reasons mentioned above. There are two possible major ways to improve it: reducing the entropy and increasing the complexity.
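The difference between clusters and links can be illustrated with a minimal Python sketch. The sample data and the function names (`clusters`, `one_way_links`) are hypothetical and are not taken from MediaWiki or from any interwiki bot; the sketch only assumes that each page is identified by a (language, title) pair.

```python
from collections import defaultdict

# Hypothetical sample data: page -> pages it links to via interwiki.
links = {
    ("en", "Moon"): [("ru", "Луна"), ("de", "Mond")],
    ("ru", "Луна"): [("en", "Moon")],
    ("de", "Mond"): [("en", "Moon"), ("ru", "Луна")],
    ("en", "Sun"): [("de", "Sonne")],
    ("de", "Sonne"): [],
}

def clusters(links):
    """Group pages into clusters, ignoring link direction (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for src, targets in links.items():
        find(src)
        for dst in targets:
            parent[find(src)] = find(dst)

    groups = defaultdict(set)
    for page in list(parent):
        groups[find(page)].add(page)
    return list(groups.values())

def one_way_links(links):
    """Links with no link back: information that the clusters alone lose."""
    return sorted(
        (src, dst)
        for src, targets in links.items()
        for dst in targets
        if src not in links.get(dst, [])
    )
```

With this sample data, `clusters(links)` yields two clusters (the Moon pages and the Sun pages), while `one_way_links(links)` reports the two links that are not reciprocated. The clusters are the same whether or not those links are reciprocated, which is the extra information the directed graph carries.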

The entropy-reducing approach is to restrict the possible configurations to clusters containing no more than one article in any language. This gives a unique (injective) correspondence between articles in any pair of languages. It usually assumes some central interwiki database, as in the proposal there. In my opinion, it is a good way only for some types of articles. Even in this case, the use of semantically significant keys should be encouraged for any subtype of articles, because some mistakes (such as those due to lack of attention) can be avoided.
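A minimal sketch of this restriction, again in Python and under stated assumptions: the central table `central`, its key `"astronomy/moon"`, and both function names are invented for illustration and do not describe any existing database or the proposal referred to above.

```python
from collections import Counter

def language_conflicts(cluster):
    """Return languages that occur more than once in a cluster of
    (language, title) pairs, i.e. violations of the one-per-language rule."""
    counts = Counter(lang for lang, _title in cluster)
    return sorted(lang for lang, n in counts.items() if n > 1)

# Hypothetical central table: semantically significant key -> {language: title}.
# Using a dict per key makes a second article in the same language impossible.
central = {
    "astronomy/moon": {"en": "Moon", "ru": "Луна", "de": "Mond"},
}

def counterpart(central, lang, title, target_lang):
    """Look up the article in target_lang that corresponds to (lang, title)."""
    for mapping in central.values():
        if mapping.get(lang) == title:
            return mapping.get(target_lang)
    return None
```

For example, `language_conflicts({("en", "Moon"), ("en", "Natural satellite"), ("ru", "Луна")})` reports `"en"` as a conflict, and `counterpart(central, "ru", "Луна", "de")` finds `"Mond"`. A readable key such as the one assumed here lets an inattentive editor notice at a glance when an article is attached to the wrong entry.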

But there are many cases where an equivalence-like interwiki structure is not possible, either because some required articles do not exist at the moment, or because it seems to be impossible at all owing to semantic incompatibility between languages. In such cases another system may be used, based on the current directed-link paradigm but with some semantic extensions. This will be a directed graph structure as today, and will also form a binary relation (reachability), but it will not be symmetric, only transitive; see there a detailed explanation. There are also such possibilities as links to redirect pages and from article sections, see there.
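The reachability relation can be sketched as a breadth-first search over the directed links. The example graph is an assumption chosen for illustration (Russian distinguishes two basic colour terms where English has "blue"), and the function name `reachable` is hypothetical.

```python
from collections import deque

# Hypothetical directed interwiki graph: two narrower Russian articles
# point to one broader English article, which does not point back.
directed = {
    ("ru", "Голубой"): [("en", "Blue")],
    ("ru", "Синий"): [("en", "Blue")],
    ("en", "Blue"): [("de", "Blau")],
    ("de", "Blau"): [],
}

def reachable(graph, start):
    """Return all other pages reachable from start by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for nxt in graph.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen
```

Here `reachable(directed, ("ru", "Синий"))` contains both the English and the German article (transitivity), while `reachable(directed, ("en", "Blue"))` contains neither Russian article, so the relation is not symmetric.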

Some concerns about coordination (technical and human) will also be considered.


Track (People and Community/Knowledge and Collaboration/Infrastructure)

mostly Infrastructure, but also Knowledge and Collaboration

Will you attend Wikimania if your submission is not accepted?

It is accepted and scheduled, but I am not sure that I will be able to get to Poland to present it myself.

Slides or further information (optional)


Interested attendees

If you are interested in attending this session, please sign with your username below. This will help reviewers to decide which sessions are of high interest. Sign with four tildes (~~~~).

  1. El Ágora 21:34, 20 May 2010 (UTC)
  2. guillom 01:35, 21 May 2010 (UTC)
  3. Shijualex 03:00, 21 May 2010 (UTC)
  4. Ravidreams 15:15, 2 June 2010 (UTC)
  5. Kocio 22:25, 2 June 2010 (UTC)
  6. Waldir 08:06, 16 June 2010 (UTC)
  7. Laaknor 08:51, 16 June 2010 (UTC)
  8. Karol007 21:19, 16 June 2010 (UTC)
  9. Amir E. Aharoni 06:29, 22 June 2010 (UTC)
  10. Jon Harald Søby 18:25, 26 June 2010 (UTC)
  11. Natbrown 13:28, 29 June 2010 (UTC)