Interlanguage links in Wikipedia: current problems and future development

Innocenti V. Maresin / Иннокентий В. Маресин, a.k.a. Incnis Mrsi

Incnis Mrsi

IrcCity (non-profit)

A more complete draft is available at /draft

In modern Wikipedia, interlanguage links (hereafter "interwiki", although interwiki is a term of broader scope) are the second most important metadata structure, after categories. With several million articles, this structure faces many challenges. Current problems can be roughly classified as:

  1. Poor traceability
  2. Lack of editors’ attention
    This includes, but is not limited to, omitting interwikis in newly created articles, mistaken links (human and bot errors), and page moves that tamper with the original name.
  3. Technical and organizational flaws
    This includes bad bot algorithms, lack of management, etc.
  4. Conceptual flaws

The traditional approach is to build clusters of articles in different languages, that is, to establish an equivalence relation between articles. But the current technical implementation is based on links from one page in one language to another page in another language. Links (i.e. a directed graph) have more capabilities than merely grouping pages into clusters, and carry more information than is needed to define such an equivalence. There are many cases where the current system behaves incorrectly, due to any of the reasons listed above. There are two possible major ways to improve it: to reduce the entropy and to increase the complexity.
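
The difference between clusters and links can be illustrated with a minimal Python sketch. The link data, the page titles and the function names `clusters` and `one_way_links` are assumptions made up for this illustration; they are not part of MediaWiki or of any existing bot. Pages are grouped into clusters by ignoring link direction, while the list of one-way links shows the extra information that the directed graph carries and the clusters discard.

```python
from collections import defaultdict

# Hypothetical interwiki links: (language, title) -> (language, title).
links = [
    (("en", "Water"), ("de", "Wasser")),
    (("de", "Wasser"), ("en", "Water")),
    (("fr", "Eau"), ("en", "Water")),    # one-way: no link back to fr
    (("en", "Ice"), ("de", "Eis")),      # one-way: no link back to en
]

def clusters(links):
    """Group pages into clusters by ignoring link direction."""
    neighbours = defaultdict(set)
    for src, dst in links:
        neighbours[src].add(dst)
        neighbours[dst].add(src)
    seen, result = set(), []
    for page in neighbours:
        if page in seen:
            continue
        # Depth-first walk over the undirected neighbourhood of `page`.
        stack, component = [page], set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(neighbours[current] - component)
        seen |= component
        result.append(component)
    return result

def one_way_links(links):
    """Return links that have no reverse counterpart."""
    link_set = set(links)
    return [(src, dst) for src, dst in links if (dst, src) not in link_set]
```

With this data, `clusters(links)` yields two groups (the "Water" pages and the "Ice" pages), while `one_way_links(links)` reports the two links that are not reciprocated, a detail that a pure equivalence relation cannot express.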

The entropy-reducing approach is to restrict possible configurations to clusters containing no more than one article in any language. This gives a unique (injective) correspondence between articles in any pair of languages. It usually assumes some central interwiki database, as in the proposal there. In my opinion, it is a good way only for some types of articles. Even in this case, the use of semantically significant keys should be encouraged for any subtype of articles, because some mistakes (such as those caused by lack of attention) can then be avoided.
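
As a rough sketch of this restriction, the following Python fragment checks whether a cluster has more than one article in some language, and models a central database as a mapping from a semantically significant key to at most one title per language. The key format `"chemical-compound/water"`, the data and the names `languages_with_duplicates` and `counterpart` are hypothetical and serve only as an illustration.

```python
from collections import Counter

def languages_with_duplicates(cluster):
    """Return the languages that occur more than once in a cluster."""
    counts = Counter(lang for lang, _title in cluster)
    return sorted(lang for lang, n in counts.items() if n > 1)

# Hypothetical central interwiki database.
# A dict keyed by language holds at most one article per language
# by construction, which is exactly the restriction described above.
central = {
    "chemical-compound/water": {"en": "Water", "de": "Wasser", "fr": "Eau"},
}

def counterpart(central, key, lang):
    """Look up the article for `key` in language `lang`, or None if absent."""
    return central.get(key, {}).get(lang)
```

For example, `languages_with_duplicates({("en", "Water"), ("en", "H2O"), ("de", "Wasser")})` flags `"en"`, so that cluster could not be stored in such a database without first deciding which English article belongs to it.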

But there are many cases where an equivalence-like interwiki structure is not possible, either because some required articles are missing at the moment, or because it seems to be impossible at all owing to semantic incompatibility between languages. In such cases another system may be used, based on the current directed-link paradigm but with some semantic extensions. This will be a directed graph structure, as today, and will also form a binary relation (reachability), but it will not be symmetric, only transitive; see there for a detailed explanation. There are also such possibilities as links to redirect pages and links from article sections; see there.
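
The reachability relation can be sketched in a few lines of Python. The links, the placeholder titles and the function name `reachable` are assumptions for illustration only; the semantic extensions mentioned above are not modelled here. The sketch only shows that following directed links gives a relation that is transitive but need not be symmetric.

```python
def reachable(links, start):
    """Return all pages reachable from `start` by following directed links."""
    graph = {}
    for src, dst in links:
        graph.setdefault(src, set()).add(dst)
    seen, stack = set(), [start]
    while stack:
        page = stack.pop()
        for nxt in graph.get(page, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

# Hypothetical case: a narrow article links to a broader one in another
# language, but the broader article does not link back.
asymmetric_links = [
    (("fr", "Subtopic"), ("en", "Topic")),
    (("en", "Topic"), ("de", "Thema")),
]
```

Here ("de", "Thema") is reachable from ("fr", "Subtopic") through ("en", "Topic"), which is the transitive part, whereas ("fr", "Subtopic") is not reachable from ("en", "Topic"), so the relation is not symmetric.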

Some concerns about coordination (technical and human) will also be considered.

mostly Infrastructure, but also Knowledge and Collaboration

It is accepted and scheduled, but I am not sure that I will be able to get to Poland to present it myself.

