Submissions/Interlanguage links in Wikipedia: current problems and future development/draft
Draft text written by Incnis Mrsi /'ɪnknis mrsi/ with the help of several ru.WP users at a small wikimeetup (Qkowlew, Claymore, LEMeZza, Kalan); it will be presented by Amir E. Aharoni. It includes ideas from the essays m:Fine interwiki and m:A newer look at the interlanguage link. Distinctly formatted passages indicate text intended to be spoken.
current problems and future development
(placeholder for image)
The video broadcast imposes a delay of about 30 seconds.
If you want to send your questions to Incnis Mrsi more quickly, please use the IRC channel #wikipedia-interwiki on the Freenode network.
Explain the situation in your own words, as it looks from there.
- ⇓ Concerns with the currently used system.
- ⇓ Possible improvements: entropy reducing and structural extension. Some implementations require changes in the engine, but some (almost) do not.
- ⇓ Time for standards.
- ⇓ Coordination issues, human and technical.
(tell something about it in ≈ 30 s)
Why these links
Technically, an interwiki link is just a special form of hyperlink to another wiki project. From here on we discuss interlanguage links, which connect similar items (most notably, articles) in different languages. "Interwiki" is the more general term, but in this presentation it always means interlanguage links.
We consider interlanguage links to be metadata, like categories, but in some sense even more so than categories. Categories belong to a particular wiki project; interwikis do not. Evidently, inbound interwikis do not belong to a wiki, but in Wikipedia even outbound interlanguage links must obey some constraints because of automated processing.
We shall talk about the Wikipedia experience, although it applies to other projects as well. These wikis form a system in which interlanguage links are the second most important metadata structure after categories. #####
End users and editors use interlanguage links in slightly different ways. Readers want more informative links. Editors want more regularity and a more predictable system.
What do we have now
- Poor traceability
- Lack of editors’ attention
  - This includes, but is not limited to, omitting interwikis in newly created articles, mistaken links (human and bot errors), and moving a page and tampering with its original name.
- Technical and organizational flaws
  - This includes bad algorithms in bots, lack of management, etc.
- Conceptual flaws
With several million articles, the interwiki structure faces many challenges. The traditional approach is to build clusters of articles in different languages, that is, to establish an equivalence relation between articles. But the current technical implementation is based on links from one page in one language to another page in another language. There is no easy way to check #####, and there are many cases where the current approach misbehaves.

interwiki.py is a component of Pywikipediabot.
It follows redirects indiscriminately.
Because of the enormous volume of this task, bots make most changes to interlanguage links. The most widely used platform for bots is Pywikipedia. (Say something nice about bots, if you feel like it.) Unfortunately, in some situations the bot itself makes mistakes. For example, Pywikipedia resolves all redirects, replacing a link to a redirect with a link to its target, but it does not check the reason for the redirect. This is correct if an article was moved, but almost always wrong in other circumstances.
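The distinction between a redirect left by a page move and any other redirect can be sketched in a few lines of Python. This is not Pywikipedia code: the `Redirect` record, its `reason` labels and the function `resolve_link_target` are assumptions made purely for illustration, and a real bot would have to derive the reason from the page history or from redirect templates.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Redirect:
    target: str  # title the redirect points to
    reason: str  # hypothetical label, e.g. "moved", "subtopic", "section"


def resolve_link_target(title: str, redirects: Dict[str, Redirect]) -> str:
    """Return the title an interlanguage link should point to.

    A redirect is followed only when it was left behind by a page move;
    any other redirect is kept as the link target.
    """
    redirect: Optional[Redirect] = redirects.get(title)
    if redirect is not None and redirect.reason == "moved":
        return redirect.target
    return title


# Illustrative data only.
redirects = {
    "Old name": Redirect(target="New name", reason="moved"),
    "Subtopic": Redirect(target="Broad article", reason="subtopic"),
}

print(resolve_link_target("Old name", redirects))  # New name
print(resolve_link_target("Subtopic", redirects))  # Subtopic
```

The point of the sketch is only that the decision needs one extra piece of information, the reason for the redirect, which indiscriminate following throws away.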
Possible solutions
Generally, there are two ways to improve: to reduce the entropy and to extend the structure. The idea of the former is that a simpler system is less likely to fail. The idea of the latter is to make the system do more than was initially intended.
(placeholder for image: diagram of the two directions of improvement, structural extension and entropy reducing)
Entropy reducing
(placeholder for image) ⇒ (placeholder for image)
A diagram by HenkvD for N=7 illustrates the main idea of this approach.

Currently | With a central database
---|---
N·(N−1) links. N edits to add a new language and N−1 edits to remove one. | Two edits per operation: add/change/remove the link in the article and one change in the central database.
Queries like http://toolserver.org/~vvv/sulutil.php?user=Amire80 run slowly because hundreds of databases are involved. Current and possible interwiki tools have the same trouble. | A central database could make some operations simpler.
The entropy reducing approach restricts possible configurations to clusters containing no more than one article in any language. This gives a unique (injective) correspondence between articles in any pair of languages. It usually assumes some central interwiki database, as in the 2008 proposal at Meta-wiki, which is now being discussed at the strategy wiki.
Another advantage is technical: querying a centralized database produces less overhead. We know that tools such as the Single Unified Login utility are extremely slow because they issue about a thousand SQL queries to different wiki databases.
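The counting argument and the "no more than one article per language" rule can be illustrated with a short Python sketch. The function names and the sample cluster are assumptions for illustration, and the hub count assumes that each article keeps exactly one reference to a central record.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Article = Tuple[str, str]  # (language code, page title)


def direct_link_count(n: int) -> int:
    """Links needed when each of n articles links to all the others."""
    return n * (n - 1)


def hub_link_count(n: int) -> int:
    """Article-side references when each article points to one central record."""
    return n


def duplicated_languages(cluster: Iterable[Article]) -> List[str]:
    """Languages that occur more than once in a cluster (ideally none)."""
    counts = Counter(lang for lang, _ in cluster)
    return sorted(lang for lang, k in counts.items() if k > 1)


# Illustrative cluster that breaks the rule: two English articles.
cluster = [("en", "Moon"), ("ru", "Луна"), ("de", "Mond"), ("en", "Luna")]

print(direct_link_count(7), hub_link_count(7))  # 42 7
print(duplicated_languages(cluster))            # ['en']
```

For N=7 the direct scheme needs 42 links, while the hub scheme needs 7 article-side references, and a check like `duplicated_languages` becomes a single lookup in one place.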
Semantic keys
§aÇ6íkZ.[¨OĽB”ę$ or type: human readable ID?
We should persuade developers not to use random or sequential IDs, except internally in the central database itself.
Will be without a proper structure
Must be:
According to Incnis Mrsi, this is the way we should adopt for some types of articles.
What keys should we use: just sequential numbers or random data? Could these keys be semantically significant for some subtypes of articles? Persons have birth dates, organizations are registered under official names, films and books have at least a year of release, and astronomical and geographical objects can in some cases be identified by approximate coordinates. We should also rely on existing (external) keys as much as possible, such as Latin names for species, the IMDB index for films, and CAS registry numbers for substances. We should obtain not just a piece of syntactic sugar for interwiki, but a semantically useful system. If carefully designed, it will deter some mistakes, such as inaccurate translation or disambiguation. Of course, programmers prefer numbers, as they are simpler and faster. But this is not a case where everything should be cut away with Occam's razor.
- Yes, instead of always using the name in the first language, this is another possibility: use the "most basic" name if possible. Plants and animals could go by the Latin name, chemicals by their chemical formula or CAS number, astronomical objects by their catalogue number, people and places by their name in their language (though the last two may be a point of contention).
(tell something about it)

Many systems are possible even for the same type of articles. Choose one? Translate between several?
Many systems are possible even for the same object type. How should we identify countries, for example the state of Israel: by its Hebrew name, or as a territory between 29½° and 33½° northern latitude and between 34½° and 36° eastern longitude? Or maybe an international country code would be the best solution?
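One possible shape for such human-readable keys is a type prefix followed by an external identifier. The sketch below is only an assumption about how this could look: the `kind:identifier` format, the function `make_semantic_key` and the sample records are all made up for illustration and are not part of any existing proposal.

```python
from typing import Dict


def make_semantic_key(kind: str, identifier: str) -> str:
    """Build a human-readable key of the (hypothetical) form 'kind:identifier'."""
    kind = kind.strip().lower()
    identifier = " ".join(identifier.split())  # collapse stray whitespace
    if not kind or not identifier:
        raise ValueError("both kind and identifier are required")
    if ":" in kind:
        raise ValueError("kind must not contain ':'")
    return f"{kind}:{identifier}"


# A toy central record: semantic key -> local title per language.
central: Dict[str, Dict[str, str]] = {
    make_semantic_key("species", "Panthera leo"): {"en": "Lion", "ru": "Лев"},
    make_semantic_key("cas", "7732-18-5"): {"en": "Water", "ru": "Вода"},
    make_semantic_key("country", "IL"): {"en": "Israel", "ru": "Израиль"},
}

print(sorted(central))
# ['cas:7732-18-5', 'country:IL', 'species:Panthera leo']
```

A key such as `country:IL` can be read and checked by a human, which a random or sequential ID cannot; the open question of which system to choose for each object type remains.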
Extended structure

But there are many cases where an equivalence-like interwiki structure on articles is not possible. This may be due to the lack of some required articles at the moment, or sometimes it seems to be impossible altogether because of semantic incompatibility between languages. In such cases we should not reduce complexity.
Causes
- Sometimes an article that refers to a specific concept may correspond to two or more articles in another language. It is not just a problem of translation but of differences in culture, habits, laws, and implementations. In such cases it is necessary to disambiguate when moving from one Wikipedia to another. A central hub would help, working as a disambiguation page.
(tell something about it)
Non-classical links
What | Direction | Currently | Extended
---|---|---|---
# Article section | → | Technically impossible as an interwiki source | {{Section-links}}, experimental
# Article section | ← | Possible interwiki destination, but some bots delete it | Should be avoided in favor of redirects
Redirect | → | Never an interwiki source | Disputable
Redirect | ← | A redirected interwiki target is considered overhead and is followed by bots | A redirect with possibilities (RwP) should be a legitimate interwiki target
There are some possible structural extensions which require no changes in MediaWiki, or almost no changes. Two types of MediaWiki objects, redirects and sections, are currently undervalued in the context of interlanguage linking.

(placeholder for image: a link through a redirect versus a direct link)
Redirect pages (or, colloquially, just redirects) are often considered aliases for page names. For example, the connectivity project does not see any difference between a direct link and a link through a redirect.
It may be simple:
... or ...
... points to a section:
A redirect to a section, linked from another language. {{section-links}} should be placed in the section.
A question for usability experts.
Conclusion
Go back to slide 3.1 and move the pointer to the upper right corner of the table. Some structural extensions are indeed compatible with entropy reducing. For example, we should have links from the central hub to redirects, and links from redirects and sections to the central hub.
Standards and technical questions
- Optimal
- Acceptable
- Errors
  - banned by the engine?
It is time to establish some standards for interlanguage links in Wikipedia. First of all, which configurations may be considered acceptable? What is better than acceptable? Which structures should be considered incorrect, and how should they be repaired?
#####
Some configurations

If a non-section redirect with possibilities is linked from another language, should it have a corresponding interlanguage link itself? In the case of a redirect to a section, there should be a link from the section to give end users access, and a link from the redirect page would obviously be redundant.
(placeholder for image)
But it is not OK otherwise, and we should fix it.
These are called "triangles" in the presentation "Analyzing Interlanguage links" at Wikimania 2008.
A "hook" is a condition in which interwiki links lead back to the same language, but to a different page. #####
Some users think that there are.
Some users think that this is an error.
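Whether hooks are errors or not, they are easy to find mechanically. The following Python sketch treats pages as (language, title) pairs and follows outbound interlanguage links breadth-first; the function `find_hooks` and the sample links are assumptions for illustration, not an existing tool.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Page = Tuple[str, str]  # (language code, page title)


def find_hooks(start: Page, links: Dict[Page, List[Page]]) -> Set[Page]:
    """Pages in the start page's language, other than start itself,
    that are reachable by following interlanguage links forward."""
    seen: Set[Page] = {start}
    queue = deque([start])
    hooks: Set[Page] = set()
    while queue:
        page = queue.popleft()
        for nxt in links.get(page, []):
            if nxt in seen:
                continue
            seen.add(nxt)
            if nxt[0] == start[0]:
                hooks.add(nxt)  # same language, different page
            queue.append(nxt)
    return hooks


# Illustrative links: en "Car" -> de "Auto" -> en "Automobile".
links = {
    ("en", "Car"): [("de", "Auto")],
    ("de", "Auto"): [("en", "Automobile")],
    ("en", "Automobile"): [("de", "Auto")],
}

print(find_hooks(("en", "Car"), links))  # {('en', 'Automobile')}
```

A report built this way only lists candidates; deciding which of them are acceptable is exactly the standards question raised here.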
Formal model

Should we restrict ourselves to graphs built from the elements mentioned above, or can we check possibly more complicated graphs against some conditions?
Say, any page in the same language reachable via interlanguage links in the forward direction must be, in some sense, a top page for the origin. This is a weakened DAG condition.
We can consider the structure of a directed graph with interlanguage links, redirects and inclusion (a section "links" to its article). It forms a binary relation (reachability), but it will not be symmetric, only transitive. This directed graph carries more information than an equivalence relation.
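A minimal sketch of such a check, under stated assumptions: "top page" is read here as any page reachable from the origin through redirect or inclusion edges, and the edge sets are kept in two separate dictionaries. The names `reachable` and `violations` and all page titles are made up for illustration.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Page = Tuple[str, str]  # (language code, page or section title)
Edges = Dict[Page, List[Page]]


def reachable(start: Page, edges: Edges) -> Set[Page]:
    """All pages reachable from start by following edges forward."""
    seen: Set[Page] = set()
    queue = deque([start])
    while queue:
        for nxt in edges.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def violations(origin: Page, interlang: Edges, upward: Edges) -> Set[Page]:
    """Same-language pages reached via interlanguage links that are
    neither the origin nor one of its top pages."""
    top_pages = reachable(origin, upward)  # via redirects and inclusion
    same_lang = {p for p in reachable(origin, interlang) if p[0] == origin[0]}
    return same_lang - top_pages - {origin}


# Illustrative graph: a section whose hook ends at its own article (fine),
# and an article whose hook ends at an unrelated page (a violation).
interlang: Edges = {
    ("en", "Oslo#History"): [("no", "Oslos historie")],
    ("no", "Oslos historie"): [("en", "Oslo")],
    ("en", "Bergen"): [("no", "Bergen")],
    ("no", "Bergen"): [("en", "Bergen, Norway")],
}
upward: Edges = {("en", "Oslo#History"): [("en", "Oslo")]}

print(violations(("en", "Oslo#History"), interlang, upward))  # set()
print(violations(("en", "Bergen"), interlang, upward))
# {('en', 'Bergen, Norway')}
```

Note that `reachable` is transitive but not symmetric: the section reaches its article, while the article does not reach the section, which is the extra information an equivalence relation would lose.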
Improved traceability
Should bots report the cause of every interwiki change? How can we trace the edit that resulted in a given condition?
Automatic error deterrence

Can we, and should we, have the engine forbid some incorrect (or dangerous) configurations?
Coordination
- Develop a standard for the links themselves.
- Separate tasks which may be done purely automatically from those which imply some responsibility on the part of the user or bot owner.
- Educate editors about the complexity of this problem and the fact that some changes cannot be easily undone.
- Support a chat (an IRC channel; we propose #wikipedia-interwiki).
As a conclusion, let us speak about tasks for the interwiki community. How can we bring interested editors together to discuss these problems? How can we establish standards accepted by all language Wikipedias? How can we educate local users (most notably, patrollers/reviewers) that interlanguage links are a complex matter? They must not insert garbage data into outbound links or blindly revert any change they do not understand, because with the current system this may lead to grave consequences.
Thank you for your attention!

Wikimedia Commons provided many icons and diagrams which I did not draw myself.

Mozilla Firefox was used as a presentation environment.