
Submissions/Creating offline version of Wiki content - Solutions and Challenges

From Wikimania 2010 • Gdańsk, Poland • July 9-11, 2010


This is an open submission for Wikimania 2010.

Title of the submission
Creating offline version of Wiki content - Solutions and Challenges
Type of submission (workshop, tutorial, panel, presentation)
Author of the submission
  • Shiju Alex
  • Manuel Schneider
  • Santhosh Thottingal
  • Martin Walker
E-mail address or username (if username, please confirm email address in Special:Preferences)
  • Shiju Alex - shijualexonline@gmail.com
  • Martin Walker - walkerma AT potsdam.edu
  • Manuel Schneider - User:80686
  • Santhosh Thottingal - santhosh DOT thottingal AT gmail.com

Country of origin
India, Germany, and USA.
Affiliation, if any (organization, company etc.)

Wikimedia CH (Manuel Schneider)

Personal homepage or blog
Abstract (please use no less than 300 words to describe your proposal)

This proposal is the result of merging three related submissions.

The details of the workshop we are planning follow. (This is an abstract; more details are coming soon.)

The workshop on creating offline versions of wiki content is divided into the following four sessions.

  1. Selection
  2. Extraction
  3. Storage
  4. Reading


Introduction

  • A brief introduction to the workshop.
  • The important points to take care of when preparing an offline version.
  • The difference between providing a whole dump of a wiki and an offline version of selected articles, and so on.



Selection

Martin Walker (Walkerma) and Shiju will introduce the article selection process. Walker will explain how he used the Selection Bot to select articles for the English Wikipedia CD. Shiju will describe the process used for the Malayalam Wikipedia CD. The pros and cons of both methods will be discussed.

There are many challenges in selecting articles. We will explore possible solutions to them and engage the audience to find better ones.

The peer review process of the selected articles will also be discussed.


Extraction

This is the second step in creating an offline version. At least two known tools can be used to extract the selected articles: Kiwix and Wiki2Cd. Kelson and Santhosh will demonstrate their tools during this session, and the challenges and issues will be discussed. Other known solutions will also be covered, as will the offline review process.


Storage

openZIM will be presented during this session by Manuel Schneider. The problems that the CDFS file system has with non-Latin scripts, and how they can be bypassed, will also be discussed. Wiki content can be stored on CDs and DVDs, mobile phones, and other devices (for example, WikiReader).
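To make the file-name problem concrete, here is a minimal sketch of one possible workaround. It is not necessarily the approach the presenters use. It assumes the strictest ISO 9660 naming level, where file names are limited to eight characters plus a three-character extension drawn from A-Z, 0-9 and underscore, so a title in Malayalam or any other non-Latin script cannot be used as a file name directly. The helper names `short_name` and `build_index` are hypothetical.

```python
import hashlib

def short_name(title: str, salt: int = 0) -> str:
    """Derive an 8.3-style ASCII file name from an arbitrary Unicode title.

    The name is the first 8 hex digits of a SHA-1 digest, upper-cased,
    so it only contains the characters 0-9 and A-F.
    """
    data = f"{title}#{salt}".encode("utf-8")
    digest = hashlib.sha1(data).hexdigest().upper()
    return digest[:8] + ".HTM"

def build_index(titles):
    """Map each distinct title to a unique short file name.

    If two titles happen to produce the same name, the later one is
    re-hashed with an increasing salt until the name is free.
    """
    index = {}
    used = set()
    for title in titles:
        if title in index:  # skip duplicate titles
            continue
        salt = 0
        name = short_name(title, salt)
        while name in used:
            salt += 1
            name = short_name(title, salt)
        used.add(name)
        index[title] = name
    return index

# Example: titles in Malayalam, Polish and English all get ASCII-only names.
index = build_index(["കേരളം", "Gdańsk", "Wikipedia"])
```

The resulting title-to-file index would have to be shipped with the disc so that the reader software can resolve links; how that is done depends on the tool.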


Reading

Storing content on a CD or other device is not enough; we need to consider how the user will read and use it. The workshop will also discuss the challenges that readers face with non-Latin languages.

Track (People and Community/Knowledge and Collaboration/Infrastructure)
Will you attend Wikimania if your submission is not accepted?
Slides or further information (optional)
coming soon


This workshop will examine the complete process needed to produce an offline version of a wiki such as Wikipedia. First, the scope of the collection must be established: will it contain only important topics, articles from a given subject, or all articles in that language? After this, the version of each article must be chosen carefully so that it is free of vandalism, since a DVD or book cannot be edited after publication! The collection must then be cleaned up and extracted into a format suitable for publication, using software such as Kiwix or Wiki2Cd. The files must be stored in a format that is efficient, portable and readable; the openZIM format was developed for this purpose. Finally, these offline releases may be distributed as DVDs, flash drives, books or releases for mobiles, and each of these has its own issues. The workshop presenters have expertise in all of the above steps, including the difficulties of rendering non-Latin scripts. The session will include a generous question and answer period.
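The version-selection step described above can be illustrated with a small sketch. It assumes a hypothetical review step has already marked each revision as vandalised or not, and simply picks the most recent revision that was not flagged. The `Revision` class, its `flagged` field and the `pick_revision` function are illustrative names only, not part of any tool mentioned in this proposal.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

@dataclass
class Revision:
    rev_id: int
    timestamp: datetime
    flagged: bool  # hypothetical: True if a reviewer marked the revision as vandalised

def pick_revision(revisions: List[Revision]) -> Optional[Revision]:
    """Return the newest revision that is not flagged, or None if all are flagged."""
    clean = [r for r in revisions if not r.flagged]
    if not clean:
        return None
    return max(clean, key=lambda r: r.timestamp)

# Example history: the latest edit is flagged, so the one before it is chosen.
history = [
    Revision(101, datetime(2010, 5, 1), flagged=False),
    Revision(102, datetime(2010, 5, 20), flagged=False),
    Revision(103, datetime(2010, 6, 2), flagged=True),
]
chosen = pick_revision(history)
```

In practice the hard part is producing the flag, whether by human review or by an automated heuristic; the sketch only shows how a release could be pinned to a specific revision once that judgement exists.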

Interested attendees

If you are interested in attending this session, please sign with your username below. This will help reviewers decide which sessions are of high interest. Sign with four tildes (~~~~).

  1. Jpatokal 05:26, 7 June 2010 (UTC)
  2. Laaknor 08:49, 16 June 2010 (UTC)
  3. Tinucherian 10:38, 29 June 2010 (UTC)