Submissions/Semantic Search on Heterogeneous Wiki Systems

From Wikimania 2010 • Gdańsk, Poland • July 9-11, 2010


This is an open submission for Wikimania 2010.

Title of the submission

Semantic Search on Heterogeneous Wiki Systems

Type of submission (workshop, tutorial, panel, presentation)


Author of the submission

Fabrizio Orlandi

E-mail address or username (if username, please confirm email address in Special:Preferences)


Country of origin


Affiliation, if any (organization, company etc.)

Digital Enterprise Research Institute (DERI), National University of Ireland, Galway.

Personal homepage or blog

Abstract (please use no less than 300 words to describe your proposal)

We propose a system that enables semantic search across heterogeneous wikis in a unified way using Semantic Web technologies. In particular, we detail (i) how we designed a common model for representing social and structural wiki features and (ii) how we extracted semantic data from wikis running on two widely used wiki engines. On this basis, we show how we built an efficient application with a simple user interface that provides semantic search and browsing capabilities on top of different interlinked wikis.

Wikis are widely used both on the Web - with well-known and popular systems such as Wikipedia [1] or Wikitravel [2], as well as wiki systems dedicated to open-source software management such as Trac - and in the workplace, for instance for project management or customer relationships. However, each wiki system relies on its own data structure and API to model its data and give developers access to it. Consequently, wikis act as isolated systems, and information from one system cannot easily be integrated with information from another. In practice, this creates several drawbacks when users need to access information on the Web or in the enterprise. Consider, for instance, the ACME company: its marketing department uses a MediaWiki-based system, its engineering team uses MoinMoin, and its sales team uses DokuWiki. A user who wants to retrieve all information about a particular project therefore has to query each wiki separately, which can be time-consuming. Similar issues arise on the Web, for example when one wants to identify all the contributions of a given person across several wikis. We propose an approach based on Semantic Web technologies, Linked Data principles and lightweight ontologies to solve such issues and to enable semantic search across heterogeneous wiki systems. In particular, our contributions include:

  • a common semantic model for representing wiki structure and contributions in RDF - Resource Description Framework - encompassing previous models in the area;
  • various exporters for popular wiki systems, translating wiki information into RDF annotations (based on the model above) in real time;
  • a semantic search engine which provides means to retrieve information contained in heterogeneous wikis in a novel and user-friendly way.
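As a rough illustration of the first two contributions, the sketch below shows how an exporter might translate one wiki page into RDF triples using SIOC terms and serialize them as N-Triples. It is not the exporters' actual code: the page record layout, the `wiki.example.org` URLs and the helper names (`page_to_triples`, `to_ntriples`) are assumptions made for this example, and only a handful of properties are covered.

```python
# Minimal sketch: map a wiki page record to SIOC-based RDF triples.
# The record layout and all example.org URLs are hypothetical.

SIOC = "http://rdfs.org/sioc/ns#"
SIOCT = "http://rdfs.org/sioc/types#"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return "<" + value + ">"


def literal(value):
    """Return a plain N-Triples literal with basic escaping."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def page_to_triples(wiki_base, page):
    """Translate one page record into a list of (subject, predicate, object) terms.

    Assumes page["slug"] and page["author"] are already IRI-safe strings.
    """
    page_iri = iri(wiki_base + page["slug"])
    triples = [
        (page_iri, iri(RDF_TYPE), iri(SIOCT + "WikiArticle")),
        (page_iri, iri(SIOC + "has_container"), iri(wiki_base)),
        (page_iri, iri(DCTERMS + "title"), literal(page["title"])),
        (page_iri, iri(SIOC + "has_creator"), iri(wiki_base + "User:" + page["author"])),
    ]
    # Internal and external links both become sioc:links_to statements.
    for target in page.get("links", []):
        triples.append((page_iri, iri(SIOC + "links_to"), iri(target)))
    return triples


def to_ntriples(triples):
    """Serialize triples as N-Triples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


example_page = {
    "slug": "Project_Alpha",
    "title": "Project Alpha",
    "author": "alice",
    "links": ["http://wiki.example.org/Roadmap"],
}
example_triples = page_to_triples("http://wiki.example.org/", example_page)
print(to_ntriples(example_triples))
```

Because every exporter emits the same vocabulary regardless of the underlying wiki engine, the resulting data can be merged and queried without engine-specific code.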

Our work was developed in the following steps. First, we defined the structure of the ontology we propose for representing the social and structural features of wikis. Typically, wikis allow editing of documents and, by definition, allow multiple users to simultaneously contribute to the content; they track the history of changes so that pages can be restored to previous versions; they include comments or discussion areas; they link to external sources or to other pages within the wiki; and they organize categories into hierarchical structures. We modeled each of these features on top of the SIOC ontology. SIOC - Semantically-Interlinked Online Communities - is now considered one of the building blocks of the Social Semantic Web, so building on top of it allows our model to be used in existing applications.

Second, we developed two software applications (two exporters) that extract and produce semantic data from two popular wiki systems (MediaWiki and DokuWiki) using our proposed model. Both the web service exporter [3] for the MediaWiki platform and the plug-in for DokuWiki were designed following the Linked Data principles in order to provide interlinked data. The third step was to collect data from these two platforms across five wiki sites using our exporters. In total, we collected about 1 GB of RDF data and loaded it into an RDF store with a reasoning engine and a querying interface.

Finally, we built a semantic search system on top of the data extracted by the aforementioned exporters. It provides a user-friendly interface and advanced features to retrieve information contained in heterogeneous wikis in a unified way, including cross-wiki querying and faceted browsing. The application is built on existing Semantic Web querying standards (i.e. SPARQL) and uses PHP and MIT's SIMILE Exhibit web interface.

Despite its simplicity, the presented application allows for advanced and fast querying and for the discovery of hidden knowledge, showing potential that cannot be achieved with traditional Web 2.0 tools. To conclude, by developing a lightweight Semantic Web application we demonstrate the capabilities of semantic technologies applied to wikis. We thus show an overall benefit of applying Semantic Web technologies to wikis, enabling users to access the information generated by this process in a simple, interoperable and transparent way.
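To make the idea of cross-wiki querying and faceted browsing more concrete, here is a minimal sketch over a tiny in-memory set of triples. It stands in for the RDF store and SPARQL endpoint and is not the application's implementation: the data, the `example.org` URLs, the shared user IRI and the helper names (`match`, `pages_by_creator`, `facet_by_wiki`) are all assumptions for illustration. The SPARQL string shows one way the same question could be phrased against a real store.

```python
from collections import Counter

SIOC = "http://rdfs.org/sioc/ns#"
HAS_CREATOR = SIOC + "has_creator"
HAS_CONTAINER = SIOC + "has_container"

# Hypothetical user IRI, assumed to be shared across the wikis.
ALICE = "http://id.example.org/alice"

# Hypothetical merged data exported from two different wikis.
TRIPLES = {
    ("http://marketing.example.org/Project_Alpha", HAS_CREATOR, ALICE),
    ("http://marketing.example.org/Project_Alpha", HAS_CONTAINER, "http://marketing.example.org/"),
    ("http://sales.example.org/Alpha_Pricing", HAS_CREATOR, ALICE),
    ("http://sales.example.org/Alpha_Pricing", HAS_CONTAINER, "http://sales.example.org/"),
    ("http://sales.example.org/Leads", HAS_CREATOR, "http://id.example.org/bob"),
    ("http://sales.example.org/Leads", HAS_CONTAINER, "http://sales.example.org/"),
}

# Illustrative SPARQL for the same question (not taken from the application).
EQUIVALENT_SPARQL = """
PREFIX sioc: <http://rdfs.org/sioc/ns#>
SELECT ?page ?wiki WHERE {
  ?page sioc:has_creator <http://id.example.org/alice> ;
        sioc:has_container ?wiki .
}
"""


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def pages_by_creator(triples, creator):
    """List every page created by the given user, regardless of wiki."""
    return sorted(s for s, _, _ in match(triples, p=HAS_CREATOR, o=creator))


def facet_by_wiki(triples, pages):
    """Count the given pages per containing wiki, as a facet would."""
    counts = Counter()
    for page in pages:
        for _, _, wiki in match(triples, s=page, p=HAS_CONTAINER):
            counts[wiki] += 1
    return dict(counts)


alice_pages = pages_by_creator(TRIPLES, ALICE)
alice_facets = facet_by_wiki(TRIPLES, alice_pages)
print(alice_pages)
print(alice_facets)
```

A single pattern retrieves a user's pages from every wiki at once, and the per-wiki counts are the kind of summary a faceted browser can present for narrowing results.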

Track (People and Community/Knowledge and Collaboration/Infrastructure)

Knowledge and Collaboration

Will you attend Wikimania if your submission is not accepted?


Slides or further information (optional)


1 F. Orlandi and A. Passant, "Enabling cross-wikis integration by extending the SIOC ontology," 4th Semantic Wiki Workshop (SemWiki 2009), CEUR-WS, 2009.

2 F. Orlandi, "Using and extending the SIOC ontology for a fine-grained wiki modeling," Master's thesis, 2008/2009.


short introduction (slides):

Interested attendees

If you are interested in attending this session, please sign with your username below. This will help reviewers to decide which sessions are of high interest. Sign with four tildes. (~~~~).

  1. Incnis Mrsi 21:14, 20 May 2010 (UTC)
  2. El Ágora 21:26, 20 May 2010 (UTC)
  3. Jeroen De Dauw 11:28, 25 May 2010 (UTC)
  4. --Friedel Völker 15:31, 28 May 2010 (UTC)
  5. Kocio 13:32, 2 June 2010 (UTC)