From Wikimania 2010 • Gdańsk, Poland • July 9-11, 2010
Semantic Search on Heterogeneous Wiki Systems

Fabrizio Orlandi

Digital Enterprise Research Institute (DERI), National University of Ireland, Galway.

We propose a system to enable semantic search across heterogeneous wikis in a unified way using Semantic Web technologies. In particular, we detail (i) how we designed a common model for representing social and structural wiki features and (ii) how we extracted semantic data from wikis using two relevant wiki engines. On this basis, we show how we built an efficient application with a simple user interface that enables semantic searching and browsing on top of different interlinked wikis.

Wikis are widely used both on the Web - with well-known and popular systems such as Wikipedia [1] or Wikitravel [2], as well as wiki systems dedicated to open-source software management such as Trac - and in the workplace, for instance for project management or customer relationships. However, each wiki system relies on its own data structure and API to model its data and let developers access it. Consequently, wikis act as isolated systems, where information from one system cannot easily be integrated with information from another. In practice, this has several drawbacks when users need to access information on the Web or in the enterprise. Consider, for instance, the ACME company: its marketing department uses a MediaWiki-based system, its engineering team uses MoinMoin and its sales team uses DokuWiki. Hence, a user who wants to retrieve all information about a particular project has to query each wiki separately, which can be time-consuming. Similar issues arise on the Web, for example when one wants to identify all the contributions of a person across several wikis. We propose an approach based on Semantic Web technologies, Linked Data principles and lightweight ontologies to solve such issues and to enable semantic search across heterogeneous wiki systems. In particular, our contributions include:

  • a common semantic model for representing wiki structure and contributions in RDF - Resource Description Framework - encompassing previous models in the area;
  • various exporters for popular wiki systems, translating wiki information into RDF annotations (based on the above model) in real time;
  • a semantic search engine which provides means to retrieve information contained in heterogeneous wikis in a novel and user-friendly way.
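
As a rough illustration of the first two contributions, the sketch below turns a single wiki page record into RDF statements in N-Triples syntax. It is a minimal sketch under stated assumptions, not the model or exporter described in this abstract: the record fields, the `page_to_ntriples` helper and the example.org URIs are hypothetical, while the SIOC, SIOC Types and Dublin Core terms it uses are existing vocabulary terms.

```python
# Illustrative sketch only: record fields, helper names and example.org URIs
# are hypothetical. The vocabulary terms (sioc:has_container, sioc:has_creator,
# sioc:links_to, sioct:WikiArticle, dcterms:title) are existing terms, but this
# is not the authors' actual model.

SIOC = "http://rdfs.org/sioc/ns#"
SIOCT = "http://rdfs.org/sioc/types#"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def uri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return "<" + value + ">"


def literal(value):
    """Quote a plain string literal, escaping backslashes, quotes and line breaks."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


def page_to_ntriples(page):
    """Return one N-Triples line per statement describing the page."""
    subject = uri(page["url"])
    triples = [
        (subject, uri(RDF_TYPE), uri(SIOCT + "WikiArticle")),
        (subject, uri(DCTERMS + "title"), literal(page["title"])),
        (subject, uri(SIOC + "has_container"), uri(page["wiki"])),
        (subject, uri(SIOC + "has_creator"), uri(page["author"])),
    ]
    # Links to pages inside or outside the wiki become sioc:links_to statements.
    for target in page["links"]:
        triples.append((subject, uri(SIOC + "links_to"), uri(target)))
    return [" ".join(triple) + " ." for triple in triples]


# Hypothetical page record, as an exporter might read it from a wiki engine.
example_page = {
    "url": "http://wiki.example.org/Project_X",
    "title": "Project X",
    "wiki": "http://wiki.example.org/",
    "author": "http://wiki.example.org/User:Alice",
    "links": ["http://other.example.org/doku.php?id=project_x"],
}

for line in page_to_ntriples(example_page):
    print(line)
```

Because every page, user and wiki is identified by a URI, statements produced from different wiki engines in this way can be merged into one dataset without any engine-specific conversion.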

Our work was carried out in four steps. First, we defined the structure of the ontology we propose for representing the social and structural features of wikis. Typically, wikis allow editing of documents and, by definition, allow multiple users to contribute to the content simultaneously; they track the history of changes so that pages can be restored to previous versions; they include comments or discussion areas; they link to external sources or to other pages within the wiki; and they organize categories into hierarchical structures. We modeled each of these features on top of the SIOC ontology. SIOC - Semantically-Interlinked Online Communities - is now considered one of the building blocks of the Social Semantic Web, so building on top of it allows our model to be used in existing applications. Second, we developed two software applications (two exporters) that extract and produce semantic data from two popular wiki systems (MediaWiki and DokuWiki) using our proposed model. Both the web service exporter [3] for the MediaWiki platform and the plug-in for DokuWiki were designed following the Linked Data principles in order to provide interlinked data. Third, we collected data from these two platforms across five wiki sites using our exporters. In total, we collected about 1 GB of RDF data and loaded it into an RDF store with a reasoning engine and a querying interface. Finally, on top of the data extracted by these exporters, we built a semantic search system. It provides a user-friendly interface and advanced features to retrieve information contained in heterogeneous wikis in a unified way, including cross-wiki querying and faceted browsing. This application is built on existing Semantic Web querying standards (i.e. SPARQL) and uses PHP and MIT's SIMILE Exhibit web interface.
Despite its simplicity, the presented application allows for advanced and fast querying and for the discovery of hidden knowledge, showing potential that traditional Web 2.0 tools do not offer. To conclude, by developing a lightweight Semantic Web application we demonstrate the capabilities of semantic technologies applied to wikis. We thus show an overall benefit of applying Semantic Web technologies to wikis, enabling users to access the information generated by this process in a simple, interoperable and transparent way.
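
The idea of cross-wiki querying with a facet per wiki can be sketched as follows. This is only a toy illustration under stated assumptions: it replaces the SPARQL queries and RDF store mentioned above with a plain in-memory filter over Python tuples, and the triples, the example.org URIs and the `search_by_title` helper are all hypothetical.

```python
from collections import defaultdict

# Existing vocabulary terms; everything else in this sketch is hypothetical.
SIOC_HAS_CONTAINER = "http://rdfs.org/sioc/ns#has_container"
DCTERMS_TITLE = "http://purl.org/dc/terms/title"

# Made-up (subject, predicate, object) triples standing in for the output of
# two different exporters.
MEDIAWIKI_TRIPLES = [
    ("http://mw.example.org/Project_X", DCTERMS_TITLE, "Project X"),
    ("http://mw.example.org/Project_X", SIOC_HAS_CONTAINER, "http://mw.example.org/"),
    ("http://mw.example.org/Holidays", DCTERMS_TITLE, "Holidays"),
    ("http://mw.example.org/Holidays", SIOC_HAS_CONTAINER, "http://mw.example.org/"),
]
DOKUWIKI_TRIPLES = [
    ("http://dw.example.org/project_x_sales", DCTERMS_TITLE, "Project X sales plan"),
    ("http://dw.example.org/project_x_sales", SIOC_HAS_CONTAINER, "http://dw.example.org/"),
]


def search_by_title(triples, keyword):
    """Group pages whose title contains the keyword by the wiki that holds them."""
    # Map each page to its wiki so results can be faceted by source wiki.
    containers = {s: o for s, p, o in triples if p == SIOC_HAS_CONTAINER}
    results = defaultdict(list)
    for s, p, o in triples:
        if p == DCTERMS_TITLE and keyword.lower() in o.lower():
            results[containers.get(s, "unknown")].append(s)
    return dict(results)


# Because both sources share one vocabulary, merging them is list concatenation.
merged = MEDIAWIKI_TRIPLES + DOKUWIKI_TRIPLES
hits = search_by_title(merged, "project x")
for wiki, pages in hits.items():
    print(wiki, pages)
```

A single query over the merged data returns matching pages from both wikis, grouped by source, which is the behaviour a user would otherwise have to reproduce by querying each wiki separately.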

Knowledge and Collaboration

1 F. Orlandi and A. Passant, "Enabling cross-wikis integration by extending the SIOC ontology," 4th Semantic Wiki Workshop (SemWiki 2009), CEUR-WS, 2009.

2 F. Orlandi, "Using and extending the SIOC ontology for a fine-grained wiki modeling," Master's thesis, 2008/2009.


