Submissions/Comparing the structure of tagging in a protein-protein interaction network, a co-authorship network and the English Wikipedia

From Wikimania 2010 • Gdańsk, Poland • July 9-11, 2010


This is an open submission for Wikimania 2010.

Title of the submission

Comparing the structure of tagging in a protein-protein interaction network, a co-authorship network and the English Wikipedia

Type of submission (workshop, tutorial, panel, presentation)


Authors of the submission

Gergely Palla, Illes J. Farkas (presenting), Peter Pollner, Imre Derenyi, Tamas Vicsek

E-mail address or username (if username, please confirm email address in Special:Preferences)

Illes Farkas:

Country of origin


Affiliation, if any (organization, company etc.)

Statistical and Biological Physics Group of the Hungarian Academy of Sciences and Eotvos University, Budapest, Hungary

Personal homepage or blog

Abstract (please use no less than 300 words to describe your proposal)

This proposed presentation considers three networks (sets of nodes connected by links): a protein-protein interaction (PPI) network, in which nodes are yeast proteins and links are known protein-protein interactions; a co-authorship network, in which nodes are scientists and links indicate co-authorship; and the English Wikipedia, in which nodes are articles and links are hyperlinks. In each of the three networks the nodes are tagged. In the PPI network, proteins are tagged with the biological processes they are known to participate in. In the co-authorship network, published articles (and thereby the scientists who wrote them) receive classification tags. In Wikipedia, articles are tagged with the categories listed at the bottom of the page.

First, in all three cases we investigated the usage frequencies of node tags (e.g., the categories of Wikipedia articles) and found that frequently used tags cannot be clearly separated from less frequently used ones: the transition between popular and less popular tags is continuous, without exception. Second, we analyzed the uniqueness (u) of each node in the three networks. A high value of u indicates (roughly) that the tags of the given node appear rarely in the network. We found that in the co-authorship network nodes with many neighbors (i.e., authors who have many co-authors) tend to have a large set of node tags (research topics), while in the other two networks (proteins and Wikipedia) the tags of nodes with many neighbors tend to remain within fewer, more focused topics.
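As a rough illustration of the two measurements described above, the following Python sketch counts how often each tag is used and assigns each node a rarity score. The function names and the input format (a dictionary mapping each node to its list of tags) are assumptions made for this example. The score `toy_uniqueness` is only a stand-in for u: the abstract does not give the exact definition, so the sketch uses the mean negative log of the fraction of nodes carrying each tag.

```python
import math
from collections import Counter


def tag_frequencies(node_tags):
    """Count how many nodes carry each tag (each tag counted once per node)."""
    return Counter(tag for tags in node_tags.values() for tag in set(tags))


def rank_frequency(counts):
    """Tag usage counts sorted from most to least used (rank-frequency list)."""
    return sorted(counts.values(), reverse=True)


def toy_uniqueness(node_tags):
    """Illustrative rarity score, NOT the definition of u used in the paper.

    For each node: the mean of -log(fraction of nodes carrying the tag)
    over the node's tags. Rare tags give a high score.
    """
    counts = tag_frequencies(node_tags)
    n_nodes = len(node_tags)
    scores = {}
    for node, tags in node_tags.items():
        unique_tags = set(tags)
        if not unique_tags:
            scores[node] = 0.0
            continue
        total = sum(-math.log(counts[t] / n_nodes) for t in unique_tags)
        scores[node] = total / len(unique_tags)
    return scores


# Hypothetical toy data: three nodes, three tags.
example = {"a": ["x", "y"], "b": ["x"], "c": ["x", "z"]}
frequencies = rank_frequency(tag_frequencies(example))
uniqueness = toy_uniqueness(example)
```

Plotting the list returned by `rank_frequency` against rank on logarithmic axes is one way to check whether popular and unpopular tags separate into two groups or form a continuous transition.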

Third, as a practical application, we investigated a different network: the hierarchy of node tags in the above three cases. In these cases nodes are tags (e.g., categories of the English Wikipedia) and an A→B directed link represents that the tag B is contained by A. In Wikipedia this means that B is a subcategory of A. We found that, in contrast to the other two cases, the node tag hierarchy of the English Wikipedia contains many loops. Removing these loops by removing the smallest number of A→B subcategory connections can create a directed acyclic graph (DAG) of the categories that could boost possible artificial intelligence-based analyses of the tag structure. We suggested, implemented and carried out an algorithm for constructing this DAG efficiently: (i) we identified the minimal subgraph of the category hierarchy containing all loops, (ii) we computed the directed edge betweenness centrality of the links of this subgraph (this quantifies how many paths of the category hierarchy pass through the "A→B subcategory" link), (iii) we removed the smallest possible number of such A→B links.
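The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, loops are found by a simple reachability test chosen for brevity rather than speed, and links are removed greedily (highest betweenness first). Finding the truly smallest set of links to remove is computationally hard in general, so the greedy loop is a heuristic and does not guarantee a minimum.

```python
from collections import defaultdict, deque


def successors(edges):
    """Adjacency map for a set of directed (A, B) links."""
    succ = defaultdict(set)
    for a, b in edges:
        succ[a].add(b)
    return succ


def reachable(succ, start):
    """Nodes reachable from start by following at least one link."""
    seen, queue = set(), deque([start])
    while queue:
        v = queue.popleft()
        for w in succ.get(v, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


def loop_edges(edges):
    """Step (i): links lying on a loop (A→B is on a loop iff B reaches A)."""
    succ = successors(edges)
    reach = {}
    on_loop = set()
    for a, b in edges:
        if b not in reach:
            reach[b] = reachable(succ, b)
        if a in reach[b]:
            on_loop.add((a, b))
    return on_loop


def edge_betweenness(edges):
    """Step (ii): directed edge betweenness over shortest paths (Brandes-style)."""
    succ = successors(edges)
    nodes = {n for e in edges for n in e}
    score = defaultdict(float)
    for s in nodes:
        dist, order = {s: 0}, []
        sigma, preds = defaultdict(float), defaultdict(list)
        sigma[s] = 1.0
        queue = deque([s])
        while queue:  # breadth-first search counting shortest paths
            v = queue.popleft()
            order.append(v)
            for w in succ.get(v, ()):
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = defaultdict(float)
        for w in reversed(order):  # accumulate path counts back toward s
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[(v, w)] += c
                delta[v] += c
    return score


def break_loops(edges):
    """Step (iii), greedy heuristic: drop the top-betweenness loop link until acyclic."""
    remaining, removed = set(edges), []
    while True:
        cyclic = loop_edges(remaining)
        if not cyclic:
            return remaining, removed
        score = edge_betweenness(cyclic)
        worst = max(sorted(cyclic), key=lambda e: score[e])
        remaining.discard(worst)
        removed.append(worst)


# Hypothetical toy hierarchy: two loops (A→B→C→A and A→B→D→A) share the link A→B.
dag, removed = break_loops(
    [("A", "B"), ("B", "C"), ("C", "A"), ("B", "D"), ("D", "A")]
)
```

In this toy hierarchy the shared link A→B has the highest betweenness, so removing it alone breaks both loops and `removed` contains only `("A", "B")`.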

This abstract is based on Palla et al., "Fundamental statistical features and self-similar properties of tagged networks", New Journal of Physics (2008) vol. 10, p. 123026. All data are free and can be downloaded from

Track (People and Community/Knowledge and Collaboration/Infrastructure)

Knowledge and Collaboration

Will you attend Wikimania if your submission is not accepted?


Slides or further information (optional)

Comparing the structure of tagging in a protein-protein interaction network, a co-authorship network and the English Wikipedia.pdf

Fij 16:11, 20 May 2010 (UTC)

Interested attendees

If you are interested in attending this session, please sign with your username below. This will help reviewers decide which sessions are of high interest. Sign with four tildes (~~~~).

  1. El Ágora 21:18, 20 May 2010 (UTC)
  2. Incnis Mrsi 11:19, 21 May 2010 (UTC)
  3. --Victoria 12:41, 21 May 2010 (UTC)
  4. Bdamokos 15:24, 25 May 2010 (UTC)
  5. Psychology 12:28, 30 May 2010 (UTC)
  6. Kocio 13:13, 2 June 2010 (UTC)
  7. GlimmerPhoenix 18:20, 10 June 2010 (UTC)
  8. Jérôme 14:03, 15 June 2010 (UTC)