WordCloud: a Cytoscape plugin to create a visual semantic summary of networks
© Oesper et al; licensee BioMed Central Ltd. 2011
Received: 23 February 2011
Accepted: 7 April 2011
Published: 7 April 2011
When biological networks are studied, it is common to look for clusters, i.e. sets of nodes that are highly inter-connected. To understand the biological meaning of a cluster, the user usually has to sift through many textual annotations that are associated with biological entities.
The WordCloud Cytoscape plugin generates a visual summary of these annotations by displaying them as a tag cloud, where more frequent words are displayed using a larger font size. Word co-occurrence in a phrase can be visualized by arranging words in clusters or as a network.
WordCloud provides a concise visual summary of annotations which is helpful for network analysis and interpretation. WordCloud is freely available at http://baderlab.org/Software/WordCloudPlugin
The WordCloud plugin implements a visual information retrieval system known as a tag cloud. Tag cloud systems are used in a variety of domains, from social bookmarking services to summarization of PubMed database searches. The WordCloud implementation extends the basic tag cloud concept of a simple collection of words by also displaying information about word co-occurrence [8, 9].
Methods and Implementation
In the word size calculation, sel_w is the number of selected nodes that contain the word w, sel_tot is the total number of selected nodes, net_w is the number of nodes in the entire network that contain the word w, net_tot is the total number of nodes in the network, and k is the network normalization coefficient, which the user can tune with an interactive slider bar.
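The formula that combines these quantities does not appear in the text above. The following is a minimal sketch, assuming the score is the fraction of selected nodes containing the word divided by the network-wide fraction raised to the power k. That form is an assumption chosen to be consistent with the definitions, and the function name `word_score` is hypothetical.

```python
def word_score(sel_w, sel_tot, net_w, net_tot, k):
    """Score a word under the ASSUMED form (sel_w / sel_tot) / (net_w / net_tot) ** k.

    sel_w   -- selected nodes containing the word
    sel_tot -- total selected nodes
    net_w   -- nodes in the whole network containing the word
    net_tot -- total nodes in the network
    k       -- network normalization coefficient
    """
    if sel_tot <= 0 or net_tot <= 0 or net_w <= 0:
        raise ValueError("sel_tot, net_tot and net_w must be positive")
    selected_fraction = sel_w / sel_tot
    network_fraction = net_w / net_tot
    # k = 0 ignores the network background; larger k penalizes
    # words that are common across the whole network.
    return selected_fraction / network_fraction ** k
```

Under this assumed form, setting k to 0 reduces the score to the word's frequency within the selection, while increasing k favors words that are enriched in the selection relative to the rest of the network.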
Each word starts in its own cluster. Next, the most similar word pair is merged to form a larger cluster, maintaining word order, and the process is repeated. Similarity between multi-word clusters is defined as the similarity of the last word appearing in the first cluster and the first word appearing in the second cluster. This helps maintain the order of words in the cluster in the standard left to right English text direction. The cluster merging process is bounded by a user-defined threshold on the word pair similarity score.
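The merging procedure described above can be sketched as follows. This is an illustrative sketch, not the plugin's code. The function name `greedy_word_clusters` is hypothetical, and the similarity scores are taken as a given input because their definition is not part of this passage. Treating a missing pair as similarity 0, stopping when the best score falls below the threshold, and breaking ties by first occurrence are all assumptions.

```python
def greedy_word_clusters(words, similarity, threshold):
    """Greedily merge ordered word clusters.

    words      -- list of distinct words
    similarity -- dict mapping an ordered pair (first, second) to a score;
                  missing pairs are treated as 0.0 (assumption)
    threshold  -- merging stops once the best score is below this value
    """
    # Each word starts in its own cluster.
    clusters = [[w] for w in words]
    while len(clusters) > 1:
        best_score, best_pair = None, None
        for i, first in enumerate(clusters):
            for j, second in enumerate(clusters):
                if i == j:
                    continue
                # Cluster similarity: last word of the first cluster
                # against the first word of the second cluster.
                score = similarity.get((first[-1], second[0]), 0.0)
                if best_score is None or score > best_score:
                    best_score, best_pair = score, (i, j)
        if best_score < threshold:
            break
        i, j = best_pair
        merged = clusters[i] + clusters[j]  # concatenation keeps word order
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters
```

Because only the boundary words of each cluster are compared, a word in the middle of a cluster can no longer attract new partners, which is what keeps merged phrases in reading order.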
This is the L2 norm (i.e. Euclidean length) of the cluster's word size vector.
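For reference, the L2 norm of a cluster's word size vector can be computed as in the short sketch below. The function name `cluster_l2_norm` is hypothetical, and the sketch only computes the value; how the plugin uses it is not stated in this passage.

```python
import math


def cluster_l2_norm(word_sizes):
    """Return the Euclidean length of a cluster's word size vector."""
    return math.sqrt(sum(size * size for size in word_sizes))
```

One property of this measure is that a cluster with a single very large word can have a larger norm than a cluster with several small words.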
The greedy clustering algorithm described above does not consider the co-occurrence of all word pairs in the input text. Thus, as an alternative to the clustered layout, words can be visualized as a similarity network. Each word is represented as a node, with node and label size proportional to word frequency as previously described. Words are connected by edges whose width is proportional to their similarity score, as defined above. The resulting network can be laid out, analyzed and clustered using Cytoscape functionalities. The network layout is particularly useful when words tend to have multiple co-occurrence partners, rather than a single one.
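The network representation can be sketched with plain Python dictionaries, as below. This does not use the Cytoscape API. The function name `build_similarity_network`, the dictionary keys, and the `max_edge_width` scaling parameter are all hypothetical choices made for illustration.

```python
def build_similarity_network(word_sizes, similarity, max_edge_width=10.0):
    """Build node and edge records for a word similarity network.

    word_sizes -- dict mapping each word to its display size
    similarity -- dict mapping a word pair (a, b) to a similarity score
    Returns (nodes, edges); edge width is proportional to similarity,
    scaled so the strongest kept edge gets max_edge_width (assumption).
    """
    nodes = [{"id": word, "size": size} for word, size in word_sizes.items()]
    # Keep only positive-scoring pairs whose words are both in the network.
    kept = {
        pair: score
        for pair, score in similarity.items()
        if pair[0] in word_sizes and pair[1] in word_sizes and score > 0
    }
    top = max(kept.values(), default=0.0)
    edges = [
        {"source": a, "target": b, "width": max_edge_width * score / top}
        for (a, b), score in kept.items()
    ]
    return nodes, edges
```

Unlike the greedy clustering, this keeps every word pair with a positive score, so a word with several co-occurrence partners retains all of its edges.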
WordCloud is a configurable tool for creating quick visual summaries of sub-networks within Cytoscape, and it aids interactive network exploration. Its configuration options provide a high degree of control over the tag cloud visualization, so the result can serve as a publication-quality summary of a sub-network. WordCloud also includes clustered tag cloud and word similarity network visualization options that retain the meaning of phrases by maintaining word order, rather than just displaying individual words.
Availability and Requirements
Project name: WordCloud
Project home page: http://baderlab.org/Software/WordCloudPlugin
Operating system: Platform independent
Programming language: Java
Other requirements: Cytoscape version 2.6 or newer, Java SE 5
License: GNU LGPL
Any restrictions to use by non-academics: None
We thank Maital Ashkenazi and Hannah Tipney for their useful comments. We thank the developers of Cytoscape for enabling development of this plugin. WordCloud development was supported by the Google Summer of Code program (to LO) and by the US NIH via National Human Genome Research Institute (NHGRI) grant P41 P41HG04118 (to GDB).
1. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003, 13: 2498-2504. 10.1101/gr.1239303.
2. Merico D, Gfeller D, Bader GD: How to visually interpret biological data using networks. Nat Biotechnol. 2009, 27: 921-924. 10.1038/nbt.1567.
3. Krogan NJ, Cagney G, Yu H, Zhong G, Guo X, Ignatchenko A, Li J, Pu S, Datta N, Tikuisis AP, Punna T, Peregrín-Alvarez JM, Shales M, Zhang X, Davey M, Robinson MD, Paccanaro A, Bray JE, Sheung A, Beattie B, Richards DP, Canadien V, Lalev A, Mena F, Wong P, Starostine A, Canete MM, Vlasblom J, Wu S, Orsi C, Collins SR, Chandran S, Haw R, Rilstone JJ, Gandi K, Thompson NJ, Musso G, St Onge P, Ghanny S, Lam MHY, Butland G, Altaf-Ul AM, Kanaya S, Shilatifard A, O'Shea E, Weissman JS, Ingles CJ, Hughes TR, Parkinson J, Gerstein M, Wodak SJ, Emili A, Greenblatt JF: Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature. 2006, 440: 637-643. 10.1038/nature04670.
4. Gene Ontology Consortium: Gene Ontology: tool for the unification of biology. Nat Genet. 2000, 25: 25-29. 10.1038/75556.
5. Isserlin R, Merico D, Alikhani-Koupaei R, Gramolini A, Bader GD, Emili A: Pathway Analysis of Dilated Cardiomyopathy using Global Proteomic Profiling and Enrichment Maps. Proteomics. 2010, 10: 1316-1327. 10.1002/pmic.200900412.
6. Hammond T, Hannay T, Lund B, Scott J: Social bookmarking tools (I): A general review. D-Lib Magazine. 2005, 11 (4): 10.1045/april2005-hammond.
7. Kuo BYL, Hentrich T, Good BM, Wilkinson MD: Tag clouds for summarizing web search results. Proceedings of the 16th International Conference on World Wide Web. 2007, Banff, Alberta, Canada.
8. Begelman G, Keller P, Smadja F: Automated Tag Clustering: Improving search and exploration in the tag space. Proceedings of the 15th International Conference on World Wide Web. 2006, Edinburgh, UK.
9. Hassan-Montero Y, Herrero-Solana V: Improving tag-clouds as visual information retrieval interfaces. International Conference on Multidisciplinary Information Sciences and Technologies. 2006, Merida, Spain.
10. Nam D, Kim SY: Gene-set approach for expression pattern analysis. Briefings in Bioinformatics. 2008, 9: 189-197. 10.1093/bib/bbn001.
11. Merico D, Isserlin R, Stueker O, Emili A, Bader GD: Enrichment Map: A Network-Based Method for Gene-Set Enrichment Visualization and Interpretation. PLoS ONE. 2010, 5 (11): 10.1371/journal.pone.0013984.
12. Sartor MA, Mahavisno V, Keshamouni VG, Cavalcoli J, Wright Z, Karnovsky A, Kuick R, Jagadish HV, Mirel B, Weymouth T, Athey B, Omenn GS: ConceptGen: a gene set enrichment and gene set relation mapping tool. Bioinformatics. 2010, 26: 456-463. 10.1093/bioinformatics/btp683.
13. Bindea G, Mlecnik B, Hackl H, Charoentong P, Tosolini M, Kirilovsky A, Fridman WH, Pages F, Trajanoski Z, Galon J: ClueGO: a Cytoscape plug-in to decipher functionally grouped gene ontology and pathway annotation networks. Bioinformatics. 2009, 25: 1091-1093. 10.1093/bioinformatics/btp101.
14. Porter MF: An algorithm for suffix stripping. Program: electronic library and information systems. 2006, 40: 211-218. 10.1108/00330330610681286.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.