A web server for interactive and zoomable Chaos Game Representation images
© Arakawa et al; licensee BioMed Central Ltd. 2009
Received: 5 August 2009
Accepted: 17 September 2009
Published: 17 September 2009
Chaos Game Representation (CGR) is a generalized scale-independent Markov transition table, which is useful for the visualization and comparative study of genomic signatures and for the study of characteristic sequence motifs. However, to fully utilize the scale-independent properties of CGR, it should be accessible through a scale-independent user interface rather than static images. Here we describe a web server and Perl library for generating zoomable CGR images using the Google Maps API, which can also be easily searched for specific motifs. The web server is freely accessible at http://www.g-language.org/wiki/cgr/, and the Perl library as well as the source code is distributed with the G-language Genome Analysis Environment under the GNU General Public License.
Genomic sequences exhibit characteristic nucleotide compositional bias, especially in the relative abundances of short oligonucleotides. While dinucleotide frequencies differ among phyla, closely related species tend to display similar compositions. Based on these studies, the relative abundances of dinucleotides are considered to be the "genomic signature" [2, 3]. Chaos Game Representation (CGR) was first proposed by Jeffrey as a scale-independent means to visualize this non-randomness of genomic sequences, by applying the concept of chaotic dynamical systems. Further studies by Almeida et al. have shown that CGR is a generalized Markov chain probability table which can accommodate non-integer orders, and that CGR is advantageous over Markov transition tables for its computational efficiency and scale-independence [5–8].
Several software tools, including a database of CGR images, a web server, and a tool in the EMBOSS package, are already available for CGR analysis; however, these tools produce static images, which limits the full utility of CGR as a scale-independent Markov transition table. A Zoomable User Interface (ZUI) is effective in representing such scalable information, as exemplified by the popularity of Google Maps in representing geographical data. Therefore, here we describe a web server for generating interactive and zoomable CGR images, using the Google Maps API and Web 2.0 technologies.
Chaos Game Representation
where X_0 = (0, 0), V(A) = (-1, 1), V(C) = (-1, -1), V(G) = (1, 1), V(T) = (1, -1).
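As a minimal sketch of the iteration these definitions imply, the Python below assumes the standard chaos-game midpoint rule, in which each point lies halfway between the previous point and the vertex of the current nucleotide. The update rule, the function name cgr_points, and the handling of ambiguous bases are assumptions made for illustration; they are not taken from the G-language implementation.

```python
# Vertices as defined in the text: A upper left, C lower left,
# G upper right, T lower right.
VERTICES = {
    "A": (-1.0, 1.0),
    "C": (-1.0, -1.0),
    "G": (1.0, 1.0),
    "T": (1.0, -1.0),
}


def cgr_points(sequence, start=(0.0, 0.0)):
    """Return one CGR coordinate per base, starting from X_0 = (0, 0)."""
    x, y = start
    points = []
    for base in sequence.upper():
        if base not in VERTICES:
            continue  # skip ambiguous bases such as N (an illustrative choice)
        vx, vy = VERTICES[base]
        # Assumed midpoint rule: X_i = X_{i-1} + (V(S_i) - X_{i-1}) / 2
        x, y = x + (vx - x) / 2.0, y + (vy - y) / 2.0
        points.append((x, y))
    return points
```

Under this assumed rule, every occurrence of a given k-mer suffix lands in the same sub-square of side 2/2^k, which is what makes the plot scale-independent.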
To generate a k-mer table (or FCGR: frequency matrices extracted from CGR, as defined by Almeida et al.), a square is repeatedly subdivided into four squares, while retaining the quadrant representation of the four nucleotides, where A is upper left, C is lower left, G is upper right, and T is lower right. For example, for the tetramer "ACGT", the upper left square representing A is subdivided, then the lower left square within this upper left square, representing "AC", is subdivided, and so on (see Figure 1b for details). Repeating this process for all k-mers while color-coding the pixels by the abundance of the corresponding k-mer, from white (rare) to black (frequent), results in a k-mer table of 4^k pixels with a width of 2^k.
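The subdivision procedure can be sketched in Python as follows. This is an illustrative sketch rather than the code used by the web server: the names kmer_cell and kmer_table are hypothetical, counts are kept as raw integers instead of gray levels, and k-mers containing ambiguous bases are simply skipped.

```python
def kmer_cell(kmer):
    """Locate a k-mer by repeatedly subdividing the square into quadrants."""
    size = 2 ** len(kmer)  # width of the current square in pixels
    col, row = 0, 0        # top-left corner of the current square
    for base in kmer.upper():
        size //= 2
        if base in "GT":   # G and T occupy the right quadrants
            col += size
        if base in "CT":   # C and T occupy the lower quadrants
            row += size
    return col, row


def kmer_table(sequence, k):
    """Count every k-mer of `sequence` into a 2^k x 2^k matrix, indexed [row][col]."""
    width = 2 ** k
    table = [[0] * width for _ in range(width)]
    seq = sequence.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # ignore k-mers with ambiguous bases
            col, row = kmer_cell(kmer)
            table[row][col] += 1
    return table
```

Mapping the counts to a white-to-black color scale is left out here, since the text does not specify the scaling used.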
Mathematically, the position of a given oligomer can be calculated by converting the nucleotide sequence into two binary bit sequences corresponding to the horizontal and vertical coordinates. C and T bases move to the lower quadrants, and G and T bases move to the right quadrants. Therefore, substituting A and G with 0 and C and T with 1 yields a binary number corresponding to the y-distance from the top-left corner pixel of the image. Similarly, the x-distance is obtained by substituting A and C with 0 and G and T with 1. For example, the distance of ACGT from the upper-left corner pixel is given by (0011, 0101) in binary, which is (3, 5) in decimal. Therefore, ACGT is located at the 4th column, 6th row of the 16 × 16 pixel image (Figure 1b).
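The bit substitution described above can be written directly in Python. The function name kmer_offsets and its return format are hypothetical choices for this sketch.

```python
def kmer_offsets(kmer):
    """Return ((x_bits, y_bits), (x, y)) measured from the top-left pixel."""
    kmer = kmer.upper()
    if not kmer or set(kmer) - set("ACGT"):
        raise ValueError("expected a non-empty string of A, C, G and T")
    # Horizontal offset: A and C -> 0, G and T -> 1.
    x_bits = "".join("1" if base in "GT" else "0" for base in kmer)
    # Vertical offset: A and G -> 0, C and T -> 1.
    y_bits = "".join("1" if base in "CT" else "0" for base in kmer)
    return (x_bits, y_bits), (int(x_bits, 2), int(y_bits, 2))


bits, offsets = kmer_offsets("ACGT")
print(bits, offsets)  # ('0011', '0101') (3, 5)
```

The offsets are zero-based, so (3, 5) corresponds to the 4th column and 6th row.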
Results and Discussion
Interactive and zoomable user interface
To fully utilize CGR as a scale-independent Markov transition table, and to quickly locate oligomers in the k-mer table, the map can be searched for oligomers of any length from the search box located at the top (Figure 3). The search is incremental, so the corresponding position is highlighted within the map immediately as the nucleotide sequence is typed. Oligomers can be searched by specific sequences using only the four nucleotides (Figure 3a), or ambiguously using "n" to represent all four nucleotides (Figure 3b). When n "n"s are used, 4^n regions are highlighted. With these zooming and interactive searching capabilities, CGR can be a powerful tool for studying genomic signatures and overrepresented or underrepresented sequence motifs within the genome.
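To illustrate why a query with n ambiguous positions highlights 4^n regions, the following Python sketch expands each "n" into the four nucleotides and computes the cell of every resulting oligomer with the bit substitution described earlier. It is not the search code of the web server, and the function name highlighted_cells is hypothetical.

```python
from itertools import product


def highlighted_cells(query):
    """Expand each 'n' to a, c, g, t and return the (column, row) of every match."""
    query = query.lower()
    if not query or set(query) - set("acgtn"):
        raise ValueError("expected a non-empty string of a, c, g, t and n")
    # Each position offers either one fixed base or all four bases.
    slots = ["acgt" if ch == "n" else ch for ch in query]
    cells = []
    for combo in product(*slots):
        col = int("".join("1" if b in "gt" else "0" for b in combo), 2)
        row = int("".join("1" if b in "ct" else "0" for b in combo), 2)
        cells.append((col, row))
    return cells


print(len(highlighted_cells("annt")))  # 16 cells, i.e. 4^2
```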
REST Web service API
Here [genome] is a RefSeq accession number, and [method] is either cgr (for Chaos Game Representation) or kmer_table (for k-mer table). For example, the URL for the Mycoplasma genitalium genome (RefSeq: NC_000908) is:
A Google Maps view can be generated by appending "output=gmap" to the above URL, as follows:
In this way, all maps are generated on the fly and are always up to date. Moreover, other web pages or web database sites can use our service to add CGR images and k-mer tables to their own sites simply by referring to our URL. To use the service with a user's own sequence, the sequence should be uploaded at http://rest.g-language.org/upload/, and the reference ID given by the uploader should be used in place of the accession number. For more details about the service, or about the Perl API distributed with the latest G-language GAE package (version 1.8.9 or above) for running the software locally, see the documentation on our website.
List of abbreviations
- CGR: Chaos Game Representation
- FCGR: Frequency matrices extracted from CGR
- G-language GAE: G-language Genome Analysis Environment
- IFS: Iterated Function System
- REST: Representational State Transfer
- ZUI: Zoomable User Interface.
This research is supported by the Grant-in-Aid for Young Scientists No.20710158 from the Japan Society for the Promotion of Science (JSPS), as well as funds from the Yamagata Prefectural Government and Tsuruoka City.
- Karlin S, Mrazek J, Campbell AM: Compositional biases of bacterial genomes and evolutionary implications. J Bacteriol. 1997, 179 (12): 3899-3913.
- Karlin S, Burge C: Dinucleotide relative abundance extremes: a genomic signature. Trends Genet. 1995, 11 (7): 283-290. 10.1016/S0168-9525(00)89076-9.
- Karlin S, Campbell AM, Mrazek J: Comparative DNA analysis across diverse genomes. Annu Rev Genet. 1998, 32: 185-225. 10.1146/annurev.genet.32.1.185.
- Jeffrey HJ: Chaos game representation of gene structure. Nucleic Acids Res. 1990, 18 (8): 2163-2170. 10.1093/nar/18.8.2163.
- Almeida JS, Carrico JA, Maretzek A, Noble PA, Fletcher M: Analysis of genomic sequences by Chaos Game Representation. Bioinformatics. 2001, 17 (5): 429-437. 10.1093/bioinformatics/17.5.429.
- Almeida JS, Vinga S: Universal sequence map (USM) of arbitrary discrete sequences. BMC Bioinformatics. 2002, 3: 6. 10.1186/1471-2105-3-6.
- Almeida JS, Vinga S: Computing distribution of scale independent motifs in biological sequences. Algorithms Mol Biol. 2006, 1: 18. 10.1186/1748-7188-1-18.
- Almeida JS, Vinga S: Biological sequences as pictures: a generic two dimensional solution for iterated maps. BMC Bioinformatics. 2009, 10: 100. 10.1186/1471-2105-10-100.
- Bikandi J, San Millan R, Rementeria A, Garaizar J: In silico analysis of complete bacterial genomes: PCR, AFLP-PCR and endonuclease restriction. Bioinformatics. 2004, 20 (5): 798-799. 10.1093/bioinformatics/btg491.
- Rice P, Longden I, Bleasby A: EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 2000, 16 (6): 276-277. 10.1016/S0168-9525(00)02024-2.
- Arakawa K, Tamaki S, Kono N, Kido N, Ikegami K, Ogawa R, Tomita M: Genome Projector: zoomable genome map with multiple views. BMC Bioinformatics. 2009, 10: 31. 10.1186/1471-2105-10-31.
- Google Maps. [http://maps.google.com/]
- Google Maps API. [http://code.google.com/apis/maps/]
- Zhang Z, Cheung KH, Townsend JP: Bringing Web 2.0 to bioinformatics. Brief Bioinform. 2008, 10: 1-10. 10.1093/bib/bbn041.
- Arakawa K, Mori K, Ikeda K, Matsuzaki T, Kobayashi Y, Tomita M: G-language Genome Analysis Environment: a workbench for nucleotide sequence data mining. Bioinformatics. 2003, 19 (2): 305-306. 10.1093/bioinformatics/19.2.305.
- Arakawa K, Suzuki H, Tomita M: Computational Genome Analysis Using The G-language System. Genes, Genomes and Genomics. 2008, 2 (1): 1-13.
- Arakawa K, Tomita M: G-language System as a platform for large-scale analysis of high-throughput omics data. Journal of Pesticide Science. 2006, 31 (3): 282-288. 10.1584/jpestics.31.282.
- G-language REST Service. [http://rest.g-language.org/]
- CGR Web Server. [http://www.g-language.org/wiki/cgr/]
- Stockinger H, Attwood T, Chohan SN, Cote R, Cudre-Mauroux P, Falquet L, Fernandes P, Finn RD, Hupponen T, Korpelainen E, et al: Experience using web services for biological sequence analysis. Brief Bioinform. 2008, 9 (6): 493-505. 10.1093/bib/bbn029.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.