GEMBASSY: an EMBOSS associated software package for comprehensive genome analyses
© Itaya et al.; licensee BioMed Central Ltd. 2013
Received: 31 March 2013
Accepted: 28 August 2013
Published: 29 August 2013
The popular European Molecular Biology Open Software Suite (EMBOSS) currently contains over 400 tools used in various areas of bioinformatics research, and is equipped with sophisticated development frameworks for interoperability and tool discoverability, as well as rich documentation and a variety of user interfaces. To further strengthen EMBOSS in the field of genomics, we here present a novel EMBOSS associated software (EMBASSY) package named GEMBASSY, which adds more than 50 analysis tools from the G-language Genome Analysis Environment and its Representational State Transfer (REST) and SOAP web services. GEMBASSY consists mainly of wrapper programs for the G-language REST/SOAP web services. These provide intuitive and easy access to the annotations within complete genome flatfiles, as well as tools for analyzing nucleotide composition, calculating codon usage, and visualizing genomic information. For example, methods for calculating the distance between sequences by genomic signatures and for predicting gene expression levels from codon usage bias are effective in the interpretation of meta-genomic and meta-transcriptomic data. GEMBASSY tools can be used seamlessly with other EMBOSS tools and UNIX command line tools. The source code, written in C, is available from GitHub (https://github.com/celery-kotone/GEMBASSY/) and the distribution package is freely available from the GEMBASSY web site (http://www.g-language.org/gembassy/).
First released in 2000, the European Molecular Biology Open Software Suite (EMBOSS) [1] is a comprehensive package for sequence analysis consisting of over 400 tools, and is one of the most popular bioinformatics software packages. EMBOSS is not merely a collection of software tools: it is equipped with rich documentation and a development framework that achieves a high level of software interoperability and discoverability, based on the Ajax Command Definitions (ACD) metadata for the tools. EMBOSS is therefore an interoperable bioinformatics software platform that works seamlessly in concert with other UNIX command-line tools, and it can alternatively be accessed from the graphical user interface JEMBOSS [2] or from the web-based interface EMBOSS Explorer [3]. Third-party software developed on the EMBOSS platform is called EMBOSS associated software (EMBASSY). We have previously developed an EMBASSY package named the Keio Bioinformatics Web Service (KBWS) [4], which complements EMBOSS tools with access to 42 major bioinformatics web services such as NCBI BLAST and WebLogo.
As a further expansion of EMBOSS, we hereby present a novel EMBASSY package designated GEMBASSY. This package adds over 50 tools for genome analysis and gene-centric sequence manipulation from genome flatfiles, implemented using methods from the G-language Genome Analysis Environment (G-language GAE) [5–7]. G-language GAE contains over 100 programs for genome analysis, most of which implement published algorithms; each program offers a variety of options and produces graphical output where available. Analysis programs included in the G-language GAE, such as those for the identification of conserved sequence motifs with information theory [8, 9], prediction of gene expression levels from codon usage bias [10], visualization of GC skew [11], and prediction of replication origin and terminus [12, 13], are effective in the comparative study of bacterial genomes.
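To make the GC skew and replication origin/terminus analyses mentioned above more concrete, the following is a minimal Python sketch of the general idea: GC skew is computed as (G - C) / (G + C) in consecutive windows, and the minimum and maximum of the cumulative skew are taken as rough estimates of the origin and terminus. This is an illustration only, not the G-language GAE or GEMBASSY implementation. The function names, the non-overlapping window scheme, and the toy sequence are all assumptions made for this example.

```python
from itertools import accumulate


def gc_skew(seq, window):
    """Return (G - C) / (G + C) for consecutive non-overlapping windows.

    Hypothetical helper for illustration; a trailing partial window is ignored.
    """
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        # Windows without any G or C are assigned a skew of 0.
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def predict_origin_terminus(seq, window):
    """Estimate origin and terminus from the cumulative GC skew.

    Assumes the common bacterial pattern in which the cumulative skew
    reaches its minimum near the origin and its maximum near the terminus.
    Returns the end coordinates of the corresponding windows.
    """
    cumulative = list(accumulate(gc_skew(seq, window)))
    ori_idx = min(range(len(cumulative)), key=cumulative.__getitem__)
    ter_idx = max(range(len(cumulative)), key=cumulative.__getitem__)
    return (ori_idx + 1) * window, (ter_idx + 1) * window


# Toy sequence (assumption): a C-rich stretch, then a G-rich stretch, then C-rich again.
toy = "CCCCA" * 8 + "GGGGA" * 12 + "CCCCA" * 4
origin, terminus = predict_origin_terminus(toy, window=10)
print(origin, terminus)
```

On a real genome the window size would be far larger, and published methods such as those cited above add steps (for example noise reduction) that this sketch omits.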
Complete list of 53 tools implemented in GEMBASSY (table body not preserved; surviving header text: Nucleic, Codon Usage)
Results and discussion
As exemplified in the workflow, GEMBASSY complements existing EMBOSS/EMBASSY tools for the manipulation of genome flatfiles and adds numerous analysis tools from the G-language GAE that are suited to genome-level studies. Because the tools are provided as an EMBASSY package built on the EMBOSS framework, users can rely on the same documentation (tfm) and discovery (wossname) tools as in EMBOSS, and can work with the user interface they are already accustomed to.
ACD: Ajax command definition
EMBOSS: European Molecular Biology Open Software Suite
PHX: Predicted highly expressed
REST: Representational state transfer
WSDL: Web service description language.
This research was supported by funds from Yamagata Prefectural Government and Tsuruoka City, and by the KAKENHI Grant-in-Aid for Young Scientists (A), No.222681029.
- Rice P, Longden I, Bleasby A: EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 2000, 16: 276-277. 10.1016/S0168-9525(00)02024-2.
- Carver T, Bleasby A: The design of Jemboss: a graphical user interface to EMBOSS. Bioinformatics. 2003, 19: 1837-1843. 10.1093/bioinformatics/btg251.
- EMBOSS Explorer. http://embossgui.sourceforge.net/.
- Oshita K, Arakawa K, Tomita M: KBWS: an EMBOSS associated package for accessing bioinformatics web services. Source Code Biol Med. 2011, 6: 8. 10.1186/1751-0473-6-8.
- Arakawa K, Mori K, Ikeda K, Matsuzaki T, Kobayashi Y, Tomita M: G-language genome analysis environment: a workbench for nucleotide sequence data mining. Bioinformatics. 2003, 19: 305-306. 10.1093/bioinformatics/19.2.305.
- Arakawa K, Tomita M: G-language system as a platform for large-scale analysis of high-throughput omics data. J Pestic Sci. 2006, 30: 282-288.
- Arakawa K, Suzuki H, Tomita M: Computational genome analysis using the G-language system. Genes Genomes Genomics. 2008, 2: 1-13. 10.1007/978-3-540-73837-4_1.
- Schneider TD: Measuring molecular information. J Theor Biol. 1999, 201: 87-92. 10.1006/jtbi.1999.1012.
- Schneider TD: Consensus sequence Zen. Applied bioinformatics. 2002, 1: 111-119.
- Henry I, Sharp PM: Predicting gene expression level from codon usage bias. Mol Biol Evol. 2007, 24: 10-12.
- Lobry JR: Asymmetric substitution patterns in the two DNA strands of bacteria. Mol Biol Evol. 1996, 13: 660-665. 10.1093/oxfordjournals.molbev.a025626.
- Frank AC, Lobry JR: Oriloc: prediction of replication boundaries in unannotated bacterial chromosomes. Bioinformatics. 2000, 16: 560-561. 10.1093/bioinformatics/16.6.560.
- Arakawa K, Saito R, Tomita M: Noise-reduction filtering for accurate detection of replication termini in bacterial genomes. FEBS Lett. 2007, 581: 253-258. 10.1016/j.febslet.2006.12.021.
- Arakawa K, Kido N, Oshita K, Tomita M: G-language genome analysis environment with REST and SOAP web service interfaces. Nucleic Acids Res. 2010, 38: W700-W705. 10.1093/nar/gkq315.
- Van Engelen RA, Gallivan KA: The gSOAP Toolkit for Web Services and Peer-to-Peer Computing Networks. In Proceedings of the 2nd IEEE International Symposium on Cluster Computing and the Grid: 21–24 May 2002; Berlin. 2002, 128-135.
- Shine J, Dalgarno L: The 3′-terminal sequence of Escherichia coli 16S ribosomal RNA: complementarity to nonsense triplets and ribosome binding sites. Proc Natl Acad Sci U S A. 1974, 71: 1342-1346. 10.1073/pnas.71.4.1342.
- Karlin S, Mrazek J: Predicted highly expressed genes of diverse prokaryotic genomes. J Bacteriol. 2000, 182: 5238-5250. 10.1128/JB.182.18.5238-5250.2000.
- Maglott D, Ostell J, Pruitt KD, Tatusova T: Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res. 2011, 39: D52-D57. 10.1093/nar/gkq1237.
- Schneider TD, Stephens RM: Sequence logos: a new way to display consensus sequences. Nucleic Acids Res. 1990, 18: 6097-6100. 10.1093/nar/18.20.6097.
- Crooks GE, Hon G, Chandonia JM, Brenner SE: WebLogo: a sequence logo generator. Genome Res. 2004, 14: 1188-1190. 10.1101/gr.849004.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.