Sig2BioPAX: Java tool for converting flat files to BioPAX Level 3 format
© Webb and Ma'ayan; licensee BioMed Central Ltd. 2011
Received: 10 December 2010
Accepted: 21 March 2011
Published: 21 March 2011
The World Wide Web plays a critical role in enabling molecular, cell, systems and computational biologists to exchange, search, visualize, integrate, and analyze experimental data. Such efforts can be further enhanced through the development of semantic web concepts. The idea behind the semantic web is to enable machines to understand data through protocol-free data exchange formats such as the Resource Description Framework (RDF) and the Web Ontology Language (OWL). These standards provide formal descriptors of objects, object properties and their relationships within a specific knowledge domain. However, converting datasets typically stored in data tables, such as Excel, text or PDF files, into RDF or OWL formats is not trivial for non-specialists, and this creates a barrier to seamless data exchange between researchers, databases and analysis tools. The problem is particularly important in network systems biology, where biochemical interactions between genes and their protein products are abstracted to networks.
To convert biochemical interactions into BioPAX, the leading standard developed by the computational systems biology community, we developed an open-source command line tool that takes as input tabular data describing different types of molecular biochemical interactions. The tool converts such interactions into the BioPAX Level 3 OWL format. We used the tool to convert several existing and new mammalian networks of protein interactions, signalling pathways, and transcriptional regulatory networks into BioPAX. Some of these networks were deposited into Pathway Commons, a repository for consolidating and organizing biochemical networks.
The software tool Sig2BioPAX is a resource that enables experimental and computational systems biologists to contribute their identified networks and pathways of molecular interactions for integration and reuse with the rest of the research community.
BioPAX is a protocol for the specification and representation of cell signaling pathways, gene-regulatory networks, protein-protein interactions and other types of biomolecular interaction data. Several software tools use the BioPAX format for pathway visualization and analysis to generate hypotheses. For example, the popular tool Cytoscape allows customizable visualization and easy navigation of different types of networks. Cytoscape plugins, including the popular BiNGO as well as BiNoM and cPath, further extend Cytoscape's capabilities for pathway analysis, data visualization, and data integration. BiNGO statistically analyzes a set of genes and their corresponding Gene Ontology functional annotations to determine which functional categories are overrepresented in that gene set, and uses Cytoscape's visualization capabilities to display the results. BiNoM performs structural analysis of networks, identifying strongly connected components, paths and cycles. cPath is an interaction database that can be included in Cytoscape as a plugin. The cPath database is a central repository for pathway and interaction datasets from multiple sources including MINT, IntAct, Reactome, and BioGRID. The plugin retrieves data from the central cPath database via an XML Web Services API and uses the Cytoscape visualization engine to display biochemical networks. Interaction data stored in cPath are in BioPAX format.
BioPAX is one of several specification protocols that have been developed to formally characterize biochemical regulatory molecular interactions. Other specifications include the Proteomics Standards Initiative Molecular Interactions format (PSI-MI) and the Systems Biology Markup Language (SBML). Tools exist for converting some of these data formats into BioPAX; the previously mentioned Cytoscape plugin BiNoM, for example, converts between BioPAX, SBML, and CellDesigner formats. However, most biochemical interaction data are not stored in one of these formats, but rather in flat files, Excel spreadsheets, network diagrams, or tables in PDF format. While commercial products are available to help researchers transform their flat text files into a general OWL format, to date there are no tools available to transform flat files into BioPAX format. Such a tool would be useful because many pathway databases and networks need to be converted for data sharing and reuse. Additionally, biologists who identify new interactions or describe new pathways in publications, but lack the technical expertise to convert their interaction data into BioPAX format, will be able to do so with the tool.
A description of the reaction types processed by Sig2BioPAX
- Molecules bind to form a complex
- Kinase catalyzes addition of phosphate group to target molecule
- Catalyst initiates removal of phosphate group from target molecule
- Guanine Nucleotide Exchange: GDP removed from complex and replaced with GTP
- GTPase activating protein: GTP bound to compound becomes GDP
- Ubiquitin molecule is added to target compound
- Ubiquitin molecule is removed from target compound
- Small ubiquitin-related modifier is attached to target molecule
- Cleavage with Phospholipase C: PLC cleaves PIP2 into IP3 and DAG
- Cleavage on Cysteine: Deactivating cleavage on a cysteine residue of target protein
- Deactivating cleavage of target molecule
- Cleavage of pro-protein into active form
- Otherwise unspecified reaction between two proteins
- Protein activates or inhibits transcription of gene products
- Transcription Factor Promoter Binding: Transcription factor and protein bind to create complex
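As a rough illustration of how a reaction-type table like the one above could drive conversion, the following Java sketch maps a few reaction categories to BioPAX Level 3 class names. The enum, its constant names, and the particular pairing of category to class are assumptions made for illustration and are not taken from the Sig2BioPAX source; only the class names themselves (ComplexAssembly, BiochemicalReaction, TemplateReactionRegulation, MolecularInteraction) come from the BioPAX Level 3 specification.

```java
import java.util.Locale;

/**
 * Hypothetical reaction categories paired with BioPAX Level 3 class names.
 * The pairing is an illustrative assumption, not the tool's actual mapping.
 */
enum ReactionCategory {
    BINDING("ComplexAssembly"),
    PHOSPHORYLATION("BiochemicalReaction"),
    DEPHOSPHORYLATION("BiochemicalReaction"),
    TRANSCRIPTION_REGULATION("TemplateReactionRegulation"),
    UNSPECIFIED("MolecularInteraction");

    private final String bioPaxClass;

    ReactionCategory(String bioPaxClass) {
        this.bioPaxClass = bioPaxClass;
    }

    /** Name of the BioPAX Level 3 class assumed for this category. */
    String bioPaxClass() {
        return bioPaxClass;
    }

    /** Resolves a free-text label, falling back to UNSPECIFIED when unknown. */
    static ReactionCategory fromLabel(String label) {
        String key = label.trim().toUpperCase(Locale.ROOT).replace(' ', '_');
        try {
            return valueOf(key);
        } catch (IllegalArgumentException e) {
            return UNSPECIFIED;
        }
    }
}
```

Falling back to a generic category mirrors the "otherwise unspecified reaction" row in the table, so that an unrecognized label does not abort the conversion of an entire file.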
A description of an input line using the default input template, sig, and the meaning of the individual elements on the input line.
sig template: SN SH SM ST SL TN TH TM TT TL E TI ID
- SN: Source compound name
- SH: Source Swiss-Prot human accession number
- SM: Source Swiss-Prot mouse accession number
- ST: Source type of compound
- SL: Source cellular location
- TN: Target compound name
- TH: Target Swiss-Prot human accession number
- TM: Target Swiss-Prot mouse accession number
- TT: Target type of compound
- TL: Target cellular location
- E: Effect of source on target compound (activating, inhibiting, or neutral)
- TI: Type of interaction
- ID: PubMed ID of reference article
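To make the thirteen-column sig template concrete, the following Java sketch splits one input line into fields addressed by the template codes. It is a minimal illustration and not the Sig2BioPAX implementation: the class name SigRecord, the whitespace delimiter, and the rule that no field contains embedded spaces are all assumptions, since the text above does not specify the delimiter.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical holder for one line in the default sig template. */
final class SigRecord {
    /** Field codes in the order given by the sig template. */
    static final List<String> FIELDS = Arrays.asList(
            "SN", "SH", "SM", "ST", "SL",
            "TN", "TH", "TM", "TT", "TL",
            "E", "TI", "ID");

    private final String[] values;

    private SigRecord(String[] values) {
        this.values = values;
    }

    /** Assumes whitespace-separated fields with no embedded spaces. */
    static SigRecord parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length != FIELDS.size()) {
            throw new IllegalArgumentException(
                    "Expected " + FIELDS.size() + " fields but found " + tokens.length);
        }
        return new SigRecord(tokens);
    }

    /** Returns the value stored under a template code such as "SN" or "TI". */
    String get(String code) {
        int index = FIELDS.indexOf(code);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown field code: " + code);
        }
        return values[index];
    }
}
```

With placeholder values (not real accession numbers or PubMed IDs), `SigRecord.parse("SRC_A H0001 M0001 Kinase Cytosol TGT_B H0002 M0002 TF Nucleus + Phosphorylation 00000000").get("TI")` returns `Phosphorylation`. Rejecting lines with the wrong number of fields up front keeps malformed rows from being silently shifted into the wrong columns.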
We used Sig2BioPAX to convert several network datasets into the BioPAX format, and some of these networks are available on Pathway Commons, an international collaborative database of biomolecular pathways. The converted datasets are original networks that we extracted from the literature for several projects: the presynaptome, representing protein-protein interactions present in presynaptic nerve terminals of mammalian neurons; the neuronal signalome, representing cell signaling interactions extracted from neuroscience literature describing combined cell signaling pathways in mammalian neurons; the adhesome, a network of interactions in focal adhesions; a kinase-substrate network we constructed for the program KEA (kinase enrichment analysis); and the network behind ChEA (ChIP-seq/ChIP enrichment analysis).
As the volume and complexity of genome-wide molecular data increase, methods for sharing and exchanging data need to be further developed. Effective standard representation of data enables seamless data exchange across platforms, tools and databases. However, converting existing and new data into such exchange formats is not trivial. The Sig2BioPAX tool enables researchers to convert their flat file interaction data into the computable BioPAX format so that their data can be reused and interpreted by other researchers worldwide.
Availability and Requirements
Project name: Sig2BioPAX
Project home page: http://code.google.com/p/sig2biopax/
Operating system(s): Platform independent
Programming language: Java
Other requirements: Jena 2.6.2 or higher
License: GNU GPL
Any restrictions to use by non-academics: none
This research was supported by NIH Grants P50GM071558-01A27398, R01DK088541, R01GM054508, R01DA15446, and KL2RR029885-0109. We would like to thank Emek Demir from MSKCC for assistance with the BioPAX specification.
- Demir E, Cary MP, Paley S, Fukuda K, Lemer C, Vastrik I, Wu G, D'Eustachio P, Schaefer C, Luciano J, et al.: The BioPAX community standard for pathway data sharing. Nat Biotechnol 2010, 28(9):935–942.
- Cline MS, Smoot M, Cerami E, Kuchinsky A, Landys N, Workman C, Christmas R, Avila-Campilo I, Creech M, Gross B, et al.: Integration of biological networks and gene expression data using Cytoscape. Nat Protoc 2007, 2(10):2366–2382.
- Maere S, Heymans K, Kuiper M: BiNGO: a Cytoscape plugin to assess overrepresentation of Gene Ontology categories in Biological Networks. Bioinformatics 2005, 21(16):3448–3449.
- Zinovyev A, Viara E, Calzone L, Barillot E: BiNoM: a Cytoscape plugin for manipulating and analyzing biological networks. Bioinformatics 2008, 24(6):876–877.
- Cerami E, Bader G, Gross B, Sander C: cPath: open source software for collecting, storing, and querying biological pathways. BMC Bioinformatics 2006, 7(1):497.
- Zanzoni A, Montecchi-Palazzi L, Quondam M, Ausiello G, Helmer-Citterich M, Cesareni G: MINT: a Molecular INTeraction database. FEBS Letters 2002, 513(1):135–140.
- Aranda B, Achuthan P, Alam-Faruque Y, Armean I, Bridge A, Derow C, Feuermann M, Ghanbarian AT, Kerrien S, Khadake J, et al.: The IntAct molecular interaction database in 2010. Nucleic Acids Research 2010, 38(suppl 1):D525–D531.
- Vastrik I, D'Eustachio P, Schmidt E, Joshi-Tope G, Gopinath G, Croft D, de Bono B, Gillespie M, Jassal B, Lewis S, et al.: Reactome: a knowledge base of biologic pathways and processes. Genome Biology 2007, 8(3):R39.
- Stark C, Breitkreutz BJ, Chatr-Aryamontri A, Boucher L, Oughtred R, Livstone MS, Nixon J, Van Auken K, Wang X, Shi X, et al.: The BioGRID Interaction Database: 2011 update. Nucleic Acids Res 2010.
- Hermjakob H, Montecchi-Palazzi L, Bader G, Wojcik J, Salwinski L, Ceol A, Moore S, Orchard S, Sarkans U, von Mering C, et al.: The HUPO PSI's molecular interaction format--a community standard for the representation of protein interaction data. Nat Biotechnol 2004, 22(2):177–183.
- Hucka M, Finney A, Sauro HM, Bolouri H, Doyle JC, Kitano H, Arkin AP, Bornstein BJ, Bray D, Cornish-Bowden A, et al.: The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics 2003, 19(4):524–531.
- Jena – A Semantic Web Framework for Java [http://jena.sourceforge.net/]
- Lachmann A, Xu H, Krishnan J, Berger SI, Mazloom AR, Ma'ayan A: ChEA: transcription factor regulation inferred from integrating genome-wide ChIP-X experiments. Bioinformatics 2010, 26(19):2438–2444.
- Cerami EG, Gross BE, Demir E, Rodchenkov I, Babur O, Anwar N, Schultz N, Bader GD, Sander C: Pathway Commons, a web resource for biological pathway data. Nucleic Acids Res 2010.
- Abul-Husn NS, Bushlin I, Morón JA, Jenkins SL, Dolios G, Wang R, Iyengar R, Ma'ayan A, Devi LA: Systems approach to explore components and interactions in the presynapse. PROTEOMICS 2009, 9(12):3303–3315.
- Ma'ayan A, Jenkins SL, Neves S, Hasseldine A, Grace E, Dubin-Thaler B, Eungdamrong NJ, Weng G, Ram PT, Rice JJ, et al.: Formation of regulatory patterns during signal propagation in a Mammalian cellular network. Science 2005, 309(5737):1078–1083.
- Zaidel-Bar R, Itzkovitz S, Ma'ayan A, Iyengar R, Geiger B: Functional atlas of the integrin adhesome. Nat Cell Biol 2007, 9(8):858–867.
- Lachmann A, Ma'ayan A: KEA: kinase enrichment analysis. Bioinformatics 2009, 25(5):684–686.
- Lachmann A, Xu H, Krishnan J, Berger SI, Mazloom AR, Ma'ayan A: ChEA: transcription factor regulation inferred from integrating genome-wide ChIP-X experiments. Bioinformatics 2010, 26(19):2438–2444.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.