 Software review
A proof of the DBRF-MEGN method, an algorithm for deducing minimum equivalent gene networks
Source Code for Biology and Medicine volume 6, Article number: 12 (2011)
Abstract
Background
We previously developed the DBRF-MEGN (difference-based regulation finding-minimum equivalent gene network) method, which deduces the most parsimonious signed directed graphs (SDGs) consistent with expression profiles of single-gene deletion mutants. However, until the present study, we have not presented the details of the method's algorithm or a proof of the algorithm.
Results
We describe in detail the algorithm of the DBRF-MEGN method and prove that the algorithm deduces all of the exact solutions of the most parsimonious SDGs consistent with expression profiles of gene deletion mutants.
Conclusions
The DBRF-MEGN method provides all of the exact solutions of the most parsimonious SDGs consistent with expression profiles of gene deletion mutants.
Background
Identification of gene regulatory networks (hereafter called gene networks) is essential for understanding cellular functions. Large-scale gene deletion projects [1–4] and DNA microarrays [5, 6] have enabled the creation of large-scale gene expression profiles of gene deletion mutants [7, 8]; these large-scale profiles comprise the expression levels of thousands of genes measured in deletion mutants of those genes. Such profiles are invaluable sources for identifying gene networks. Many procedures have been developed for inferring gene networks from such profiles [9–18].
Kyoda et al. developed the DBRF-MEGN (difference-based regulation finding-minimum equivalent gene network) method, an algorithm for inferring gene networks from large-scale gene expression profiles of gene deletion mutants [14]. In this algorithm, gene networks are modeled as signed directed graphs (SDGs) in which a regulation between two genes is represented as a signed directed edge: its sign (positive or negative) represents whether the effect of the regulation is activation or inhibition, and its direction represents which gene regulates which other gene. The method deduces the most parsimonious SDGs consistent with the expression profiles. Kyoda et al. showed that the method is applicable to large-scale gene expression profiles of gene deletion mutants and that networks deduced by the method are valid and useful for predicting functions of genes [14]. However, details of the method's algorithm and a proof of the algorithm have not previously been published.
Here we describe in detail the algorithm of the DBRF-MEGN method and prove that the algorithm provides all of the exact solutions of the most parsimonious gene networks consistent with expression profiles of gene deletion mutants.
Implementation
The software of the DBRF-MEGN method was written in C++ under Linux. The complete source code files, a binary Linux executable file, and the software manual are available [see Additional File 1].
Results
Difference-based deduction of initially deduced edges and the minimum equivalent gene networks
The DBRF-MEGN method consists of five processes, namely (1) difference-based deduction of initially deduced edges, (2) removal of nonessential edges from the initially deduced edges, (3) selection of the uncovered edges in main components from the nonessential edges, (4) separation of the uncovered edges in main components into independent groups, and (5) restoration of the minimum number of edges from each independent group [14]. First, we define a gene network modeled as an SDG:
Definition 1: A signed directed graph (SDG) is given by a tuple $G = (V, E, f)$ with a set $V$ of nodes (genes), a set $E \subseteq V \times V$ of directed edges, and an edge sign function $f: E \to \{\pm 1\}$, which is an integral part of an SDG.
The first process of the DBRF-MEGN method is "difference-based deduction of initially deduced edges" (Figure 1b), which uses an assumption that is commonly made in genetics and cell biology [14], i.e., there exists a positive (negative) regulation from gene A to gene B when the expression level of gene B in the deletion mutant of gene A is significantly lower (higher) than in the wild type (Figure 1a). For each possible pair of genes in the profiles, the process determines whether a positive (negative) regulation between those genes exists and deduces all edges consistent with both the assumption and the profiles by detecting the difference in expression levels between the wild type and the deletion mutants; we call these edges initially deduced edges.
Definition 2: Let us assume that intervention experiments have been performed for the gene set $J \subseteq V$. Let $D = (d_{jk}) \in \mathbb{R}^{J \times V}$ be a matrix such that $d_{jk}$ represents the expression of gene $k$ after an intervention in gene $j$ (relative to wild-type expression). From this, we deduce the graph of initially deduced edges, $G_{ide} = (V, E_{ide}, f)$. We assume a negative regulation of $k$ by $j$ if $d_{jk} > \alpha$ for some suitably chosen constant $\alpha$. Analogously, a positive regulation of $k$ by $j$ is postulated whenever $d_{jk} < \beta$ for some $\beta$ (sensibly, we require $\beta < 0 < \alpha$). Formally,
$$E_{ide} = \{(j, k) \in J \times V \mid j \neq k,\ d_{jk} > \alpha \ \text{or}\ d_{jk} < \beta\},$$
and $f: E_{ide} \to \{\pm 1\}$ is given by $f((j,k)) = 1$ if there is a positive regulation of $k$ by $j$ (i.e., $d_{jk} < \beta$), and $f((j,k)) = -1$ otherwise.
The thresholds α and β determine the significance of the difference in expression levels between the wild type and deletion mutants. These thresholds can be specified by various procedures, such as by using fold-change or the statistical significance of the expression level [7, 8, 14, 19, 20].
The DBRF-MEGN method deduces the most parsimonious SDGs consistent with the SDG that consists of the initially deduced edges. Before defining the most parsimonious SDGs, we need to introduce the function exp and the concept of cover (Figure 2).
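Definition 3: For genes $i, j, k \in V$, $\mathrm{exp}(i, j, k) = 1$ if, and only if, $(i, j), (j, k), (i, k) \in E_{ide}$ and $f(i, j) \times f(j, k) = f(i, k)$; otherwise, $\mathrm{exp}(i, j, k) = 0$. That is, $\mathrm{exp}(i, j, k) = 1$ when the edge $(i, k)$ is explained by the two edges $(i, j)$ and $(j, k)$.
Definition 4: Let $E_p \subseteq E_{ide}$ be a set of edges. Define $E_p^0 = E_p$ and, by induction,
$$E_p^{r+1} = E_p^r \cup \{(j, k) \in E_{ide} \mid \exists\, i \in V:\ (j, i) \in E_p^r,\ (i, k) \in E_p^r,\ \mathrm{exp}(j, i, k) = 1\}.$$
Moreover, let $E_p^{cov} = \bigcup_{r \in \mathbb{N}} E_p^r$. We say that $E_p$ covers an edge $e$ if $e \in E_p^{cov}$.
Remark: The family of edge sets on $V$ is partially ordered by set inclusion. If $E_1 \subseteq E_2$, note that by a trivial induction on $r$, $E_1^r \subseteq E_2^r$, and hence $E_1^{cov} \subseteq E_2^{cov}$. This means that the mapping $E \mapsto E^{cov}$ is monotonic. Let $E \subseteq E_{ide}$. By construction, an edge $(j, k)$ from $(E^{cov})^{cov}$ is an element of $(E^r)^s$ for suitable $r, s \in \mathbb{N}$. This implies $(j, k) \in E^{r+s} \subseteq E^{cov}$. Because, in addition, every set is contained in its own cover, $(E^{cov})^{cov} = E^{cov}$, and the mapping $E \mapsto E^{cov}$ is a so-called closure operation.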
Lemma 1: If $E_1 \subseteq E_2$, then $E_1^{cov} \subseteq E_2^{cov}$.
Proof: The remark proves lemma 1.
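Lemma 2: If $E_1^{cov} \supseteq E_2$, then $E_1^{cov} \supseteq E_2^{cov}$.
Proof: $E_2^{cov} \subseteq (E_1^{cov})^{cov} = E_1^{cov}$ by monotonicity and closure of the mapping $\cdot^{cov}$.
Lemma 3: If $E_1^{cov} \supseteq E_2$ and $E_2^{cov} \supseteq E_3$, then $E_1^{cov} \supseteq E_3$.
Proof: By monotonicity and closure of the mapping $\cdot^{cov}$ (lemma 2), $E_1^{cov} \supseteq E_2^{cov} \supseteq E_3$.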
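To make definitions 3 and 4 concrete, the following C++ sketch computes $E_p^{cov}$ by repeating the inductive step until no edge is added; because the rule is monotonic, this fixed point is the same set as the union over all $r$. For illustration only, it assumes that $E_{ide}$ is stored as a sign matrix with entries $+1$, $-1$ and $0$, that $E_p$ is a Boolean membership matrix, and that genes are indexed from 0; the names `explains` and `coverClosure` are hypothetical and not taken from the DBRF-MEGN source code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using SignMatrix = std::vector<std::vector<int>>;   // entries -1, 0 or +1
using EdgeSet = std::vector<std::vector<bool>>;     // membership matrix of E_p

// exp(i, j, k) of definition 3: (i,j), (j,k), (i,k) are all in E_ide and the
// sign of the path i -> j -> k equals the sign of the edge i -> k.
bool explains(const SignMatrix& ide, std::size_t i, std::size_t j, std::size_t k) {
    return ide[i][j] != 0 && ide[j][k] != 0 && ide[i][k] != 0 &&
           ide[i][j] * ide[j][k] == ide[i][k];
}

// E_p^cov of definition 4 as a fixed point: keep adding edges (j,k) of E_ide
// that are explained by two edges already in the set until nothing changes.
EdgeSet coverClosure(const SignMatrix& ide, EdgeSet ep) {
    const std::size_t n = ide.size();
    assert(ep.size() == n);
    bool changed = true;
    while (changed) {
        changed = false;
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t k = 0; k < n; ++k) {
                if (ep[j][k] || ide[j][k] == 0) continue;
                for (std::size_t i = 0; i < n; ++i)
                    if (ep[j][i] && ep[i][k] && explains(ide, j, i, k)) {
                        ep[j][k] = true;    // (j,k) is now covered
                        changed = true;
                        break;
                    }
            }
    }
    return ep;
}
```

For instance, with positive edges A→B, B→C and A→C, the closure of {A→B, B→C} contains A→C; if A→C were negative, it would not.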
Now, we define the most parsimonious SDGs consistent with the expression profiles of gene deletion mutants. A most parsimonious SDG consists of the minimum number of edges that "cover" all initially deduced edges. By this definition, an edge can be redundant only when it is "explained" by two other initially deduced edges. Importantly, an edge is not redundant when it is "explained" by only three or more initially deduced edges (Figure 3a). We call the most parsimonious SDGs minimum equivalent gene networks (MEGNs).
Definition 5: $G_0 = (V, E_0, f|_{E_0})$ (where $f|_{E_0}$ is the restriction of $f$ to $E_0$) is a most parsimonious SDG, named a MEGN, of $G = (V, E_{ide}, f)$ if and only if it satisfies the following conditions: (1) $E_0 \subseteq E_{ide}$, (2) $E_0^{cov} \supseteq E_{ide}$, and (3) for all $E_p \subseteq E_{ide}$ such that $E_p^{cov} \supseteq E_{ide}$, $|E_0| \leq |E_p|$. Since we keep $G = (V, E_{ide}, f)$ fixed for the rest of the paper, we often call $G_0$ simply a MEGN, without explicit reference to $G$.
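Definition 5 can be checked directly on very small graphs by enumerating subsets of $E_{ide}$ in order of increasing size. The C++ sketch below does exactly that; it is exponential in $|E_{ide}|$ and is meant only as a reference against which the five processes can be compared, not as a substitute for them. The representation (a 0-based sign matrix) and the names `coversAll` and `bruteForceMegns` are assumptions made for this illustration.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using SignMatrix = std::vector<std::vector<int>>;   // entries -1, 0 or +1
using Edge = std::pair<std::size_t, std::size_t>;

// Condition (2): do the edges of E_ide selected by `mask` cover all of E_ide?
bool coversAll(const SignMatrix& ide, const std::vector<Edge>& edges, unsigned mask) {
    const std::size_t n = ide.size();
    std::vector<std::vector<bool>> in(n, std::vector<bool>(n, false));
    for (std::size_t b = 0; b < edges.size(); ++b)
        if (mask & (1u << b)) in[edges[b].first][edges[b].second] = true;
    bool changed = true;
    while (changed) {                        // fixed point of definition 4
        changed = false;
        for (const Edge& e : edges) {
            const std::size_t j = e.first, k = e.second;
            if (in[j][k]) continue;
            for (std::size_t i = 0; i < n; ++i)
                if (in[j][i] && in[i][k] && ide[j][i] * ide[i][k] == ide[j][k]) {
                    in[j][k] = true;
                    changed = true;
                    break;
                }
        }
    }
    for (const Edge& e : edges)
        if (!in[e.first][e.second]) return false;
    return true;
}

// Condition (3): return every covering subset of the smallest size that works.
std::vector<std::vector<Edge>> bruteForceMegns(const SignMatrix& ide) {
    std::vector<Edge> edges;
    for (std::size_t i = 0; i < ide.size(); ++i)
        for (std::size_t j = 0; j < ide.size(); ++j)
            if (ide[i][j] != 0) edges.push_back({i, j});
    assert(edges.size() < 20);               // 2^|E_ide| subsets: tiny inputs only
    std::vector<std::vector<Edge>> megns;
    for (std::size_t size = 0; size <= edges.size() && megns.empty(); ++size)
        for (unsigned mask = 0; mask < (1u << edges.size()); ++mask) {
            if (std::bitset<32>(mask).count() != size || !coversAll(ide, edges, mask))
                continue;
            std::vector<Edge> chosen;
            for (std::size_t b = 0; b < edges.size(); ++b)
                if (mask & (1u << b)) chosen.push_back(edges[b]);
            megns.push_back(chosen);
        }
    return megns;
}
```

With positive edges A→B, A→C, B→C and C→B, the sketch returns two MEGNs of three edges each, {A→B, B→C, C→B} and {A→C, B→C, C→B}, which illustrates that a MEGN need not be unique.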
Removal of nonessential edges from the initially deduced edges
The second process of the DBRF-MEGN method removes all nonessential edges from the initially deduced edges. The process removes all edges that are explained by two other initially deduced edges (Figure 1c). The resulting edges are called essential edges and the removed edges are called nonessential edges.
Definition 6: If there exist $(i, j), (j, k), (i, k) \in E_{ide}$ such that $\mathrm{exp}(i, j, k) = 1$, then $(i, k)$ is called a nonessential edge. Let $E_{nes}$ be the set of nonessential edges. The set $E_{es}$ of essential edges is the complement of $E_{nes}$ in $E_{ide}$, $E_{es} = E_{ide} \setminus E_{nes}$.
Essential edges and nonessential edges have the following properties.
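Lemma 4: If $E_p \subseteq E_{ide}$ and $E_p^{cov} \supseteq E_{ide}$, then $E_p \supseteq E_{es}$.
Proof: Assume that there exists $(i, j) \in E_{es}$ such that $(i, j) \notin E_p$. Because $(i, j) \in E_p^{cov}$ and $(i, j) \notin E_p$, the edge $(i, j)$ is added at some inductive step of definition 4, so there exist $(i, k), (k, j) \in E_{ide}$ such that $\mathrm{exp}(i, k, j) = 1$. Then $(i, j)$ is nonessential by definition 6, which contradicts our assumption $(i, j) \in E_{es}$.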
Lemma 5: If $G_0 = (V, E_0, f|_{E_0})$ is a MEGN, then $E_{es} \subseteq E_0$.
Proof: $E_0^{cov} \supseteq E_{ide}$ by definition 5, hence $E_0 \supseteq E_{es}$ by lemma 4.
When the essential edges cover all initially deduced edges, the SDG consisting of the essential edges is the only MEGN consistent with the profiles.
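Theorem 1: If $E_{es}^{cov} \supseteq E_{ide}$, then $G_{es} = (V, E_{es}, f|_{E_{es}})$ is the unique MEGN of $G = (V, E_{ide}, f)$.
Proof: By hypothesis, conditions (1) $E_{es} \subseteq E_{ide}$ and (2) $E_{es}^{cov} \supseteq E_{ide}$ of a MEGN are met. It remains to show the minimality and uniqueness of $E_{es}$. (3) For every $E_p \subseteq E_{ide}$ such that $E_p^{cov} \supseteq E_{ide}$, lemma 4 gives $E_{es} \subseteq E_p$ and hence $|E_{es}| \leq |E_p|$, so $G_{es}$ is a MEGN. Now let $G_0 = (V, E_0, f|_{E_0})$ be an arbitrary MEGN. Then by lemma 5, $E_{es} \subseteq E_0$, and by minimality of $E_0$, $|E_0| \leq |E_{es}|$; it follows that $E_{es} = E_0$. The theorem is proved.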
Selection of the uncovered edges in main components from the nonessential edges
The essential edges sometimes fail to cover all initially deduced edges because some initially deduced edges represent direct gene regulations even when they are explained by two other edges (Figure 1d). In this case, the method restores the minimum number of nonessential edges so that the resulting edges (essential edges and the restored nonessential edges) cover all initially deduced edges. The SDG consisting of the essential edges and the restored nonessential edges is a MEGN. Before selecting the sets of nonessential edges to be restored, the method distinguishes nonessential edges that may be included in the MEGNs from those that cannot; this reduces the number of nonessential edges to be considered for restoration and thus the computational cost of finding the nonessential edges to be restored. This third process of the DBRF-MEGN method consists of two subprocesses, namely (a) selection of uncovered edges and (b) selection of uncovered edges in main components. The resulting nonessential edges are called uncovered edges in main components, and from these edges the later processes of the DBRF-MEGN method select the edges that are included in the MEGNs.
a) Selection of uncovered edges
The first subprocess distinguishes the nonessential edges that are covered by the essential edges from those that are not (Figure 1d). Those edges are called covered edges and uncovered edges, respectively.
Definition 7: Let $E_{cv} = E_{es}^{cov} \setminus E_{es}$ be the set of covered edges. Let $E_{ucv} = E_{ide} \setminus (E_{es} \cup E_{cv})$ be the set of uncovered edges. The set of initially deduced edges is thereby partitioned into three disjoint edge sets: $E_{ide} = E_{es} \cup E_{cv} \cup E_{ucv}$.
Here, we prove that the MEGNs do not include covered edges.
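Lemma 6: If $G_0 = (V, E_0, f|_{E_0})$ is a MEGN, then $E_{es} \subseteq E_0 \subseteq E_{es} \cup E_{ucv}$.
Proof: First, $E_{es} \subseteq E_0$ by lemma 5. By definition 7, $E_{es} \cap E_{cv} = \emptyset$, so $E_{es} \subseteq E_0 \setminus E_{cv}$; hence $E_{cv} \subseteq E_{es}^{cov} \subseteq (E_0 \setminus E_{cv})^{cov}$ by monotonicity of $\cdot^{cov}$. Thus $(E_0 \setminus E_{cv})^{cov} \supseteq (E_0 \setminus E_{cv}) \cup E_{cv} \supseteq E_0$, and it follows that $(E_0 \setminus E_{cv})^{cov} \supseteq E_0^{cov} \supseteq E_{ide}$ by lemma 2. By minimality of $E_0$, $|E_0| \leq |E_0 \setminus E_{cv}|$, so $E_0 = E_0 \setminus E_{cv}$, which is equivalent to $E_0 \cap E_{cv} = \emptyset$. By definition 7, this implies $E_0 \subseteq E_{es} \cup E_{ucv}$, completing the proof.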
b) Selection of uncovered edges in main components
The second subprocess distinguishes uncovered edges that may be included in the MEGNs from those that cannot (Figure 1e; Figure 4). These edges are called uncovered edges in main components and uncovered edges in peripheral components, respectively. The uncovered edges in peripheral components are defined as follows:
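Definition 8: Let $P^0$ be the set of uncovered edges $(i, j) \in E_{ucv}$ which cannot be used to directly explain another uncovered edge in $E_{ucv}$ together with another edge $(j, k) \in E_{ide}$ or $(k, i) \in E_{ide}$:
$$P^0 = \{(i, j) \in E_{ucv} \mid \text{there is no } k \text{ with } (i, k) \in E_{ucv},\ \mathrm{exp}(i, j, k) = 1, \text{ and no } k \text{ with } (k, j) \in E_{ucv},\ \mathrm{exp}(k, i, j) = 1\}.$$
Lemma 7: $(E_{es} \cup (E_{ucv} \setminus P^0))^{cov} \supseteq E_{ide}$.
Proof: Let $(i, j) \in P^0$. Because $(i, j)$ is nonessential, there exist $(i, k), (k, j) \in E_{ide}$ such that $\mathrm{exp}(i, k, j) = 1$. Each of $(i, k)$ and $(k, j)$, together with the other, explains the uncovered edge $(i, j)$, so by definition 8 neither of them belongs to $P^0$. Therefore, the edges in $P^0$ can be explained by edges in $E_{es} \cup E_{cv} \cup (E_{ucv} \setminus P^0)$, which is contained in $(E_{es} \cup (E_{ucv} \setminus P^0))^{cov}$ because $E_{cv} \subseteq E_{es}^{cov}$. Hence $(E_{es} \cup (E_{ucv} \setminus P^0))^{cov}$ contains $E_{es}$, $E_{cv}$, $E_{ucv} \setminus P^0$ and $P^0$, that is, all of $E_{ide}$. The lemma is proved.
Definition 9: Following definition 8, define by induction
$$P^{r+1} = P^r \cup \{(i, j) \in E_{ucv} \setminus P^r \mid \text{there is no } k \text{ with } (i, k) \in E_{ucv} \setminus P^r,\ \mathrm{exp}(i, j, k) = 1, \text{ and no } k \text{ with } (k, j) \in E_{ucv} \setminus P^r,\ \mathrm{exp}(k, i, j) = 1\},$$
i.e., the uncovered edges which cannot be used to directly explain another uncovered edge in $E_{ucv} \setminus P^r$ with the other edges $(k, i) \in E_{ide}$ or $(j, k) \in E_{ide}$. Let $E_{ucv}^{per} = \bigcup_{r \in \mathbb{N}} P^r$ be the set of uncovered edges in peripheral components, and let $E_{ucv}^{main} = E_{ucv} \setminus E_{ucv}^{per}$ be the set of uncovered edges in main components. The set of initially deduced edges is thereby partitioned into four disjoint edge sets: $E_{ide} = E_{es} \cup E_{cv} \cup E_{ucv}^{per} \cup E_{ucv}^{main}$.
In the following, we prove that the MEGNs do not include uncovered edges in peripheral components. First, we prove that uncovered edges in peripheral components have the following properties.
Lemma 8: For every $r \in \mathbb{N}$, $(E_{es} \cup (E_{ucv} \setminus P^r))^{cov} \supseteq E_{ide}$.
Proof: We prove lemma 8 by mathematical induction on $r$; write $Q^r = E_{es} \cup (E_{ucv} \setminus P^r)$. (1) By lemma 7, lemma 8 is true when $r = 0$. (2) Assume that lemma 8 is true when $r = m$, that is, $(Q^m)^{cov} \supseteq E_{ide}$ (2a). Let $(i, j) \in P^{m+1} \setminus P^m$. As in the proof of lemma 7, there exist $(i, k), (k, j) \in E_{ide}$ with $\mathrm{exp}(i, k, j) = 1$, and each of them, together with the other, explains $(i, j) \in E_{ucv} \setminus P^m$. An edge enters $P^s$ ($s \leq m + 1$) only if it explains no edge of $E_{ucv} \setminus P^{s-1} \supseteq E_{ucv} \setminus P^m$ (with $P^{-1} = \emptyset$), so by definitions 8 and 9 neither edge belongs to $P^{m+1}$, and both lie in $E_{es} \cup E_{cv} \cup (E_{ucv} \setminus P^{m+1})$ (2b). Because $E_{cv} \subseteq E_{es}^{cov} \subseteq (Q^{m+1})^{cov}$ and (2b), $(i, j) \in (Q^{m+1})^{cov}$; hence $(Q^{m+1})^{cov} \supseteq Q^{m+1} \cup (P^{m+1} \setminus P^m) = Q^m$ (2c). Because of (2a), (2c) and lemma 3, $(Q^{m+1})^{cov} \supseteq E_{ide}$ (2d). Thus, lemma 8 is true when $r = m + 1$ if it is true when $r = m$. By (1) and (2), lemma 8 is true.
Lemma 9: $(E_{es} \cup E_{ucv}^{main})^{cov} \supseteq E_{ide}$.
Proof: By lemma 8, $(E_{es} \cup (E_{ucv} \setminus P^r))^{cov} \supseteq E_{ide}$ for every $r$. Because $P^0 \subseteq P^1 \subseteq \cdots \subseteq E_{ucv}$ and $E_{ucv}$ is finite, $P^R = E_{ucv}^{per}$ for some $R$, so $E_{es} \cup (E_{ucv} \setminus P^R) = E_{es} \cup E_{ucv}^{main}$ and lemma 9 is true.
Now we prove that the MEGNs do not include uncovered edges in peripheral components.
Lemma 10: If $G_0 = (V, E_0, f|_{E_0})$ is a MEGN, then $E_{es} \subseteq E_0 \subseteq E_{es} \cup E_{ucv}^{main}$.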
Proof: Assume that there exists . Because of lemma 5 and definition 7, , hence by lemma 6. By the assumption and definition 8, , hence . By lemmas 2 and 9, . This contradicts our assumption that is a MEGN. Therefore, . By definition 9 and lemma 6, this implies , completing the proof.
Separation of the uncovered edges in main components into independent groups and restoration of the minimum number of edges from each independent group
The fourth process of the DBRF-MEGN method separates uncovered edges in main components into "independent groups" so that edges to be restored can be deduced independently for each group (Figure 1f; Figure 5). For each group, the fifth process of the DBRF-MEGN method deduces the minimum number of edges with which essential edges can cover all edges in the group. All sets of such edges are deduced for each group. The essential edges and any possible combination of these sets from each group generate a MEGN of the profiles (Figure 1g).
The independent groups are generated so that the edges in one group do not cover those in other groups.
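Definition 10: Let $E_{ucv}^{main}$ denote the set of uncovered edges in main components. For an edge $e \in E_{ucv}^{main}$, define $I_e^0 = \{e\}$ and, by induction,
$$I_e^{r+1} = I_e^r \cup \{(i, k) \in E_{ucv}^{main} \mid \exists (i, j) \in I_e^r: \mathrm{exp}(i, j, k) = 1\} \cup \{(k, j) \in E_{ucv}^{main} \mid \exists (i, j) \in I_e^r: \mathrm{exp}(k, i, j) = 1\} \cup \{(i, k) \in E_{ucv}^{main} \mid \exists (i, j) \in I_e^r: \mathrm{exp}(i, k, j) = 1\} \cup \{(k, j) \in E_{ucv}^{main} \mid \exists (i, j) \in I_e^r: \mathrm{exp}(i, k, j) = 1\}.$$
Let $I_1 = \bigcup_{r \in \mathbb{N}} I_{e_1}^r$ for an edge $e_1 \in E_{ucv}^{main}$ be the set of edges in an independent group. Let $I_{l+1} = \bigcup_{r \in \mathbb{N}} I_{e_{l+1}}^r$, where $e_{l+1}$ is an edge in $E_{ucv}^{main} \setminus (I_1 \cup \cdots \cup I_l)$, until no such edge remains after $t$ groups. Because the four conditions are symmetric under exchanging the roles of the edges involved, the groups are pairwise disjoint. Then, $E_{ucv}^{main} = I_1 \cup \cdots \cup I_t$.
The essential edges and a combination of sets of the minimum number of edges for each independent group generate a MEGN of the profiles.
Definition 11: Let $M_i$ be a set of edges in the $i$th independent group that satisfies (1) $M_i \subseteq I_i$, (2) $(E_{es} \cup M_i)^{cov} \supseteq I_i$, and (3) for all $E_p \subseteq I_i$ such that $(E_{es} \cup E_p)^{cov} \supseteq I_i$, $|M_i| \leq |E_p|$.
We prove that the essential edges and a combination of sets of the minimum number of edges for each independent group generate a MEGN of the profiles as follows:
Lemma 11: If there exist $(i, j) \in I_l$ and $(i, k), (k, j) \in E_{ide}$ such that $\mathrm{exp}(i, k, j) = 1$, then $\{(i, k), (k, j)\} \cap E_{ucv}^{main} \subseteq I_l$.
Proof: By definition 10, lemma 11 is true.
Lemma 12: $(E_{es} \cup M_1 \cup \cdots \cup M_t)^{cov} \supseteq E_{ide}$.
Proof: By definitions 7 and 11 and monotonicity of $\cdot^{cov}$, $(E_{es} \cup M_1 \cup \cdots \cup M_t)^{cov} \supseteq E_{es} \cup I_i$ for every $i$. Because $I_1 \cup \cdots \cup I_t = E_{ucv}^{main}$, $(E_{es} \cup M_1 \cup \cdots \cup M_t)^{cov} \supseteq E_{es} \cup E_{ucv}^{main}$. By lemmas 2 and 9, $(E_{es} \cup M_1 \cup \cdots \cup M_t)^{cov} \supseteq (E_{es} \cup E_{ucv}^{main})^{cov} \supseteq E_{ide}$. Therefore, the lemma holds.
Theorem 2: Let $E_M = E_{es} \cup M_1 \cup \cdots \cup M_t$. Then $G_M = (V, E_M, f|_{E_M})$ is a MEGN.
Proof: (1) $E_M \subseteq E_{ide}$ by definitions 6 and 11. (2) By lemma 12, $E_M^{cov} \supseteq E_{ide}$. (3) By lemmas 4, 11 and definition 11, every $E_p \subseteq E_{ide}$ such that $E_p^{cov} \supseteq E_{ide}$ satisfies $|E_p \cap I_i| \geq |M_i|$ for each $i$. Because $E_p \supseteq E_{es}$ (lemma 4) and the sets $E_{es}, I_1, \ldots, I_t$ are pairwise disjoint, every such $E_p$ satisfies $|E_p| \geq |E_{es}| + |M_1| + \cdots + |M_t| = |E_M|$. The theorem is proved.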
Remark: When there exists more than one solution with the minimum number of edges for an independent group, each SDG that consists of the essential edges and a combination of one solution from each independent group is a MEGN, because each of these SDGs satisfies the conditions in definition 5.
Algorithms of the DBRF-MEGN method
We are concerned with algorithms that are computationally efficient for deducing MEGNs from expression profiles of single-gene deletion mutants. We list these in a form easily translatable into a computer program.
(A1) Algorithm for deducing initially deduced edges
```
double d[n][n]: gene expression profiles
int t[n][n]

void dbrf()
    int i, j;
    for i = 1 to n do
        for j = 1 to n do
            if d[i][j] < β & i ≠ j then
                t[i][j] := +1;
            else if d[i][j] > α & i ≠ j then
                t[i][j] := -1;
            else
                t[i][j] := 0;
```
The matrix d[n][n] represents the gene expression profiles. Each entry d[i][j] represents the log-ratio of the expression of gene j in the gene i deletion mutant to that in the wild type. The nonzero entries of the resulting matrix t[n][n] represent the initially deduced edges. If an entry t[i][j] is +1 or -1, it represents a positive or negative edge from gene i to gene j, respectively. The number of complete iterations is bounded by $n^2$.
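Algorithm A1 translates almost line by line into C++. The sketch below assumes 0-based indices, a square matrix in which every gene has been deleted once, and caller-supplied thresholds; the function name mirrors dbrf() but its signature is hypothetical and not taken from the released software.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of algorithm A1: d[i][j] is the log-ratio of gene j's expression in the
// deletion mutant of gene i to that in the wild type. Returns t with +1
// (positive edge i -> j), -1 (negative edge) or 0 (no initially deduced edge).
std::vector<std::vector<int>> dbrf(const std::vector<std::vector<double>>& d,
                                   double alpha, double beta) {
    assert(beta < 0.0 && 0.0 < alpha);           // definition 2: beta < 0 < alpha
    const std::size_t n = d.size();
    std::vector<std::vector<int>> t(n, std::vector<int>(n, 0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            if (i == j) continue;                 // self-regulation is not deduced
            if (d[i][j] < beta)
                t[i][j] = +1;                     // lower in the mutant: activation
            else if (d[i][j] > alpha)
                t[i][j] = -1;                     // higher in the mutant: inhibition
        }
    return t;
}
```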
(A2) Algorithm for distinguishing the essential edges from the nonessential edges
```
int t[n][n]: initially deduced edges

void ess_noness()
    int i, j, k;
    for j = 1 to n do
        for i = 1 to n do
            if t[i][j] ≠ 0 then
                for k = 1 to n do
                    if t[j][k] ≠ 0 & t[i][k] ≠ 0 & t[i][k] = t[i][j] × t[j][k] then
                        check(t[i][k]);
```
The checked entries of the matrix t[n][n] represent nonessential edges. The unchecked nonzero entries of the resulting matrix t[n][n] represent essential edges. We created this algorithm by modifying Warshall's algorithm [21]. The number of complete iterations is bounded by $n^3$.
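A C++ version of algorithm A2 is sketched below, assuming a 0-based sign matrix and a hypothetical function name. Unlike Warshall's algorithm, it only marks edges and never changes t while scanning, so every test of $\mathrm{exp}(i, j, k)$ refers to $E_{ide}$ itself, as definition 6 requires.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<int>>;   // sign matrix: -1, 0 or +1

// Sketch of algorithm A2: (i,k) is nonessential if some j gives exp(i, j, k) = 1
// (definition 6). Returns a copy of t in which nonessential edges are set to 0,
// so the remaining nonzero entries are the essential edges.
Matrix essentialEdges(const Matrix& t) {
    const std::size_t n = t.size();
    assert(t.empty() || t[0].size() == n);          // square matrix expected
    std::vector<std::vector<bool>> nonessential(n, std::vector<bool>(n, false));
    for (std::size_t j = 0; j < n; ++j)             // j: intermediate gene
        for (std::size_t i = 0; i < n; ++i) {
            if (t[i][j] == 0) continue;
            for (std::size_t k = 0; k < n; ++k)
                if (t[j][k] != 0 && t[i][k] != 0 && t[i][k] == t[i][j] * t[j][k])
                    nonessential[i][k] = true;      // mark only; t is never modified
        }
    Matrix es = t;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k)
            if (nonessential[i][k]) es[i][k] = 0;
    return es;
}
```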
(A3.1) Algorithm for distinguishing uncovered edges from covered edges
```
int t[n][n]: initially deduced edges
int e[n][n]: essential edges

void covered_edge()
    int i, j, k;
    bool finished;
    finished := false;
    while finished = false do
        finished := true;
        for i = 1 to n do
            for j = 1 to n do
                if e[i][j] ≠ 0 then
                    for k = 1 to n do
                        // e[i][k] = 0: add each covered edge only once, so that the loop terminates
                        if e[i][k] = 0 & e[j][k] ≠ 0 & t[i][k] ≠ 0 & t[i][k] = e[i][j] × e[j][k] then
                            e[i][k] := t[i][k];
                            check(e[i][k]);
                            finished := false;
```
The checked entries of the matrix e[n][n] represent covered edges. The nonzero entries of the matrix t[n][n] whose corresponding entries in the resulting matrix e[n][n] are zero represent uncovered edges. This algorithm repeats the while loop as long as it finds edges in $E_{nes}$ that can be covered by the essential edges; every repetition except the last covers at least one new edge, and each repetition performs at most $n^3$ iterations. Thus, the number of iterations is bounded by $(|E_{nes}| + 1)\, n^3$.
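The sketch below renders algorithm A3.1 in C++, assuming 0-based sign matrices and a hypothetical function name. It adds an edge only while its entry in e is still zero, so each covered edge is added once; without such a test, an already-added edge would satisfy the condition again on every pass and the while loop would never stop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<int>>;   // sign matrix: -1, 0 or +1

// Sketch of algorithm A3.1: starting from e = E_es, add every edge (i,k) of E_ide
// explained by two edges already in e. Returns the added edges, i.e. E_cv.
std::vector<std::vector<bool>> coveredEdges(const Matrix& t, Matrix e) {
    const std::size_t n = t.size();
    assert(e.size() == n);
    std::vector<std::vector<bool>> covered(n, std::vector<bool>(n, false));
    bool finished = false;
    while (!finished) {
        finished = true;
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j) {
                if (e[i][j] == 0) continue;
                for (std::size_t k = 0; k < n; ++k)
                    // e[i][k] == 0: each edge is added once, so the loop terminates
                    if (e[i][k] == 0 && e[j][k] != 0 && t[i][k] != 0 &&
                        t[i][k] == e[i][j] * e[j][k]) {
                        e[i][k] = t[i][k];
                        covered[i][k] = true;
                        finished = false;
                    }
            }
    }
    return covered;
}
```

In a fully transitive positive chain A→B→C→D (all six forward edges present), the essential edges A→B, B→C and C→D cover A→C, B→D and A→D, so no uncovered edges remain.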
(A3.2) Algorithm for finding uncovered edges in peripheral components
```
int t[n][n]: initially deduced edges
int u[n][n]: uncovered edges

void peripheral_uncovered()
    int i, j, k;
    bool flag, sflag;
    flag := false;
    while flag = false do
        flag := true;
        for j = 1 to n do
            for i = 1 to n do
                if u[i][j] ≠ 0 then
                    sflag := false;
                    for k = 1 to n do
                        if t[j][k] ≠ 0 & u[i][k] ≠ 0 & t[i][k] = t[i][j] × t[j][k] then
                            sflag := true;
                        if t[k][i] ≠ 0 & u[k][j] ≠ 0 & t[k][j] = t[k][i] × t[i][j] then
                            sflag := true;
                    if sflag = false then
                        check(u[i][j]);
                        flag := false;
        rm_checked_edge(); // set all checked entries to 0
```
The entries of the resulting matrix u[n][n] that have been changed from +1 or -1 to 0 represent uncovered edges in peripheral components. The nonzero entries of the resulting matrix u[n][n] represent uncovered edges in main components. This algorithm repeats the while loop as long as it finds edges in $E_{ucv}$ that belong to the uncovered edges in peripheral components; every repetition except the last removes at least one edge, and each repetition performs at most $n^3$ iterations. Thus, the number of complete iterations is bounded by $(|E_{ucv}| + 1)\, n^3$.
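Algorithm A3.2 can be sketched in C++ as follows, assuming 0-based sign matrices, t holding $E_{ide}$, u holding the uncovered edges, and hypothetical names. Edges found during a pass are removed only after the pass, mirroring rm_checked_edge() and the stage-by-stage construction of the uncovered edges in peripheral components.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<int>>;   // sign matrix: -1, 0 or +1

// Sketch of algorithm A3.2: repeatedly drop uncovered edges that cannot help
// explain any other remaining uncovered edge. The nonzero entries left in the
// returned matrix are the uncovered edges in main components.
Matrix mainUncoveredEdges(const Matrix& t, Matrix u) {
    const std::size_t n = t.size();
    assert(u.size() == n);
    bool removedAny = true;
    while (removedAny) {
        std::vector<std::pair<std::size_t, std::size_t>> peripheral;
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j) {
                if (u[i][j] == 0) continue;
                bool explainsOther = false;
                for (std::size_t k = 0; k < n && !explainsOther; ++k) {
                    // exp(i,j,k): (i,j) and (j,k) explain the uncovered edge (i,k)
                    if (t[j][k] != 0 && u[i][k] != 0 && t[i][k] == t[i][j] * t[j][k])
                        explainsOther = true;
                    // exp(k,i,j): (k,i) and (i,j) explain the uncovered edge (k,j)
                    if (t[k][i] != 0 && u[k][j] != 0 && t[k][j] == t[k][i] * t[i][j])
                        explainsOther = true;
                }
                if (!explainsOther) peripheral.push_back({i, j});
            }
        for (const auto& e : peripheral) u[e.first][e.second] = 0;  // like rm_checked_edge()
        removedAny = !peripheral.empty();
    }
    return u;
}
```

For the positive edges A→B, A→C, B→C, C→B, D→A and D→B, the essential edges are B→C, C→B and D→A, and the uncovered edges are A→B, A→C and D→B; the sketch removes D→B as peripheral and keeps A→B and A→C, which explain each other.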
(A4.1) Algorithm for dividing uncovered edges in main components into independent groups
```
int t[n][n]: initially deduced edges
int e[n][n]: uncovered edges in main components
ig indgrp: independent group
list<edge> el: edge list
list<ig> igl: independent group list

void independent_group()
    int i, j;
    for i = 1 to n do
        for j = 1 to n do
            if e[i][j] ≠ 0 then
                el.clear();
                el.append(e_ij);
                e[i][j] := 0;           // consume the seed edge so that it is not appended twice
                append_group(i, j);
                indgrp.init();
                indgrp.set_el(el);      // store edge list el in indgrp
                igl.append(indgrp);     // indgrp: an independent group

void append_group(int i, int j)
    int x;
    for x = 1 to n do
        if t[i][x] ≠ 0 & t[x][j] ≠ 0 then
            if t[i][j] = t[i][x] × t[x][j] then
                if e[i][x] ≠ 0 then
                    el.append(e_ix);
                    e[i][x] := 0;
                    append_group(i, x);
                if e[x][j] ≠ 0 then
                    el.append(e_xj);
                    e[x][j] := 0;
                    append_group(x, j);
        if t[x][i] ≠ 0 & t[x][j] ≠ 0 then
            if t[x][j] = t[x][i] × t[i][j] then
                if e[x][j] ≠ 0 then
                    el.append(e_xj);
                    e[x][j] := 0;
                    append_group(x, j);
        if t[j][x] ≠ 0 & t[i][x] ≠ 0 then
            if t[i][x] = t[i][j] × t[j][x] then
                if e[i][x] ≠ 0 then
                    el.append(e_ix);
                    e[i][x] := 0;
                    append_group(i, x);
```
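The number of complete iterations of independent_group() is bounded by $n^2$. Each call of append_group(int, int) iterates $n$ times, and the function is called once for each edge appended to a group. Thus, the number of complete iterations is bounded by $O(n^2 + nN)$, where $N$ is the number of uncovered edges in main components.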
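The grouping of algorithm A4.1 is a connected-components search, so it can also be written with an explicit stack instead of recursion, which avoids deep call chains on large groups. The C++ sketch below assumes 0-based sign matrices; the names are hypothetical. It zeroes every edge, including the seed edge, at the moment the edge is taken, so no edge is appended to a group twice.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<int>>;   // sign matrix: -1, 0 or +1
using Edge = std::pair<std::size_t, std::size_t>;

// Sketch of algorithm A4.1: splits the nonzero entries of `ucvMain` (uncovered
// edges in main components) into independent groups using the three tests of
// append_group(); an explicit stack replaces the recursion.
std::vector<std::vector<Edge>> independentGroups(const Matrix& t, Matrix ucvMain) {
    const std::size_t n = t.size();
    assert(ucvMain.size() == n);
    std::vector<std::vector<Edge>> groups;
    for (std::size_t i0 = 0; i0 < n; ++i0)
        for (std::size_t j0 = 0; j0 < n; ++j0) {
            if (ucvMain[i0][j0] == 0) continue;
            std::vector<Edge> group, stack;
            auto take = [&](std::size_t a, std::size_t b) {
                if (ucvMain[a][b] == 0) return;   // not a main edge, or already grouped
                ucvMain[a][b] = 0;
                group.push_back({a, b});
                stack.push_back({a, b});
            };
            take(i0, j0);                          // seed edge of a new group
            while (!stack.empty()) {
                const Edge e = stack.back();
                stack.pop_back();
                const std::size_t i = e.first, j = e.second;
                for (std::size_t x = 0; x < n; ++x) {
                    // exp(i,x,j): (i,j) is explained by (i,x) and (x,j)
                    if (t[i][x] != 0 && t[x][j] != 0 && t[i][j] == t[i][x] * t[x][j]) {
                        take(i, x);
                        take(x, j);
                    }
                    // exp(x,i,j): (i,j) helps explain (x,j)
                    if (t[x][i] != 0 && t[x][j] != 0 && t[x][j] == t[x][i] * t[i][j])
                        take(x, j);
                    // exp(i,j,x): (i,j) helps explain (i,x)
                    if (t[j][x] != 0 && t[i][x] != 0 && t[i][x] == t[i][j] * t[j][x])
                        take(i, x);
                }
            }
            groups.push_back(group);
        }
    return groups;
}
```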
(A4.2) Algorithm for finding all sets of minimum number of edges to be restored in each independent group
```
int e[n][n]: essential edges and uncovered edges in peripheral components
ig indgrp: independent group
list<ig> igl: independent group list
list<edge> el, tmp_el: edge list
list<edge list> combi_el: combination of edge list

void find_min_ig()
    int i, num_edge;
    for i = 1 to igl.size() do
        combi_el.clear(); el.clear();
        indgrp ← igl.get_ig(i);          // copy the i th independent group from igl
        el ← indgrp.get_el();
        for num_edge = 1 to el.size() do
            add_edge(num_edge, 1);
            if (combi_el.size() > 0) then
                break;
        set_min_combi_el(i, combi_el);   // store combi_el in the i th independent group

void add_edge(int num_edge, int start)
    int j;
    if start + num_edge - 1 > el.size() then
        return;
    for j = start to el.size() do
        set_edge(j);                     // set the entry of e[n][n] corresponding to the j th edge
                                         // in el to +1 or -1 according to the sign of the edge
        check(el.get_edge(j));           // check the j th edge in el
        if num_edge > 1 then
            add_edge(num_edge - 1, j + 1);
        else
            if (confirm() = true) then
                tmp_el.clear();
                set_tmp_el();            // append all checked edges to tmp_el
                combi_el.append(tmp_el); // tmp_el: a set of the minimum number of
                                         // edges to be restored
        reset_edge(j);                   // set the entry of e[n][n] corresponding to the j th edge
                                         // in el to 0
        uncheck(el.get_edge(j));         // uncheck the j th edge in el

bool confirm(): return true when the resulting edges e[n][n] cover all edges in the group.
```
The number of complete iterations is bounded by , where G is the number of independent groups, R_{ j } is the number of edges in the j th independent group, n_{ j } is the number of genes in the j th independent group, and m_{ j } is the number of edges to be restored in the j th independent group.
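The search of algorithm A4.2 can be sketched in C++ as follows. For illustration, we assume that base holds the essential edges, that group lists the edges of one independent group, and that genes are indexed from 0; confirm() recomputes the cover closure in the manner of algorithm A3.1. All names are hypothetical. Combinations are enumerated in order of increasing size, and the search stops at the first size for which some combination covers the group, so every returned set has the minimum number of edges.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<int>>;   // sign matrix: -1, 0 or +1
using Edge = std::pair<std::size_t, std::size_t>;

// True if every edge of `group` lies in the cover closure of the edges in `e`.
bool confirm(const Matrix& t, Matrix e, const std::vector<Edge>& group) {
    const std::size_t n = t.size();
    assert(e.size() == n);
    bool changed = true;
    while (changed) {                            // closure grown as in algorithm A3.1
        changed = false;
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j) {
                if (e[i][j] == 0) continue;
                for (std::size_t k = 0; k < n; ++k)
                    if (e[i][k] == 0 && e[j][k] != 0 && t[i][k] != 0 &&
                        t[i][k] == e[i][j] * e[j][k]) {
                        e[i][k] = t[i][k];
                        changed = true;
                    }
            }
    }
    for (const Edge& g : group)
        if (e[g.first][g.second] == 0) return false;
    return true;
}

// All smallest subsets M of `group` such that (base ∪ M)^cov contains the group.
std::vector<std::vector<Edge>> minimumRestorations(const Matrix& t, Matrix base,
                                                   const std::vector<Edge>& group) {
    std::vector<std::vector<Edge>> found;
    std::vector<Edge> chosen;
    // Like add_edge(): choose `remaining` more edges from group[start..].
    std::function<void(std::size_t, std::size_t)> addEdge =
        [&](std::size_t remaining, std::size_t start) {
            for (std::size_t j = start; j + remaining <= group.size(); ++j) {
                const Edge& g = group[j];
                base[g.first][g.second] = t[g.first][g.second];   // set_edge(j)
                chosen.push_back(g);
                if (remaining > 1)
                    addEdge(remaining - 1, j + 1);
                else if (confirm(t, base, group))
                    found.push_back(chosen);
                chosen.pop_back();
                base[g.first][g.second] = 0;                      // reset_edge(j)
            }
        };
    for (std::size_t size = 1; size <= group.size() && found.empty(); ++size)
        addEdge(size, 0);
    return found;
}
```

For the positive edges A→B, A→C, B→C, C→B, D→A and D→B, with base {B→C, C→B, D→A} and the group {A→B, A→C}, the sketch returns the two one-edge sets {A→B} and {A→C}.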
(A5) Algorithm for deducing all MEGNs by making all possible combinations of sets of the minimum number of edges for each independent group
```
int e[n][n]: essential edges
int megn: the number of MEGNs
list<ig> igl: independent group list

void megn()
    int i;
    i := 1; megn := 0;
    sub_megn(i);
    if megn = 0 then
        e[n][n]: MEGN   // e[n][n] represents the unique MEGN when there are no independent
                        // groups, i.e., (E_es)^cov ⊇ E_ide (Theorem 1)

void sub_megn(int i)
    int y;
    list<edge list> tmp_combi_el;   // local to each call, so that the recursive call
    list<edge> tmp_el;              // below does not overwrite them
    if i > igl.size() then
        return;
    tmp_combi_el ← get_min_combi_el(i);   // copy combi_el of the i th independent group
    for y = 1 to tmp_combi_el.size() do
        tmp_el ← tmp_combi_el.get_el(y);  // copy the y th edge list of tmp_combi_el
        set_edges(tmp_el);   // set the entries of e[n][n] corresponding to the edges in
                             // tmp_el to +1 or -1 according to the signs of the edges
        if i = igl.size() then
            megn++;
            e[n][n]: MEGN    // e[n][n] now represents one MEGN (Theorem 2)
        else
            sub_megn(i + 1);
        reset_edges(tmp_el); // set the entries of e[n][n] corresponding to the edges in
                             // tmp_el to 0
```
The number of complete iterations is bounded by , where S_{ j } is the number of sets of minimum number of edges to be restored for the j th independent group.
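Algorithm A5 forms the Cartesian product of the per-group solutions. The C++ sketch below builds the product iteratively rather than recursively, so each partial combination lives in its own variable and nothing has to be reset after a recursive call; the names are hypothetical, and edges are represented as 0-based index pairs.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Edge = std::pair<std::size_t, std::size_t>;
using EdgeSet = std::vector<Edge>;

// Sketch of algorithm A5: every MEGN is E_es plus one minimum set from each
// independent group. perGroup[g] lists all minimum sets of group g.
std::vector<EdgeSet> combineMegns(const EdgeSet& essential,
                                  const std::vector<std::vector<EdgeSet>>& perGroup) {
    std::vector<EdgeSet> megns(1, essential);   // no groups: E_es alone (Theorem 1)
    for (const auto& options : perGroup) {
        assert(!options.empty());              // every group has at least one minimum set
        std::vector<EdgeSet> next;
        for (const EdgeSet& partial : megns)
            for (const EdgeSet& choice : options) {
                EdgeSet combined = partial;
                combined.insert(combined.end(), choice.begin(), choice.end());
                next.push_back(combined);
            }
        megns = std::move(next);
    }
    return megns;
}
```

The number of MEGNs returned is the product of the numbers of minimum sets over all groups; with no groups, the single MEGN consisting of the essential edges is returned.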
Discussion
We have described in detail the algorithm of the DBRF-MEGN method and have proved that the algorithm provides all of the exact solutions of the most parsimonious gene networks consistent with expression profiles of gene deletion mutants. The resulting gene networks, called MEGNs, are the most parsimonious SDGs consistent with an SDG that consists of the initially deduced edges. In graph theory, many algorithms have been developed for deducing the most parsimonious unsigned directed graphs consistent with a given unsigned directed graph; these graphs are called minimum equivalent graphs (MEGs) [22–25]. A MEGN is not just an "SDG version" of a MEG, as explained below. Although both MEGNs and MEGs are the most parsimonious graphs of a given graph, parsimony is defined differently for the two. A MEGN consists of the minimum number of edges that cover all edges of a given graph (the initially deduced edges), whereas a MEG consists of the minimum number of edges that retain the reachability of a given graph [22]. MEGNs use cover instead of reachability because a MEGN is a prediction of a gene network consisting only of direct gene regulations [14]. When positive regulations from gene A to gene B, from gene B to gene C, from gene C to gene D, and from gene A to gene D are detected and regulation from gene A to gene C is not detected, the regulation from gene A to gene D is likely to be a direct regulation rather than an indirect regulation resulting from the other three regulations (Figure 3a). The use of cover makes MEGNs include edges representing such likely direct regulations (Figure 3a). In contrast, MEGs, which use reachability, do not include those edges (Figure 3b). Therefore, the DBRF-MEGN method, which deduces MEGNs, is fundamentally different from algorithms that deduce MEGs or algorithms for transitive reduction of SDGs [16–18].
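The difference between reachability and cover can be seen on the example of Figure 3a. The C++ sketch below, with hypothetical names and 0-based gene indices (A = 0, ..., D = 3), tests one edge in two ways: a reachability test in the spirit of MEG reduction asks whether the target is still reachable without the direct edge, whereas the single-step test of definition 6 asks whether two edges with matching signs explain it. This is only a per-edge redundancy test, not a full MEG algorithm.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<int>>;   // sign matrix: -1, 0 or +1

// Reachability test (signs ignored): can t be reached from s without the
// direct edge (s, t)? Iterative depth-first search.
bool reachableWithoutDirectEdge(const Matrix& g, std::size_t s, std::size_t t) {
    assert(s < g.size() && t < g.size());
    std::vector<bool> seen(g.size(), false);
    std::vector<std::size_t> stack{s};
    seen[s] = true;
    while (!stack.empty()) {
        const std::size_t u = stack.back();
        stack.pop_back();
        for (std::size_t v = 0; v < g.size(); ++v) {
            if (g[u][v] == 0 || seen[v] || (u == s && v == t)) continue;
            if (v == t) return true;
            seen[v] = true;
            stack.push_back(v);
        }
    }
    return false;
}

// Single-step cover test of definition 6: is there a gene j with exp(s, j, t) = 1?
bool explainedByTwoEdges(const Matrix& g, std::size_t s, std::size_t t) {
    if (g[s][t] == 0) return false;
    for (std::size_t j = 0; j < g.size(); ++j)
        if (g[s][j] != 0 && g[j][t] != 0 && g[s][j] * g[j][t] == g[s][t]) return true;
    return false;
}
```

On the positive chain A→B→C→D plus A→D, the first test succeeds and the second fails, so a reachability-based reduction would drop A→D while a MEGN keeps it.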
The selection of uncovered edges in main components (the third process) and the generation of independent groups (the fourth process) make the DBRFMEGN method applicable to largescale gene expression profiles. Without these processes, the computational cost for finding all sets of nonessential edges to be included in the MEGNs is where n is the number of genes and m is the number of nonessential edges to be included in a MEGN. This computation is impractical for largescale gene expression profiles because increases rapidly as or m increase. The selection of uncovered edges in main components reduces the computational cost to and the generation of independent groups further reduces it to , where t is the number of independent groups, n_{ j } is the number of genes in the j th independent group, and m_{ j } is the number of edges in the j th independent group to be included in a MEGN. and m_{ j } are usually far smaller than and m. Because of these reductions of the computational cost, the DBRFMEGN method successfully deduced MEGNs from sets of largescale gene expression profiles [14] [see Additional file 2, Table S1; Additional file 3]. Although there is no guarantee that the method will deduce MEGNs from any given expression profiles in an acceptable time, the method would most probably deduce MEGNs from most sets of expression profiles in an acceptable time.
Because MEGNs are deduced from initially deduced edges, the accuracy of MEGNs depends on that of the initially deduced edges. The primary source of inaccuracy in initially deduced edges is noise in the expression profiles. Importantly, the number of false-positive edges in a MEGN depends more on the number of falsely detected edges than on the number of falsely missed edges among the initially deduced edges; conversely, the number of false-negative edges in a MEGN depends more on the number of falsely missed edges than on the number of falsely detected edges [see Additional file 2, Table S2; Additional file 2, Figure S1]. These dependencies suggest the following guideline for the thresholds α and β (Definition 2): when reducing false-positive edges in the MEGN is more important than reducing false-negative edges, α (β) should be set a little higher (lower) than the optimal value; in contrast, when reducing false-negative edges is more important than reducing false-positive edges, α (β) should be set a little lower (higher) than the optimal value.
The DBRF-MEGN method is applicable not only to gene expression profiles of deletion mutants but also to those of gene overexpression and conditional knockdowns/knockouts [26–28]. Gene expression profiles of deletion mutants cannot be obtained for essential genes. Thus, the method cannot deduce gene networks that include essential genes when we use gene expression profiles of deletion mutants. A possible solution to this problem is to use the expression profiles of gene overexpression or conditional knockdowns/knockouts. Applying the DBRF-MEGN method to those profiles will deduce gene regulations that cannot be deduced from gene expression profiles of gene deletion mutants.
A limitation of the DBRF-MEGN method is its inability to deduce (1) self-regulation of genes and (2) combinatorial gene regulations, such as regulation in which the expression of gene A is downregulated only when both gene B and gene C are inactive. Self-regulation could be deduced by using chromatin immunoprecipitation [29]. Combinatorial gene regulations could be deduced by using the expression profiles of multiple-gene deletion mutants [30]. Synthetic genetic arrays can systematically construct a collection of double-gene deletion mutants [31]. A combination of the DBRF-MEGN method and the above techniques would provide more accurate information about gene networks.
When the DBRF-MEGN method is applied to gene expression profiles measured by using DNA microarrays, each of the deduced edges represents regulation of one gene's mRNA level by another gene's activity. Therefore, the deduced MEGNs do not include edges that represent post-transcriptional gene regulations, although these play major roles in the cell. However, because the algorithm of the DBRF-MEGN method is based on logic that is most commonly used in genetics and cell biology to infer gene networks from small-scale experiments, we can predict post-transcriptional modulators of transcriptional activity from those MEGNs. We predicted a total of 72 transcriptional regulators and 232 post-transcriptional modulators of 18 transcriptional regulators from the MEGNs deduced from a set of gene expression profiles for 265 Saccharomyces cerevisiae genes [14]. The DBRF-MEGN method is applicable not only to gene expression profiles measured by using DNA microarrays but also to those measured by using other technologies such as 2D-PAGE-MS [32] and protein chips [33]. MEGNs deduced from such non-DNA-microarray expression profiles will include edges that represent post-transcriptional gene regulations in the cell.
Conclusions
We described in detail the processes of the DBRF-MEGN method and proved that these processes provide all of the exact solutions of the most parsimonious gene networks consistent with the expression profiles of gene deletion mutants, which are called MEGNs. The DBRF-MEGN method provides invaluable information for understanding cellular functions.
Availability and requirements
Project name: DBRF-MEGN
Project home page: http://so.gsc.riken.jp/dbrfmegn
Operating system: Linux
Programming language: C++
Other requirements: None
Licence: GNU LGPL
Any restrictions to use by non-academics: Licence required
Abbreviations
DBRF: difference-based regulation finding
MEGN: minimum equivalent gene network
SDG: signed directed graph
MEG: minimum equivalent graph
References
Liu LX, Spoerke JM, Mulligan EL, Chen J, Reardon B, Westlund B, Sun L, Abel K, Armstrong B, Hardiman G, King J, McCague L, Basson M, Clover R, Johnson CD: High-throughput isolation of Caenorhabditis elegans deletion mutants. Genome Res. 1999, 9: 859-867. 10.1101/gr.9.9.859.
Winzeler EA, Shoemaker DD, Astromoff A, Liang H, Anderson K, Andre B, Bangham R, Benito R, Boeke JD, Bussey H, Chu AM, Connelly C, Davis K, Dietrich F, Dow SW, Bakkoury ME, Foury F, Friend SH, Gentalen E, Giaever G, Hegemann JH, Jones T, Laub M, Liao H, Liebundguth N, Lockhart DJ, Lucau-Danila A, Lussier M, M'Rabet N, Menard P, Mittmann M, Pai C, Rebischung C, Revuelta JL, Riles L, Roberts CJ, Ross-MacDonald P, Scherens B, Snyder M, Sookhai-Mahadeo S, Storms RK, Véronneau S, Voet M, Volckaert G, Ward TR, Wysocki R, Yen GS, Yu K, Zimmermann K, Philippsen P, Johnston M, Davis RW: Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis. Science. 1999, 285: 901-906. 10.1126/science.285.5429.901.
Hamer L, Adachi K, Montenegro-Chamorro MV, Tanzer MM, Mahanty SK, Lo C, Tarpey RW, Skalchunes AR, Heiniger RW, Frank SA, Darveaux BA, Lampe DJ, Slater TM, Ramamurthy L, DeZwaan TM, Nelson GH, Shuster JR, Woessner J, Hamer JE: Gene discovery and gene function assignment in filamentous fungi. Proc Natl Acad Sci. 2001, 98: 5110-5115. 10.1073/pnas.091094198.
Giaever G, Chu AM, Connelly C, Riles L, Véronneau S, Dow S, Lucau-Danila A, Anderson K, André B, Arkin AP, Astromoff A, Bakkoury ME, Bangham R, Benito R, Brachat S, Campanaro S, Curtiss M, Davis K, Deutschbauer A, Entian KD, Flaherty P, Foury F, Garfinkel DJ, Gerstein M, Gotte D, Güldener U, Hegemann JH, Hempel S, Herman Z, Jaramillo DF, Kelly DE, Kelly SL, Kötter P, LaBonte D, Lamb DC, Lan N, Liang H, Liao H, Liu L, Luo C, Lussier M, Mao R, Menard P, Ooi SL, Revuelta JL, Roberts CJ, Rose M, Ross-Macdonald P, Scherens B, Schimmack G, Shafer B, Shoemaker DD, Sookhai-Mahadeo S, Storms RK, Strathern JN, Valle G, Voet M, Volckaert G, Wang CY, Ward TR, Wilhelmy J, Winzeler EA, Yang Y, Yen G, Youngman B, Yu K, Bussey H, Boeke JD, Snyder M, Philippsen P, Davis RW, Johnston M: Functional profiling of the Saccharomyces cerevisiae genome. Nature. 2002, 418: 387-391. 10.1038/nature00935.
Schena M, Shalon D, Davis RW, Brown PO: Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science. 1995, 270: 467-470. 10.1126/science.270.5235.467.
Lockhart DJ, Dong H, Byrne MC, Follettie MT, Gallo MV, Chee MS, Mittmann M, Wang C, Kobayashi M, Horton H, Brown EL: Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat Biotechnol. 1996, 14: 1675-1680. 10.1038/nbt1296-1675.
Hughes TR, Marton MJ, Jones AR, Roberts CJ, Stoughton R, Armour CD, Bennett HA, Coffey E, Dai H, He YD, Kidd MJ, King AM, Meyer MR, Slade D, Lum PY, Stepaniants SB, Shoemaker DD, Gachotte D, Chakraburtty K, Simon J, Bard M, Friend SH: Functional discovery via a compendium of expression profiles. Cell. 2000, 102: 109-126. 10.1016/S0092-8674(00)00015-5.
Hu Z, Killion PJ, Iyer VR: Genetic reconstruction of a functional transcriptional regulatory network. Nat Genet. 2007, 39: 683-687. 10.1038/ng2012.
Ideker TE, Thorsson V, Karp RM: Discovery of regulatory interactions through perturbation: inference and experimental design. Pac Symp Biocomput. 2000, 5: 305-316.
Kyoda KM, Morohashi M, Onami S, Kitano H: A gene network inference method from continuous-value gene expression data of wild-type and mutants. Genome Inform Ser Workshop Genome Inform. 2000, 11: 196-204.
Pe'er D, Regev A, Elidan G, Friedman N: Inferring subnetworks from perturbed expression profiles. Bioinformatics. 2001, 17 (Suppl 1): S215-S224.
Wagner A: How to reconstruct a large genetic network from n gene perturbations in fewer than n² easy steps. Bioinformatics. 2001, 17: 1183-1197. 10.1093/bioinformatics/17.12.1183.
Rung J, Schlitt T, Brazma A, Freivalds K, Vilo J: Building and analysing genome-wide gene disruption networks. Bioinformatics. 2002, 18 (Suppl 2): S202-S210.
Kyoda K, Baba K, Onami S, Kitano H: DBRF-MEGN method: an algorithm for deducing minimum equivalent gene networks from large-scale gene expression profiles of gene deletion mutants. Bioinformatics. 2004, 20: 2665-2675.
Bonneau R, Reiss D, Shannon P, Facciotti M, Hood L, Baliga NS, Thorsson V: The Inferelator: an algorithm for learning parsimonious regulatory networks from systems-biology data sets de novo. Genome Biol. 2006, 7: R36. 10.1186/gb-2006-7-5-r36.
Albert R, DasGupta B, Dondi R, Kachalo S, Sontag E, Zelikovsky A, Westbrooks K: A novel method for signal transduction network inference from indirect experimental evidence. J Comput Biol. 2007, 14: 927-949. 10.1089/cmb.2007.0015.
Tresch A, Beissbarth T, Sultmann H, Kuner R, Poustka A, Buness A: Discrimination of direct and indirect interactions in a network of regulatory effects. J Comput Biol. 2007, 14: 1217-1228. 10.1089/cmb.2007.0085.
Klamt S, Flassig RJ, Sundmacher K: TRANSWESD: inferring cellular networks with transitive reduction. Bioinformatics. 2010, 26: 2160-2168.
Schena M, Shalon D, Heller R, Chai A, Brown PO, Davis RW: Parallel human genome analysis: microarray-based expression monitoring of 1000 genes. Proc Natl Acad Sci. 1996, 93: 10614-10619. 10.1073/pnas.93.20.10614.
DeRisi JL, Iyer VR, Brown PO: Exploring the metabolic and genetic control of gene expression on a genomic scale. Science. 1997, 278: 680-686. 10.1126/science.278.5338.680.
Warshall S: A theorem on Boolean matrices. J Assoc Comput Mach. 1962, 9: 11-12.
Moyles DM, Thompson GL: An algorithm for finding a minimum equivalent graph of a digraph. J Assoc Comput Mach. 1969, 16: 455-460.
Hsu H: An algorithm for finding a minimal equivalent graph of a digraph. J Assoc Comput Mach. 1975, 22: 11-16.
Martello S: An algorithm for finding a minimal equivalent graph of a strongly connected digraph. Computing. 1979, 21: 183-194. 10.1007/BF02253052.
Martello S, Toth P: Finding a minimum equivalent graph of a digraph. Networks. 1982, 12: 89-100. 10.1002/net.3230120202.
Mnaimneh S, Davierwala AP, Haynes J, Moffat J, Peng WT, Zhang W, Yang X, Pootoolal J, Chua G, Lopez A, Trochesset M, Morse D, Krogan NJ, Hiley SL, Li Z, Morris Q, Grigull J, Mitsakakis N, Roberts CJ, Greenblatt JF, Boone C, Kaiser CA, Andrews BJ, Hughes TR: Exploration of essential gene functions via titratable promoter alleles. Cell. 2004, 118: 31-44. 10.1016/j.cell.2004.06.013.
Sopko R, Huang D, Preston N, Chua G, Papp B, Kafadar K, Snyder M, Oliver SG, Cyert M, Hughes TR, Boone C, Andrews B: Mapping pathways and phenotypes by systematic gene overexpression. Mol Cell. 2006, 21: 319-330. 10.1016/j.molcel.2005.12.011.
Chua G, Morris QD, Sopko R, Robinson MD, Ryan O, Chan ET, Frey BJ, Andrews BJ, Boone C, Hughes TR: Identifying transcription factor functions and targets by phenotypic activation. Proc Natl Acad Sci. 2006, 103: 12045-12050. 10.1073/pnas.0605140103.
Lee TI, Rinaldi NJ, Robert F, Odom DT, Bar-Joseph Z, Gerber GK, Hannett NM, Harbison CT, Thompson CM, Simon I, Zeitlinger J, Jennings EG, Murray HL, Gordon DB, Ren B, Wyrick JJ, Tagne JB, Volkert TL, Fraenkel E, Gifford DK, Young RA: Transcriptional regulatory networks in Saccharomyces cerevisiae. Science. 2002, 298: 799-804. 10.1126/science.1075090.
Tringe SG, Wagner A, Ruby SW: Enriching for direct regulatory targets in perturbed gene-expression profiles. Genome Biol. 2004, 5: R29. 10.1186/gb-2004-5-4-r29.
Tong AH, Evangelista M, Parsons AB, Xu H, Bader GD, Pagé N, Robinson M, Raghibizadeh S, Hogue CW, Bussey H, Andrews B, Tyers M, Boone C: Systematic genetic analysis with ordered arrays of yeast deletion mutants. Science. 2001, 294: 2364-2368. 10.1126/science.1065810.
Gygi SP, Rochon Y, Franza BR, Aebersold R: Correlation between protein and mRNA abundance in yeast. Mol Cell Biol. 1999, 19: 1720-1730.
Zhu H, Klemic JF, Chang S, Bertone P, Casamayor A, Klemic KG, Smith D, Gerstein M, Reed MA, Snyder M: Analysis of yeast protein kinases using protein chips. Nat Genet. 2000, 26: 283-289. 10.1038/81576.
Acknowledgements
We thank the anonymous reviewer for his/her thorough review and constructive suggestions, which significantly improved the readability of the definitions and lemmas. We thank K. Oka for his support and useful discussions. We also thank S. Hamahashi and A. Kimura for critical comments on this manuscript. This work was supported in part by a grant from the Ministry of Agriculture, Forestry and Fisheries of Japan (Rice Genome Project SY1106) to HK and SO; by Special Coordination Funds for Promoting Science and Technology (to HK and SO) and by a Grant-in-Aid for Scientific Research on Priority Areas "Systems Genomics" (to KK and SO) from the Ministry of Education, Culture, Sports, Science and Technology of Japan; and by a grant from the Institute for Bioinformatics Research and Development (BIRD), Japan Science and Technology Agency, to SO.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
KK participated in this study's conception, designed the algorithm, proved the algorithm, and drafted the manuscript. KB implemented the algorithm. HK participated in algorithm design. SO conceived this study, designed the algorithm, proved the algorithm, and drafted the manuscript. All authors read and approved the final manuscript.
Electronic supplementary material
Additional file 2: Supporting text on the applicability of the DBRF-MEGN method to large-scale expression profiles and on the sensitivity of the method to noise in the expression profiles. (DOC 91 KB)
Rights and permissions
Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Kyoda, K., Baba, K., Kitano, H. et al. A proof of the DBRF-MEGN method, an algorithm for deducing minimum equivalent gene networks. Source Code Biol Med 6, 12 (2011). https://doi.org/10.1186/1751-0473-6-12
DOI: https://doi.org/10.1186/1751-0473-6-12