# A CellML simulation compiler and code generator using ODE solving schemes

- Florencio Rusty Punzalan
^{1}Email author,### Affiliated with

- Yoshiharu Yamashita
^{2},### Affiliated with

- Naoki Soejima
^{1},### Affiliated with

- Masanari Kawabata
^{1},### Affiliated with

- Takao Shimayoshi
^{3},### Affiliated with

- Hiroaki Kuwabara
^{2},### Affiliated with

- Yoshitoshi Kunieda
^{2}and### Affiliated with

- Akira Amano
^{1}### Affiliated with

**7**:11

**DOI: **10.1186/1751-0473-7-11

© Punzalan et al.; licensee BioMed Central Ltd. 2012

**Received: **28 March 2012

**Accepted: **31 July 2012

**Published: **19 October 2012

### Abstract

Models written in description languages such as CellML are becoming a popular solution to the handling of complex cellular physiological models in biological function simulations. However, in order to fully simulate a model, boundary conditions and ordinary differential equation (ODE) solving schemes have to be combined with it. Though boundary conditions can be described in CellML, it is difficult to explicitly specify ODE solving schemes using existing tools. In this study, we define an ODE solving scheme description language-based on XML and propose a code generation system for biological function simulations. In the proposed system, biological simulation programs using various ODE solving schemes can be easily generated. We designed a two-stage approach where the system generates the equation set associating the physiological model variable values at a certain time *t* with values at *t* + *Δt* in the first stage. The second stage generates the simulation code for the model. This approach enables the flexible construction of code generation modules that can support complex sets of formulas. We evaluate the relationship between models and their calculation accuracies by simulating complex biological models using various ODE solving schemes. Using the FHN model simulation, results showed good qualitative and quantitative correspondence with the theoretical predictions. Results for the Luo-Rudy 1991 model showed that only first order precision was achieved. In addition, running the generated code in parallel on a GPU made it possible to speed up the calculation time by a factor of 50. The CellML Compiler source code is available for download at http://sourceforge.net/projects/cellmlcompiler.

## Introduction

In recent years, the continued development in computer processing power paved the way for the increased use of biological function simulation. Computers have proven to be invaluable in analysing complex and nonintuitive biological models and biologists are turning to them to complement their experiments. Simulations enable the testing of experimentally unfeasible scenarios and can potentially reduce experimental costs. However, the number and complexity of physiological models has also grown with the increase in computing performance. This creates challenges in reproducing simulated behaviours of the published models and reuse of models by other researchers, hindering the dissemination of science and knowledge integration.

One way to address model complexity is to use markup language-based model descriptions. Some popular examples include CellML [1], SBML (Systems Biology Markup Language) [2] and insilicoML [3]. CellML is an open standard markup language capable of describing mathematical models of cellular functions. SBML is an open interchange machine-readable format for representing models of functions such as metabolism and cell signalling. Meanwhile, insilicoML describes mathematical models for biophysical objects and incorporates morphological information such as shape, angle and position. SED-ML (Simulation Experiment Description Language) [4] is another type of description language which can encode the information of simulation experiments. These markup languages allow researchers to take advantage of the vast amount of biological function models using a common set of easily readable and versatile description rules.

Biological and physiological function models are generally described by differential equations. A typical simulation of biological function models consists of three parts: a model equation, a boundary condition, and an ordinary differential equation (ODE) solver. Model equations and boundary conditions can be described using CellML, while ODE numerical solutions like Euler and Runge-Kutta methods are typically built into the simulation software. However, it is necessary to be flexible in using ODE schemes in order to strike a balance between computational stability and speed. In addition, those using special hardware environments such as massively parallel computer systems require dedicated proprietary software to support their numerical solution needs. Thus, description languages like CellML and dedicated simulation software are not suitable or practical for flexibly incorporating different ODE solving schemes.

To address the need for more flexibility in creating simulation software, we created Time Evolution Calculation Markup Language (TecML), a machine-readable format for encoding ODE numerical solutions. TecML is a description language based on the extensible markup language (XML). This description language is designed to specify and store the numerical methods that can be used for solving the ODEs in biological models. It also allows the assignment of boundary conditions into the simulation experiments. The following sections describe TecML and how it is integrated into the proposed code generation system, which automatically generates codes for biological simulations.

The target of this study is limited to the use of different ODE numerical solutions and their application to models described in CellML. We propose an algorithm that allows users to change the ODE solution and boundary conditions of the model according to the computational needs of their simulation. To verify the effectiveness of the proposed system, we generate executable codes for several CellML models using a number of ODE numerical solutions. The system can generate code in several programming languages and code that runs in both sequential and parallel computing environments. Simulations on GPU (Graphics Processing Units) were undertaken to show the effect of using parallel computing on processing time.

## Biological simulation code generation system

### Summary of simulation code generation system

where *r*, *x* and *y* are variables, *t* indicates time and *a, b, c, d* are constants. Removable algebraic expressions such as equation (1) are often used in biological models to improve the model’s readability.

*x*and

*y*. This involves finding the solution to the variable vector ξ using its derivatives, temporal variable vector ι, and initial conditions at time

*t*given by

where ξ,κ and ι are the vectors representing the variables and differential equations in the biological model.

*t*are assigned as the initial values with

*x*and

*y*are computed as functions of κ

_{1}and

*δ*(equations (10) and (11)) and given by

*t*to

*t*+

*δ*(equations (14) and (15)) with

The variable *r* of the equation set corresponds to ι while *x* and *y* are the variables in the vector ξ. The derivatives *dx*/*dt* and *dy*/*dt* are expressed by κ in equations (20), (21), (26) and (27).

In order to automatically incorporate the ODE numerical solution into the CellML model, we introduced an ODE solving scheme description language called Time Evolution Calculation Markup Language (TecML). TecML is an XML-based description language that can be used to configure ODE solutions and implement the algorithm described in equations (6) – (15). Another input of the system is the Relation Markup Language (RelML) file. The basic role of a RelML file is to relate the variables and variable types of the CellML file into their equivalent in the TecML file.

The simulation code generation system uses these inputs to generate the set of equations describing the time evolution of the variables in biological models. This step creates equations (16) – (31) in the FHN example. Once the model equations are applied with the ODE numerical solution method, the next stage involves the generation of the executable code that will do the actual calculations.

### Description language

The code generation system requires three inputs, namely, the CellML file, TecML file and RelML file. This section gives a short introduction of each and how it is used in the algorithm and consequently, in the FHN model example.

#### CellML model encoding standard

#### TecML: an ODE solving scheme description language

*diffvar*,

*derivativevar*,

*arithvar*,

*constvar*,

*timevar*, and

*deltatimevar*. The variables determined by a rate of change with respect to time (ξ) are referred to as differential variables (

*diffvar*) while their derivatives (κ) are called derivative variables (

*derivativevar*). The removable variables (ι) are the arithmetic variables (

*arithvar*) and variables that do not change in value (ζ) are the constants (

*constvar*). In addition, the time (

*t*) and time increment (

*δ*) are referred to as

*timevar*and

*deltatimevar*, respectively. TecML also divides the mathematical equations into two types; namely, differential (f()) and non-differential (g()) equations. Equations of type (

*diffequ*) are the derivatives of a function while (

*nondiffequ*) are the arithmetic functions. The information and example of a TecML file for the Modified Euler method are shown in Figure 3 and Figure 4, respectively.

**Variable and function types in TecML**

Type | Definition |
---|---|

| Differential variable (variable is a function of time) |

| Temporal variable (can be substituted with math equations) |

| Derivative of diffvar |

| Constants |

| Time variable |

| Variable denoting the change of time per step |

| Differential equation |

| Non-differential equation |

#### Relation Markup Language (RelML)

*x*and

*y*are defined as

*diffvar*and their respective derivatives

*dx*/

*dt*and

*dy*/

*dt*as

*derivativevar*. In addition,

*a, b, c,*and

*d*are

*constvar*type while

*r*is considered a temporal variable or

*arithvar*type. Equations (2) and (3) are the formulas for calculating the

*diffvar*so the functions are categorized as

*diffequ*while the arithmetic equation for

*r*in equation (1) is a

*nondiffequ*.

### Executable code generation for equation sets

*N*×

*M*number of cells. Aside from generating single CPU codes, it can also create codes suited for parallel computing that runs on a GPU machine.

**Input and output code types in the system**

## Methods: simulation code generation algorithm

### Cell model description

*t*is the scalar variable for time. These variable vectors can be expressed as sets with

*N*

_{ x },

*N*

_{ y },

*N*

_{ z }are the variable count for k,y, and z, respectively. Furthermore, f(x,y,

*t*,z) and g(x,y,

*t*,z) are function vectors with

*k*

_{ i }=

*f*

_{ i }(x,y,

*t*,z) and

*y*

_{ i }=

*g*

_{ i }(x,y,

*t*,z). These function vectors are defined by

### ODE solving scheme

*d*ξ/

*dt*= Φ(ξ,ι,

*t*,ζ) and ι = Γ(ξ,ι,

*t*,ζ). Inside a TecML file, the dependence between the differential variables ξ

_{ 0 }(time

*t*) and (time

*t*+

*δ*) is given by

**Information written in a TecML file**

Notation | Definition |
---|---|

| Input differential variable vector |

| Output differential variable vector |

ξ
| Time differential variable vector |

| Differential variable vector of ξ |

| Current time value |

| Time step |

| Constants vector |

| Calculation of variable |

| Differential equation vector |

| Temporal function vector |

κ
| Derivative variable vector |

ι
| Temporal variable derived from ξ |

| Derivative vector of κ |

| Relation between |

### CellML and TecML integration

*d*x/

*dt*and the temporal variable ι corresponds to y and given by

*t*,ζ) and ι = Γ(ξ,ι,

*t*,ζ) correspond to the CellML equations f and g, respectively, as shown by

### Replacement algorithm

*diffvar*,

*derivativevar*,

*arithvar*,

*constvar*,

*timevar*, or

*deltatimevar*and that variable is replaced with the corresponding TecML variable name. This replacement procedure is represented in Strachey brackets and expressed by the following function:

*equ*⟧ is the TecML equation and, for all

*i*= 1,2,…,

*N*

_{ ξ }, the replacement functions are

Note that the Strachey brackets in
means that all the TecML *diffvar* (*i.e.* for *i* = 1…*N*
_{
ξ
}) in the argument is replaced with the corresponding CellML *diffvar*
x
_{
i
}. This function is true for all the other variable types in TecML.

*t*to

*t*+

*δ*. Figure 7 shows how the algorithm generates the simulation program from the input files. Note that the subroutine replace_v() in the algorithm replaces variables in the CellML model equation with the TecML variables as expressed in equation (50). The function replace_sj() generates one scalar equation from a vector equation and appends index j to the variables. In the equations which do not contain functions

*f*or

*g*, replace_d() generates multiple scalar equations from a vector equation. Finally, replace_f() and replace_g() unfold functions

*f*and

*g*, respectively. These subroutines can be expressed in the following transformations:

## Experiments and results

### Code generation for the cell model simulation

The FHN cell model [5] was used to evaluate the proposed system. The system was also used to generate simulation codes for a number of cell models (LuoRudy1991 [8], LuoRudy1994 [10] and KyotoModel2003 [9]) and with other more accurate ODE numerical methods (e.g. 4th-order Runge-Kutta method). All the generated code used in the simulation experiments can be downloaded from the files section of the program generator site [7].

The traditional approach to creating simulation experiment codes also offers little flexibility once the software is created. Changing the numerical ODE solution method means making major revisions in the simulation code. The proposed method allows the flexibility of using different simulation models, boundary conditions, and numerical methods for a simulation experiment. Since the simulation codes are generated automatically, users can choose or change their desired ODE solver without making changes in the simulation codes themselves. Also, this can give clear information on what is calculated to generate the simulation results.

**The number of execution steps for the generated codes for different cell models and ODE numerical methods**

Cell Model | Euler | Modified Euler | Runge-Kutta |
---|---|---|---|

FHN | 11 | 16 | 26 |

LuoRudy1991 | 70 | 117 | 211 |

LuoRudy1994 | 123 | 211 | 387 |

Kyoto Model | 335 | 560 | 1200 |

An issue in approximating ODE solutions is the accuracy of the numerical method used. A good approximation to the underlying differential equation needs to be achieved in order to arrive at accurate simulation results. We tested a number of commonly used ODE numerical methods to determine how the use of different solutions affect the accuracy of the calculations. The three methods used were Euler, Modified Euler and 4th-order Runge-Kutta. Each of these methods were used to generate an FHN model simulation code that runs in a single CPU. The generated code can run in different compilers and does not require third party software.

*double*. In the simulation codes, we used BigDecimal to represent the numbers in a 32-digit decimal point precision format for all the ODE solving methods. Different time steps were also used in testing the accuracy of these ODE methods, ranging from 10

^{−1}ms to 10

^{−5}ms. The simulation using a time step of 10

^{−6}and Runge-Kutta as ODE solving scheme was used as the basis to compute for the root-mean-square error (RMSE) and evaluate the accuracy of the other calculations. The RMSE was computed for all the calculations using different ODE numerical methods and in varying time steps (Figure 8). Each level in the RMSE indicates a 1/10 less accuracy in the simulation results compared to the calculations using Runge-Kutta and 10

^{−6}ms time step.

The first-order Euler method gave the largest error (log error ≈10^{−3}) while the Modified Euler resulted to a smaller error. The fourth-order Runge-Kutta method resulted with the best accuracy (log error ≈10^{−10}). However, the Runge-Kutta error is almost constant from the time step 10^{−3} ms, . This can be attributed to the rounding error of digits over the used 32-digit precision.

Simulation codes using different ODE numerical methods were also generated for models more complicated than the FHN or Hodgkin-Huxley model. For this, we used the model introduced by C. Luo and Y. Rudy in 1991 [8]. It is a simple cell model of cardiac action potential that uses Hodgkin-Huxley type equations to calculate ionic currents. The RMSE computations undertaken for the FHN model were also applied for the Luo-Rudy 1991 simulations. Codes were generated for the three ODE solutions with time steps ranging from 10^{−1} to 10^{−4} ms. Meanwhile, the simulation using the Euler method with a time step of 10^{−5} ms was used as the reference for RMSE calculations (Figure 9).

### Excitation propagation simulations

One possible application of 1- or 2-dimensional excitation propagation simulation is cardiac arrhythmia research. Computational models have provided new insights into the underlying mechanisms of re-entry like the role of ionic currents and ion channel mutations [11]. In addition, designs of defibrillation treatments can be optimized using simulations of excitation propagation to achieve good clinical results for patients.

Simulation codes for the Luo-Rudy 1991 model were generated to simulate cell excitation propagation on a 2D homogeneous sheet. The two-dimensional cell action potential propagation simulations were run for both CPU and GPU to compare the speed of calculations. The CPU simulation was performed with an intel Core i7 880 processor with 8 GB of memory running Windows 7. The GPU has a single Nvidia Tesla C2050 processor with 448 CUDA cores, and 3 GB memory in a CentOS 5.5 system. The programs are written in C for the CPU simulations and Cuda C for the GPU.

## Discussion

Traditionally, biological function simulation programs have to be manually created from scratch. This is manageable with small models or simulation experiments but it becomes less practical once model complexity increases. The more complex the model becomes the larger the program becomes and the harder it is to create and maintain such programs. The Doty model [12] showed that the cost of software development is proportional to the square exponential of the number of computational steps. As seen from Table 4, the number of computational steps increases with the complexity of the model. By using this system, the creation of a simulation program from biological models becomes simpler. The system takes input from experts in biological functions through CellML models and mathematical experts through ODE numerical solutions and automatically generates the simulation code. The automatic code generation allows it to deal with large and complex models without the proportional increase in software development cost. This also keeps the software maintenance cost low since the programs are created with the same structure, regardless of the biological model under study.

The system’s support for multiple programming languages makes it easier for users to test the model in different programming environments and keep the cost of switching from one programming language to a low. Based on the codes generated for the purpose of this paper, science and engineering students typically create a Java simulation code from the C version within four days.

The system requires the formatted RelML file to do the first stage of the algorithm and the RelML information can also be used to automatically determine the boundary conditions. The RelML information can be extracted automatically from the CellML model file but boundary conditions are not always available. We need a mathematical representation of boundary conditions which are compatible with declarative description. This can be addressed by the inclusion of PEPML experimental protocol [13] in the system. PEPML is an XML-based language that can describe the initial conditions and procedures of an experimental protocol. It describes boundary conditions in a mathematical form and is purely represented in a declarative manner, making it easy to combine with the current implementation of our system.

Although the system is fully implemented, further improvements are still to be added. The generated excitation propagation code does not consider the optimization of the model variables for parallel processing. Therefore, code parallelization is low and may have contributed to slower execution in the GPU environment. Also, the system can only handle equations with a single term in the left-hand side. Other equation forms will be addressed in future implementations.

Furthermore, simultaneous calculations using differential algebraic equations (DAE) are still not available. Simultaneous equations appear in some models and implicit numerical methods. The handling of implicit methods is available in the inputs but not in the current implementation of our system. A simultaneous equation library, other ODE numerical solutions library like CVODE, or a linear algebra library like LINPACK could be incorporated into our system. This can be a possibility with our plan to design a library interface description language to handle numerical solution inputs.

Future implementations also need to address the need for a declarative design of describing procedural methods. Complex ODE numerical methods like the ones described in the Kinetic Simulation Algorithm Ontology (KiSAO) [14] are difficult or impossible to encode in the current version of TecML. We are in the process of redesigning TecML and the system to incorporate methods that are suitable for the types of problems scientists are addressing with manually-created code, including adaptive methods. This can greatly enhance the usability of the proposed method.

In addition, the system may be integrated with SED-ML. SED-ML allows description of the simulation environment, which includes the name of the numerical method to be used in the simulation. A starting point for integration would be to provide TecML information of a numerical method for the KiSAO entry in SED-ML to allow simulation software to refer to this description. Another point would be to create support for SED-ML files in the code generation system in order to create more complex experiments that use more than one model or experiments with different simulation methods applied. This will allow users the flexibility of choosing not only the ODE solving methods but also the experiment protocol when generating their simulation codes.

## Conclusion

In this paper, we proposed a method to automatically generate executable simulation codes using CellML physiological models and ODE numerical solution methods. The generated code describes the time-evolution of the set of differential equations enumerated by the CellML model. The code generation system is composed of a two-stage approach that allows flexible generation of complex sets of equations.

To evaluate the effectiveness of the proposed system, several combinations of physiological cell models and ODE solving schemes were generated. The output of the numerical approximations were in accordance with the published results of the cell models. Results for the Luo-Rudy 1991 simulations also indicated that it only has first-order accuracy. The comparison of execution time for 2D excitation propagation also showed that the use of GPU can accelerate the processing time by 50 times as compared to a CPU.

The code generation system allows executable simulation codes to be easily generated from CellML and TecML files. This can be very useful in the field of biological model simulation since it provides the tools to quantitatively evaluate the mathematical equations in these models.

## Declarations

### Acknowledgements

The authors would like to thank the Japan Ministry of Education, Culture, Sports, Science and Technology (MEXT) for the Grant-in-Aid for Scientific Research (Kakenhi) funding.

## Authors’ Affiliations

## References

- Cuellar AA, Lloyd CM, Nielsen PF, Bullivant DP, Nickerson DP, Hunter PJ:
**An overview of CellML 1.1, a biological model description language.***Simulation*2003,**79:**740–747.View Article - Hucka M, Kitano H:
**The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models.***Bioinformatics*2003,**19:**524–531.PubMedView Article - Yoshiyuki A, Yasuyuki S, Yoshiyuki K:
**Specifications of insilicoML 1.0: a multilevel biophysical model description language.***J Biol Sci*2008,**58:**447–458. - Waltemath D, Adams R, Bergmann FT, Hucka M, Kolpakov F, Miller A, Moraru II, Nickerson DP, Sahle S, Snoep JL, Le Novere N:
**Reproducible computational biology experiments with SED-ML - the simulation experiment description markup language.***BMC Syst Biol*2011,**5:**198.PubMedView Article - Fitzhugh R:
**Impulses and physiological states in theoretical models of nerve membrane.***Biophys J*1961,**1**(6)**:**445–466.PubMedView Article - Garny A:
**Cellular Open Source (COR): a public cellml based environment for modeling biological function.***Int J Bifurcation and Chaos*2003,**13**(12)**:**3579–3590.View Article -
**The CellML Compiler and Code Generator**[http://sourceforge.net/projects/cellmlcompiler] - Luo CH, Rudy Y:
**A model of the ventricular cardiac action potential, depolarization, repolarization, and their interaction.***Circ Res*1991,**68:**1501–1526.PubMedView Article - Matsuoka S, Sarai N, Kuratomi S, Ono K, Noma A:
**Role of individual ionic current systems in ventricular cells hypothesized by a model study.***Jpn J Physiol*2003,**53:**105–123.PubMedView Article - Luo CH, Rudy Y:
**A dynamic model of the cardiac ventricular action potential. I. Simulations of ionic currents and concentration changes.***Circ Res*1994,**74:**1071–1097.PubMedView Article - Ten Tusscher KH, Panfilov AV:
**Cell model for efficient simulation of wave propagation in human ventricular tissue under normal and pathological conditions.***Phys Med Biol*2006,**51:**6141–6156.PubMedView Article - Herd JR, Postak JN, Russell WE, Steward KR:
*Software cost estimation study: Study results, Final Technical Report, RADC-TR77–220, vol. I. Rockville*. Inc., Doty Associates; 1977. - Shimayoshi T, Amano A, Matsuda T:
**A generic representation format of physiological experimental protocols for computer simulation using ontology.***Engineering in Medicine and Biology Society, 2007. EMBS 200. 29th Annual International Conference of the IEEE*2007, 382–385.View Article -
**Kinetic Simulation Algorithm Ontology (KiSAO)**[http://www.biomodels.net/kisao]

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.