A CellML simulation compiler and code generator using ODE solving schemes
© Punzalan et al.; licensee BioMed Central Ltd. 2012
Received: 28 March 2012
Accepted: 31 July 2012
Published: 19 October 2012
Skip to main content
© Punzalan et al.; licensee BioMed Central Ltd. 2012
Received: 28 March 2012
Accepted: 31 July 2012
Published: 19 October 2012
Models written in description languages such as CellML are becoming a popular solution to the handling of complex cellular physiological models in biological function simulations. However, in order to fully simulate a model, boundary conditions and ordinary differential equation (ODE) solving schemes have to be combined with it. Though boundary conditions can be described in CellML, it is difficult to explicitly specify ODE solving schemes using existing tools. In this study, we define an ODE solving scheme description language-based on XML and propose a code generation system for biological function simulations. In the proposed system, biological simulation programs using various ODE solving schemes can be easily generated. We designed a two-stage approach where the system generates the equation set associating the physiological model variable values at a certain time t with values at t + Δt in the first stage. The second stage generates the simulation code for the model. This approach enables the flexible construction of code generation modules that can support complex sets of formulas. We evaluate the relationship between models and their calculation accuracies by simulating complex biological models using various ODE solving schemes. Using the FHN model simulation, results showed good qualitative and quantitative correspondence with the theoretical predictions. Results for the Luo-Rudy 1991 model showed that only first order precision was achieved. In addition, running the generated code in parallel on a GPU made it possible to speed up the calculation time by a factor of 50. The CellML Compiler source code is available for download at http://sourceforge.net/projects/cellmlcompiler.
In recent years, the continued development in computer processing power paved the way for the increased use of biological function simulation. Computers have proven to be invaluable in analysing complex and nonintuitive biological models and biologists are turning to them to complement their experiments. Simulations enable the testing of experimentally unfeasible scenarios and can potentially reduce experimental costs. However, the number and complexity of physiological models has also grown with the increase in computing performance. This creates challenges in reproducing simulated behaviours of the published models and reuse of models by other researchers, hindering the dissemination of science and knowledge integration.
One way to address model complexity is to use markup language-based model descriptions. Some popular examples include CellML , SBML (Systems Biology Markup Language)  and insilicoML . CellML is an open standard markup language capable of describing mathematical models of cellular functions. SBML is an open interchange machine-readable format for representing models of functions such as metabolism and cell signalling. Meanwhile, insilicoML describes mathematical models for biophysical objects and incorporates morphological information such as shape, angle and position. SED-ML (Simulation Experiment Description Language)  is another type of description language which can encode the information of simulation experiments. These markup languages allow researchers to take advantage of the vast amount of biological function models using a common set of easily readable and versatile description rules.
Biological and physiological function models are generally described by differential equations. A typical simulation of biological function models consists of three parts: a model equation, a boundary condition, and an ordinary differential equation (ODE) solver. Model equations and boundary conditions can be described using CellML, while ODE numerical solutions like Euler and Runge-Kutta methods are typically built into the simulation software. However, it is necessary to be flexible in using ODE schemes in order to strike a balance between computational stability and speed. In addition, those using special hardware environments such as massively parallel computer systems require dedicated proprietary software to support their numerical solution needs. Thus, description languages like CellML and dedicated simulation software are not suitable or practical for flexibly incorporating different ODE solving schemes.
To address the need for more flexibility in creating simulation software, we created Time Evolution Calculation Markup Language (TecML), a machine-readable format for encoding ODE numerical solutions. TecML is a description language based on the extensible markup language (XML). This description language is designed to specify and store the numerical methods that can be used for solving the ODEs in biological models. It also allows the assignment of boundary conditions into the simulation experiments. The following sections describe TecML and how it is integrated into the proposed code generation system, which automatically generates codes for biological simulations.
The target of this study is limited to the use of different ODE numerical solutions and their application to models described in CellML. We propose an algorithm that allows users to change the ODE solution and boundary conditions of the model according to the computational needs of their simulation. To verify the effectiveness of the proposed system, we generate executable codes for several CellML models using a number of ODE numerical solutions. The system can generate code in several programming languages and code that runs in both sequential and parallel computing environments. Simulations on GPU (Graphics Processing Units) were undertaken to show the effect of using parallel computing on processing time.
where r, x and y are variables, t indicates time and a, b, c, d are constants. Removable algebraic expressions such as equation (1) are often used in biological models to improve the model’s readability.
where ξ,κ and ι are the vectors representing the variables and differential equations in the biological model.
The variable r of the equation set corresponds to ι while x and y are the variables in the vector ξ. The derivatives dx/dt and dy/dt are expressed by κ in equations (20), (21), (26) and (27).
In order to automatically incorporate the ODE numerical solution into the CellML model, we introduced an ODE solving scheme description language called Time Evolution Calculation Markup Language (TecML). TecML is an XML-based description language that can be used to configure ODE solutions and implement the algorithm described in equations (6) – (15). Another input of the system is the Relation Markup Language (RelML) file. The basic role of a RelML file is to relate the variables and variable types of the CellML file into their equivalent in the TecML file.
The simulation code generation system uses these inputs to generate the set of equations describing the time evolution of the variables in biological models. This step creates equations (16) – (31) in the FHN example. Once the model equations are applied with the ODE numerical solution method, the next stage involves the generation of the executable code that will do the actual calculations.
The code generation system requires three inputs, namely, the CellML file, TecML file and RelML file. This section gives a short introduction of each and how it is used in the algorithm and consequently, in the FHN model example.
Variable and function types in TecML
Differential variable (variable is a function of time)
Temporal variable (can be substituted with math equations)
Derivative of diffvar
Variable denoting the change of time per step
Input and output code types in the system
Information written in a TecML file
Input differential variable vector
Output differential variable vector
ξ i (1 ≤ i ≤ N ξ )
Time differential variable vector
Differential variable vector of ξ
Current time value
τ = T i (t,δ)
Calculation of variable T i
Differential equation vector
Temporal function vector
κi + 1 = Φ(ξ,ι,t,ζ)
Derivative variable vector
ι i = Γ(ξ,ι,t,ζ)
Temporal variable derived from ξ
Derivative vector of κ
ξ i = σ
Relation between ξ i and
Note that the Strachey brackets in means that all the TecML diffvar (i.e. for i = 1…N ξ ) in the argument is replaced with the corresponding CellML diffvar x i . This function is true for all the other variable types in TecML.
The FHN cell model  was used to evaluate the proposed system. The system was also used to generate simulation codes for a number of cell models (LuoRudy1991 , LuoRudy1994  and KyotoModel2003 ) and with other more accurate ODE numerical methods (e.g. 4th-order Runge-Kutta method). All the generated code used in the simulation experiments can be downloaded from the files section of the program generator site .
The traditional approach to creating simulation experiment codes also offers little flexibility once the software is created. Changing the numerical ODE solution method means making major revisions in the simulation code. The proposed method allows the flexibility of using different simulation models, boundary conditions, and numerical methods for a simulation experiment. Since the simulation codes are generated automatically, users can choose or change their desired ODE solver without making changes in the simulation codes themselves. Also, this can give clear information on what is calculated to generate the simulation results.
The number of execution steps for the generated codes for different cell models and ODE numerical methods
An issue in approximating ODE solutions is the accuracy of the numerical method used. A good approximation to the underlying differential equation needs to be achieved in order to arrive at accurate simulation results. We tested a number of commonly used ODE numerical methods to determine how the use of different solutions affect the accuracy of the calculations. The three methods used were Euler, Modified Euler and 4th-order Runge-Kutta. Each of these methods were used to generate an FHN model simulation code that runs in a single CPU. The generated code can run in different compilers and does not require third party software.
The first-order Euler method gave the largest error (log error ≈10−3) while the Modified Euler resulted to a smaller error. The fourth-order Runge-Kutta method resulted with the best accuracy (log error ≈10−10). However, the Runge-Kutta error is almost constant from the time step 10−3 ms, . This can be attributed to the rounding error of digits over the used 32-digit precision.
Simulation codes using different ODE numerical methods were also generated for models more complicated than the FHN or Hodgkin-Huxley model. For this, we used the model introduced by C. Luo and Y. Rudy in 1991 . It is a simple cell model of cardiac action potential that uses Hodgkin-Huxley type equations to calculate ionic currents. The RMSE computations undertaken for the FHN model were also applied for the Luo-Rudy 1991 simulations. Codes were generated for the three ODE solutions with time steps ranging from 10−1 to 10−4 ms. Meanwhile, the simulation using the Euler method with a time step of 10−5 ms was used as the reference for RMSE calculations (Figure 9).
One possible application of 1- or 2-dimensional excitation propagation simulation is cardiac arrhythmia research. Computational models have provided new insights into the underlying mechanisms of re-entry like the role of ionic currents and ion channel mutations . In addition, designs of defibrillation treatments can be optimized using simulations of excitation propagation to achieve good clinical results for patients.
Simulation codes for the Luo-Rudy 1991 model were generated to simulate cell excitation propagation on a 2D homogeneous sheet. The two-dimensional cell action potential propagation simulations were run for both CPU and GPU to compare the speed of calculations. The CPU simulation was performed with an intel Core i7 880 processor with 8 GB of memory running Windows 7. The GPU has a single Nvidia Tesla C2050 processor with 448 CUDA cores, and 3 GB memory in a CentOS 5.5 system. The programs are written in C for the CPU simulations and Cuda C for the GPU.
Traditionally, biological function simulation programs have to be manually created from scratch. This is manageable with small models or simulation experiments but it becomes less practical once model complexity increases. The more complex the model becomes the larger the program becomes and the harder it is to create and maintain such programs. The Doty model  showed that the cost of software development is proportional to the square exponential of the number of computational steps. As seen from Table 4, the number of computational steps increases with the complexity of the model. By using this system, the creation of a simulation program from biological models becomes simpler. The system takes input from experts in biological functions through CellML models and mathematical experts through ODE numerical solutions and automatically generates the simulation code. The automatic code generation allows it to deal with large and complex models without the proportional increase in software development cost. This also keeps the software maintenance cost low since the programs are created with the same structure, regardless of the biological model under study.
The system’s support for multiple programming languages makes it easier for users to test the model in different programming environments and keep the cost of switching from one programming language to a low. Based on the codes generated for the purpose of this paper, science and engineering students typically create a Java simulation code from the C version within four days.
The system requires the formatted RelML file to do the first stage of the algorithm and the RelML information can also be used to automatically determine the boundary conditions. The RelML information can be extracted automatically from the CellML model file but boundary conditions are not always available. We need a mathematical representation of boundary conditions which are compatible with declarative description. This can be addressed by the inclusion of PEPML experimental protocol  in the system. PEPML is an XML-based language that can describe the initial conditions and procedures of an experimental protocol. It describes boundary conditions in a mathematical form and is purely represented in a declarative manner, making it easy to combine with the current implementation of our system.
Although the system is fully implemented, further improvements are still to be added. The generated excitation propagation code does not consider the optimization of the model variables for parallel processing. Therefore, code parallelization is low and may have contributed to slower execution in the GPU environment. Also, the system can only handle equations with a single term in the left-hand side. Other equation forms will be addressed in future implementations.
Furthermore, simultaneous calculations using differential algebraic equations (DAE) are still not available. Simultaneous equations appear in some models and implicit numerical methods. The handling of implicit methods is available in the inputs but not in the current implementation of our system. A simultaneous equation library, other ODE numerical solutions library like CVODE, or a linear algebra library like LINPACK could be incorporated into our system. This can be a possibility with our plan to design a library interface description language to handle numerical solution inputs.
Future implementations also need to address the need for a declarative design of describing procedural methods. Complex ODE numerical methods like the ones described in the Kinetic Simulation Algorithm Ontology (KiSAO)  are difficult or impossible to encode in the current version of TecML. We are in the process of redesigning TecML and the system to incorporate methods that are suitable for the types of problems scientists are addressing with manually-created code, including adaptive methods. This can greatly enhance the usability of the proposed method.
In addition, the system may be integrated with SED-ML. SED-ML allows description of the simulation environment, which includes the name of the numerical method to be used in the simulation. A starting point for integration would be to provide TecML information of a numerical method for the KiSAO entry in SED-ML to allow simulation software to refer to this description. Another point would be to create support for SED-ML files in the code generation system in order to create more complex experiments that use more than one model or experiments with different simulation methods applied. This will allow users the flexibility of choosing not only the ODE solving methods but also the experiment protocol when generating their simulation codes.
In this paper, we proposed a method to automatically generate executable simulation codes using CellML physiological models and ODE numerical solution methods. The generated code describes the time-evolution of the set of differential equations enumerated by the CellML model. The code generation system is composed of a two-stage approach that allows flexible generation of complex sets of equations.
To evaluate the effectiveness of the proposed system, several combinations of physiological cell models and ODE solving schemes were generated. The output of the numerical approximations were in accordance with the published results of the cell models. Results for the Luo-Rudy 1991 simulations also indicated that it only has first-order accuracy. The comparison of execution time for 2D excitation propagation also showed that the use of GPU can accelerate the processing time by 50 times as compared to a CPU.
The code generation system allows executable simulation codes to be easily generated from CellML and TecML files. This can be very useful in the field of biological model simulation since it provides the tools to quantitatively evaluate the mathematical equations in these models.
FRP has been involved in the drafting of the manuscript, programming and gathering of experimental data. YY was responsible for programming of the code generator software and gathering data. NS was the main programmer of the code generator. MK was involved in the simulation experiments and data gathering as well as programming. TS, HK and YK acted as consultants for the algorithm and system design. AA has been the main architect of the algorithms and system design and has given the final approval of the version to be published. All authors read and approved the final manuscript.
The authors would like to thank the Japan Ministry of Education, Culture, Sports, Science and Technology (MEXT) for the Grant-in-Aid for Scientific Research (Kakenhi) funding.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.