DE analysis with DEApp is performed in 4 steps: ‘Data Input’, ‘Data Summarization’, ‘DE analysis’, and ‘Methods Comparison’. Figure 1 shows an example of the graphical web interface of DEApp with edgeR for DE analysis. Two input files are required: the ‘Raw Count Data’ and the ‘Meta-data Table’. The ‘Raw Count Data’ contains the summarized count results for all samples in the experiment, and the ‘Meta-data Table’ contains the experimental design information for each sample. Examples of valid input files are embedded in the ‘Data Input’ section to facilitate file formatting and preparation.
DEApp can be used for the analysis of both single-factor and multi-factor experiments. Although DEApp is used by default for DE analysis of RNA-Seq data, it can also be used for differential binding analysis of ChIP-Seq data and for the identification of differentially expressed microRNAs from miRNA-Seq data.
After the data are uploaded in the ‘Data Input’ section, the ‘Data Summarization’ panel allows users to set the cutoff values used to filter out genomic features with very low counts, because a feature must be present at a certain minimal level to reach statistical significance in the DE multiple comparison tests. It is usually recommended to keep genomic features that are expressed in at least one sample of each factorial group level [11], with the required number of reads expressed as a counts per million (CPM) value. By default, the application removes low-expression genomic features, i.e. those that do not reach a CPM value > 1 in at least 2 samples. A detailed explanation of how to choose the optimal cutoff values for this step is available on the ‘introduction’ page of the system. Based on the provided cutoff values, a summary of the library sizes and normalization factors for each experimental sample, before and after removal of the low-expression genomic features, is displayed on the web interface. A sample normalization plot and a multidimensional scaling (MDS) plot are also presented to illustrate the distribution of, and relationships between, the samples after filtering. Once this step is completed, the user is presented with three commonly used methods for DE identification.
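The default filter described above can be sketched as follows. This is only an illustrative sketch and not DEApp's actual implementation: the function names, the dictionary-based count table, and the toy counts are hypothetical, and CPM is computed here from raw library sizes without normalization factors, which is a simplifying assumption.

```python
def counts_per_million(counts):
    """Convert a {feature: [count per sample]} table to CPM.

    Assumption: CPM uses the raw library size (column sum) of each sample;
    normalization factors are ignored in this sketch.
    """
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        feature: [1e6 * c / lib_sizes[j] for j, c in enumerate(row)]
        for feature, row in counts.items()
    }


def filter_low_expression(counts, cpm_cutoff=1.0, min_samples=2):
    """Keep features whose CPM exceeds cpm_cutoff in at least min_samples samples."""
    cpm = counts_per_million(counts)
    return {
        feature: row
        for feature, row in counts.items()
        if sum(value > cpm_cutoff for value in cpm[feature]) >= min_samples
    }


# Hypothetical toy data: 4 features x 4 samples, each library sums to 1,000,000 reads,
# so the CPM values equal the raw counts.
raw_counts = {
    "geneA": [500_000, 450_000, 480_000, 520_000],
    "geneB": [0, 8, 0, 0],       # CPM > 1 in only one sample
    "geneC": [499_990, 549_992, 520_000, 479_995],
    "geneD": [10, 0, 0, 5],      # CPM > 1 in two samples
}

kept = filter_low_expression(raw_counts)
```

With these toy counts, geneB is removed because it exceeds the CPM cutoff in a single sample only, whereas geneD is retained because it exceeds the cutoff in two samples. Raising `min_samples` or `cpm_cutoff` makes the filter stricter.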
For a single-factor experiment, the DE analysis can be conducted between any 2 group levels of that factor; for a multi-factor experiment, the DE analysis can be conducted between any 2 groups selected from the combinations of all group levels. After specifying the group levels, the user then selects the parameter cutoffs that determine statistical significance: the nominal p-value, the false discovery rate (FDR) adjusted p-value, and the fold change (FC). These cutoffs can be modified interactively on the web interface in each DE analysis section. The system then displays the dispersion plot, the overall DE analysis results, and the statistically significant DE results, together with a volcano plot, all of which update interactively according to the specified parameters and cutoff values. Additionally, DEApp provides a ‘Methods Comparison’ section that enables the comparison and cross-validation of DE analysis results across the implemented analysis methods. A Venn diagram and a summary table are presented on the user interface to illustrate the DE genomic features shared by any 2 or all 3 of the selected analysis methods.
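The cutoff-based selection and the method comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not DEApp's code: the result tuples, the placeholder method names `method_1` to `method_3`, the default cutoff values, and the toy statistics are all hypothetical.

```python
import itertools
import math


def significant_features(results, p_cutoff=0.05, fdr_cutoff=0.05, fc_cutoff=1.5):
    """Return the IDs of features passing all three cutoffs.

    results maps feature ID -> (log2 fold change, nominal p-value, FDR).
    The FC cutoff is applied symmetrically to up- and down-regulation.
    The default cutoff values are illustrative assumptions.
    """
    min_abs_log2fc = math.log2(fc_cutoff)
    return {
        feature
        for feature, (log2fc, p_value, fdr) in results.items()
        if p_value <= p_cutoff and fdr <= fdr_cutoff and abs(log2fc) >= min_abs_log2fc
    }


def overlap_summary(sig_sets):
    """Features shared by all methods ("all") and by each pair of methods."""
    names = sorted(sig_sets)
    summary = {"all": set.intersection(*(sig_sets[name] for name in names))}
    for a, b in itertools.combinations(names, 2):
        summary[(a, b)] = sig_sets[a] & sig_sets[b]
    return summary


# Hypothetical per-method results: (log2FC, p-value, FDR) for each feature.
results_by_method = {
    "method_1": {"g1": (2.0, 0.001, 0.01), "g2": (0.2, 0.001, 0.01),
                 "g3": (-1.5, 0.0005, 0.004), "g4": (1.2, 0.04, 0.20)},
    "method_2": {"g1": (1.8, 0.002, 0.02), "g3": (-1.1, 0.03, 0.09),
                 "g4": (1.3, 0.001, 0.01)},
    "method_3": {"g1": (2.1, 0.0001, 0.001), "g3": (-1.6, 0.001, 0.01),
                 "g4": (1.1, 0.002, 0.03)},
}

sig_sets = {name: significant_features(res) for name, res in results_by_method.items()}
summary = overlap_summary(sig_sets)
```

In this toy example only g1 is called significant by all three methods; the pairwise entries of `summary` correspond to the two-way intersections that a Venn diagram would display.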
DEApp represents an intuitive alternative to command line tools and scripts, and a basic-functionality, open source alternative to commercial packages such as Partek [12] and CLC Genomics Workbench (CLC bio, Aarhus, Denmark), which offer extensive analytics and sophisticated visualizations at a premium.
The functionality of DEApp can be further expanded to cover complex experimental designs, such as those with nested interactions or additive blocking. It will also be possible to automate further downstream steps, such as functional annotation and enrichment analysis.