Table 3. Descriptions of the raw short-read sequences used in the evaluation experiments

From: Software for pre-processing Illumina next-generation sequencing short read sequences

| Dataset | Caenorhabditis elegans | Saccharomyces cerevisiae S288c | Escherichia coli O157:H7 |
| --- | --- | --- | --- |
| Taxonomy ID | 6239 | 559292 | 83334 |
| Reference genome size (bp) | 100.3 M | 12.2 M | 5.5 M |
| #Chromosomes | 7* | 17* | 1 |
| SRA run | SRR065390 | SRR449310 | SRR957847 |
| Platform | Illumina Genome Analyzer II | Illumina HiSeq 2000 | Illumina MiSeq |
| Strategy | WGS | WGS | WGS |
| Source | Genomic | Genomic | Genomic |
| Layout | Paired | Paired | Paired |
| Read length (bp) | 100 | 76 | 150 |
| Nominal length (bp) | 356 | 230 | 350 |
| Total sequences (paired) | 33,808,546 | 1,898,259 | 2,241,778 |
| Total bases (paired) | 6,761,709,200 | 288,535,368 | 672,533,400 |
| Mean Phred quality score | 29.49 | 34.17 | 33.12 |
| Low Phred quality score (<=10) | 1,902,576 (2.81%) | 167,669 (4.42%) | 76,598 (1.71%) |
| Coverage | 67.4x | 23.7x | 122.3x |
| GC content (%) | 35 | 39 | 50 |
*The mitochondrial chromosome is included.
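
The "Total bases" and "Coverage" rows appear to follow from the other rows by simple arithmetic. The sketch below is an assumption inferred from the numbers, not a formula stated in the source: it takes total bases as read pairs × 2 mates × read length, and coverage as total bases divided by the reference genome size. The names `DATASETS`, `total_bases`, and `coverage` are illustrative only.

```python
# Values copied from Table 3. The formulas are an assumption inferred
# from the table, not taken from the source text.
DATASETS = {
    "Caenorhabditis elegans": {
        "pairs": 33_808_546, "read_len": 100, "genome_bp": 100_300_000},
    "Saccharomyces cerevisiae S288c": {
        "pairs": 1_898_259, "read_len": 76, "genome_bp": 12_200_000},
    "Escherichia coli O157:H7": {
        "pairs": 2_241_778, "read_len": 150, "genome_bp": 5_500_000},
}


def total_bases(pairs: int, read_len: int) -> int:
    """Bases in a paired-end run: two mates per pair, each read_len long."""
    return pairs * 2 * read_len


def coverage(pairs: int, read_len: int, genome_bp: int) -> float:
    """Average sequencing depth: sequenced bases per reference base."""
    return total_bases(pairs, read_len) / genome_bp


if __name__ == "__main__":
    for name, d in DATASETS.items():
        bases = total_bases(d["pairs"], d["read_len"])
        depth = coverage(d["pairs"], d["read_len"], d["genome_bp"])
        print(f"{name}: {bases:,} bases, {depth:.1f}x coverage")
```

Under these assumptions the computed values agree with the table's "Total bases (paired)" and "Coverage" rows for all three datasets, which suggests every read has the full nominal read length in these runs.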