Table 6 Run-time comparison of bammarkduplicates2 and alternatives on compute farm nodes (part b)

From: biobambam: tools for read pair collation based algorithms on BAM files

Run-time comparison for BAM duplicate marking on server blades

| Data set | Program | Memory/GB | Run-time/minutes |
|---|---|---|---|
| ERR328876 | biobambam | 0.45 | 212.38±2.22 |
| | Picard | 15.74 | 443.66±1.77 |
| | Picard^(3,16) | 15.74 | 253.92±1.67 |
| | Picard^(3,64) | 52.41 | 252.21±1.11 |
| | bamUtil | 1.20 | 207.29±2.33 |
| ERR054938 | biobambam | 0.45 | 210.16±2.62 |
| | Picard | 15.87 | 575.35±3.07 |
| | Picard^(3,16) | 15.87 | 287.02±2.19 |
| | Picard^(3,64) | 54.87 | 285.06±1.27 |
| | bamUtil | 7.12 | 401.90±1.92 |
| ERR328190 | biobambam | 0.45 | 289.00±2.34 |
| | Picard | | ≥1440 |
| | bamUtil | 16.73 | 914.81±6.16 |
| SRP017681 | biobambam^(20) | 0.45 | 388.82±2.57 |
| | biobambam^(24) | 6.31 | 332.38±3.32 |
| | Picard | 15.90 | 363.18±2.63 |
| | Picard^(3,16) | 15.90 | 288.95±1.32 |
| | Picard^(3,64) | 63.39 | 290.72±1.62 |
| | bamUtil | | ≥1400 |
| ERP001231 | biobambam | 0.45 | 729.98±3.36 |
| | biobambam^(8) | 0.45 | 674.99±11.93 |
| | Picard | | ≥1440 |
| | bamUtil | ≥22.35 | |
| | bamUtil^(8) | 23.85 | 916.62±4.71 |

  1. Run-time comparison of biobambam’s bammarkduplicates2, Picard’s MarkDuplicates and bamUtil’s dedup on compute farm nodes for the data sets ERR328876, ERR054938, ERR328190, SRP017681 and ERP001231 described in Table 2. For the data set SRP017681, bammarkduplicates2 was run with the default hash table size of 2^20 and, for comparison, with an increased size of 2^24. For ERP001231, bamUtil could only process the file using 23.85 GB ≈ 25.6 · 10^9 ≥ 24 · 10^9 bytes of memory. Consequently, we had to reduce the number of concurrently running processes from 10 to 8. For comparison, the table also contains the run-time of bammarkduplicates2 with 8 instances running in parallel. Picard failed to process the data sets ERR328190 and ERP001231 within the 24 hour limit due to inefficient I/O. We have verified that these issues persist when larger amounts of memory are provided. Picard used close to the offered 16 GB of memory for the data sets ERR328876, ERR054938 and SRP017681. We have verified that using more memory yields no significant improvement in speed. For this purpose, we ran Picard on these data sets with 16 and 64 GB of memory and a reduced concurrency of 3 identical processes running in parallel.
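
The arithmetic in the footnote can be checked with a short Python sketch. It assumes that "GB" in the table denotes binary gigabytes (2^30 bytes), which is the reading under which 23.85 GB comes out at roughly 25.6 · 10^9 bytes. The names `gib_to_bytes`, `bamutil_bytes` and `threshold_bytes` are illustrative only and do not come from the paper or from any of the tools.

```python
# Assumption: "GB" in the table means 2**30 bytes (binary gigabytes).
BYTES_PER_GIB = 2**30


def gib_to_bytes(gib: float) -> float:
    """Convert a memory figure in binary gigabytes to bytes."""
    return gib * BYTES_PER_GIB


# bamUtil memory usage on ERP001231, as reported in the table.
bamutil_bytes = gib_to_bytes(23.85)

# The 24 * 10^9 byte figure that the footnote compares against.
threshold_bytes = 24e9

print(f"bamUtil: {bamutil_bytes / 1e9:.1f} * 10^9 bytes")
print("exceeds 24 * 10^9 bytes:", bamutil_bytes >= threshold_bytes)

# Hash table sizes used by bammarkduplicates2 on SRP017681.
default_entries = 2**20
enlarged_entries = 2**24
print("hash table growth factor:", enlarged_entries // default_entries)
```

Under this assumption the enlarged hash table holds 16 times as many entries as the default one, which is consistent in direction with the higher memory figure reported for that run (6.31 GB versus 0.45 GB).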