Fig. 1 | Source Code for Biology and Medicine

Fig. 1

From: PureCN: copy number calling and SNV classification using targeted short read sequencing

Flowchart of the PureCN data pre-processing pipeline and algorithm. (a) PureCN usually starts from BAM files and calculates average and total coverages for all targeted genomic regions. Coverage data are then corrected for GC bias. Concurrently, SNVs are called using third-party tools such as MuTect [22]. (b) The main algorithm takes the data generated from the tumor sample as input. If multiple process-matched normal samples are available, the algorithm can optionally use this pool of normal samples to (i) adjust SNV allelic fractions for non-reference mapping bias and (ii) select the best process-matched normal to obtain a clean copy number profile. A pool of normal samples is recommended when matched normal samples are not available. After copy number normalization and segmentation, local optima for tumor purity and ploidy are obtained via a 2D grid search. Integer copy numbers are then assigned to all segments for each local optimum via simulated annealing. Final likelihood scores are obtained by fitting SNVs to all local optima. If necessary, samples are flagged for manual curation. Steps in bold font indicate alternative start points, allowing incorporation of PureCN into third-party copy number pipelines.
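
The 2D grid search over purity and ploidy can be illustrated with a minimal sketch. This is not PureCN's implementation: the function names, the Gaussian scoring with a fixed standard deviation, the copy number cap of 7, and the grid spacing are all assumptions made for illustration. The sketch relies on the commonly used relationship in which a segment with integer copy number C, at purity p and tumor ploidy ψ, has an expected log2 tumor/normal coverage ratio of log2((pC + 2(1 - p)) / (pψ + 2(1 - p))). Each grid point is scored by fitting every segment to its best integer copy number, and grid points that score at least as high as all their neighbors are reported as local optima.

```python
import math


def expected_log_ratio(copy_number, purity, ploidy):
    """Expected log2 tumor/normal coverage ratio (assumed mixture model)."""
    segment = purity * copy_number + 2.0 * (1.0 - purity)
    genome_average = purity * ploidy + 2.0 * (1.0 - purity)
    # Guard against log2(0) when purity is 1 and the copy number is 0.
    return math.log2(max(segment, 1e-6) / genome_average)


def score(log_ratios, purity, ploidy, sd=0.1, max_copy_number=7):
    """Sum over segments of the best Gaussian log-likelihood (up to a constant)."""
    total = 0.0
    for observed in log_ratios:
        total += max(
            -0.5 * ((observed - expected_log_ratio(c, purity, ploidy)) / sd) ** 2
            for c in range(max_copy_number + 1)
        )
    return total


def grid_search(log_ratios, purities, ploidies):
    """Return (purity, ploidy, score) for every local optimum, best first."""
    grid = [[score(log_ratios, p, psi) for psi in ploidies] for p in purities]
    optima = []
    for i in range(len(purities)):
        for j in range(len(ploidies)):
            neighbors = [
                grid[a][b]
                for a in range(max(0, i - 1), min(len(purities), i + 2))
                for b in range(max(0, j - 1), min(len(ploidies), j + 2))
                if (a, b) != (i, j)
            ]
            # A local optimum scores at least as high as all adjacent grid points.
            if all(grid[i][j] >= n for n in neighbors):
                optima.append((purities[i], ploidies[j], grid[i][j]))
    return sorted(optima, key=lambda o: o[2], reverse=True)


# Hypothetical, noise-free example: five segments simulated at a known solution.
purities = [round(0.1 + 0.05 * i, 2) for i in range(18)]   # 0.10 to 0.95
ploidies = [round(1.5 + 0.25 * j, 2) for j in range(19)]   # 1.5 to 6.0
TRUE_PURITY, TRUE_PLOIDY = 0.6, 2.5
log_ratios = [
    expected_log_ratio(c, TRUE_PURITY, TRUE_PLOIDY) for c in (1, 2, 2, 3, 4)
]
optima = grid_search(log_ratios, purities, ploidies)
```

On real data, coverage ratios alone typically leave several purity and ploidy combinations with similar scores, which is why the flowchart retains all local optima and ranks them only after the SNV fitting step.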
