
Research On Tumor Sequencing Data Based Subclonal Reconstruction Method

Posted on: 2020-04-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y S Chu
GTID: 1364330590472858
Subject: Computer application technology
Abstract/Summary:
Tumors contain multiple genetically distinct populations of cells that arise from a single progenitor population through the successive acquisition of mutations. Genetic intra-tumor heterogeneity can lead to tumor adaptation and therapeutic failure through Darwinian selection. A mutation's subclonal population is defined as all the cells in the tumor that contain this mutation. Subclonal reconstruction, that is, reconstructing the evolutionary tree of the subclonal populations, can help identify characteristic driver mutations associated with cancer development and progression and help design more effective treatments. Existing sequencing-data-based subclonal reconstruction algorithms infer a tumor's evolutionary process from the subclonal population frequencies of its variants. With advances in sequencing technology and falling sequencing costs, it has become possible to analyze a tumor using multiple sequencing data sets generated by various sequencing technologies over the course of tumor development. However, there is currently no automatic method for subclonal reconstruction from multiple sequencing data sets of a tumor generated by various sequencing technologies. This thesis therefore provides a thorough analysis of subclonal reconstruction based on somatic copy number alterations (SCNAs, the most common type of variant in tumors). The research comprises the following four parts.

(1) A tumor sequencing data bias correction method based on a Bayesian probability model and hierarchical clustering is proposed. Existing methods that analyze the subclonal population frequency of SCNAs from the read count ratio between a tumor and its paired normal sample assume that, in any genomic region containing an SCNA, the mapped read count ratio is proportional to the copy number and that the tumor and normal read counts have the same bias characteristics, so the bias cancels out in the division. However, we find that the read count ratio is still biased and that the bias follows a log-linear pattern. We propose a Bayesian
model that exploits this bias pattern to correct the read count ratio bias. The bias correction model uses Markov chain Monte Carlo (MCMC) sampling to select the best corrected data from the distribution of corrected data. The likelihood of a corrected data set is defined as the sum of the peak values of the kernel density curve of the logarithm of its read count ratio. The number of density peaks is set to the product of the pre-specified number of subclonal populations and the maximum copy number. Compared with existing LOESS- or linear-regression-based bias correction methods, our method corrects the bias more accurately and efficiently.

(2) An analysis of the solution space of the SCNA subclonal population frequency and an SCNA segment merging algorithm are proposed. Existing NGS-based SCNA detection tools locate SCNAs from the difference between the read counts of the tumor and its paired normal sample, so tools with higher sensitivity are more affected by sequencing errors. Because the downstream subclonal population frequency analysis assumes that the genomic region between two adjacent breakpoints contains either a single type of variation or no mutation at all, a highly sensitive mutation detection tool is used to detect the breakpoints so that few true breakpoints are missed. However, excessive false-positive breakpoints make subclonal population frequency inference time-consuming and inaccurate. The fragment merging method proposed in this thesis first uses hierarchical clustering to cluster the fragments according to the logarithm of the read count ratio, and then uses the mean shift algorithm to split each cluster of fragments according to the B-allele frequencies at the heterozygous loci in each fragment. Finally, adjacent fragments of the same category are merged. The experimental results show that our algorithm efficiently filters out
the false-positive breakpoints, thereby reducing the running time of the SCNA subclonal population frequency inference algorithm.

(3) A method for calculating the subclonal population frequency based on a Bayesian network is proposed. Existing subclonal population frequency inference methods have poor accuracy and, to make the solution process converge, either artificially restrict the solution space or introduce additional unproven assumptions. Based on the analysis of the solution space in (2) and the analysis of the SCNA bias pattern in (1), we propose a Bayesian-network-based model of the subclonal population frequency and use MCMC to solve for it. In this model, following the Lander-Waterman read coverage model, the number of tumor reads aligned to an SCNA fragment is assumed to follow a Poisson distribution, the number of tumor B-allele reads aligned to the heterozygous sites in the fragment is assumed to follow a binomial distribution, and the prior on the subclonal population frequency is a Dirichlet process. The experimental results show that the proposed method infers subclonal population frequencies more accurately and can recover the frequencies of more than 3 subclonal populations.

(4) A multi-step tree learning algorithm and multi-stage tree learning based subclonal reconstruction are proposed. We propose a new topological rule, the time-order topological rule, which extends the existing topological rules for algorithms that use variants' subclonal population frequencies to conduct subclonal reconstruction. Based on this rule, we propose a new machine learning method named multi-step learning. When the multi-stage tree learning method samples the evolutionary tree structure of the mutations' subclonal populations with MCMC, the tree nodes of the subclonal populations are sampled in the
order of the mutations' occurrence. When sampling the node for the subclonal population of the current mutation, we keep the tree structure consistent with the evolutionary process by forbidding the current subclonal population from being placed in any ancestor of the nodes that hold subclonal populations generated earlier in the tumor. We also provide extensions of multi-step learning that can use targeted sequencing data, single-cell sequencing data and NGS data for subclonal reconstruction. We then build an SCNA subclonal reconstruction pipeline by chaining together multi-step learning, the SCNA probability model, the SCNA time-order detection model and an SCNA detection tool. The experimental results show that the proposed SCNA pipeline conducts subclonal reconstruction more precisely.
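
The time-order constraint described in part (4) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the tree representation (a child-to-parent dictionary), the function names `ancestors` and `allowed_nodes`, and the toy tree are all assumptions made for illustration. The sketch only computes which existing tree nodes remain eligible for the current mutation's subclonal population, given the nodes already occupied by earlier mutations. It does not model the MCMC sampler or the creation of new nodes.

```python
from typing import Dict, Iterable, Optional, Set


def ancestors(parent: Dict[str, Optional[str]], node: str) -> Set[str]:
    """Return all strict ancestors of `node` in a rooted tree.

    `parent` maps each node to its parent; the root maps to None.
    """
    result: Set[str] = set()
    current = parent[node]
    while current is not None:
        result.add(current)
        current = parent[current]
    return result


def allowed_nodes(
    parent: Dict[str, Optional[str]], earlier_nodes: Iterable[str]
) -> Set[str]:
    """Nodes where the current mutation's subclonal population may be placed.

    Assumed reading of the time-order rule: a later mutation must not sit
    in a strict ancestor of any node that holds an earlier mutation.
    """
    forbidden: Set[str] = set()
    for node in earlier_nodes:
        forbidden |= ancestors(parent, node)
    return set(parent) - forbidden


# Hypothetical toy tree: root -> {A, B}, A -> {C, D}
parent = {"root": None, "A": "root", "B": "root", "C": "A", "D": "A"}

# An earlier mutation already occupies node C, so its ancestors
# (A and root) are excluded for the current mutation.
print(sorted(allowed_nodes(parent, ["C"])))  # ['B', 'C', 'D']
```

In a sampler, a set like this would restrict the candidate nodes proposed for each mutation as the mutations are visited in their order of occurrence.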
Keywords/Search Tags: Multi-stage tree learning, Subclonal population reconstruction, Somatic copy number alteration, Next generation sequencing, Single cell sequencing, Targeted sequencing