
The Research On Detection Method For Tumor Genome Structural Variation And Its Application

Posted on: 2018-09-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Liang
GTID: 1314330542469435
Subject: Computer Science and Technology
Abstract/Summary:
The occurrence and development of cancer is a continuous evolutionary process that seriously endangers human health. Structural variation (SV) is the main genetic marker of the tumor genome and affects tumorigenesis by regulating the expression of related functional genes. Understanding and accurately detecting tumor SV is therefore a very important step in the diagnosis and treatment of cancer, and is a focus of current research. Identifying driver genes and inferring tumor development from SV data provides important insights into the diagnosis and treatment of tumors. This dissertation focuses on the SV of the tumor genome. Concretely, according to the characteristics of SV in the tumor genome, it simulates a full spectrum of variants, designs and develops a corresponding tumor SV detection algorithm, constructs tumor phylogenetic trees, and identifies tumor target genes based on tumor SV profiles. The main work of this dissertation is summarized as follows:

1) The dissertation first introduces the research background and purpose and discusses the importance of tumor heterogeneity in disease diagnosis, prognosis, and treatment; it then reviews SV detection platforms and introduces the strategies and algorithms for SV detection based on high-throughput sequencing data; finally, it introduces the related knowledge and main algorithms of tumor phylogeny.

2) The rapid development of genome detection technology makes it possible to analyze cancer at higher resolution and lower cost, but research on SV is still at an exploratory stage. At present there is no complete annotation of human genome SV, and without a "gold standard" it is difficult to evaluate SV studies comprehensively on real data alone. This dissertation therefore proposes an algorithm called MVSC to simulate a full spectrum of genome mutations. Following the way variation accumulates in the tumor genome, MVSC simulates germline genome variation, normal somatic genome variation, and tumor genome variation one by one, starting from the input reference genome. Compared with the germline genome, tumor genome variants are not constrained and contain complex SV, so MVSC simulates complex SV for the tumor genome. Experimental analysis shows that the variation simulated by MVSC effectively captures the characteristics of genomic variation, which is indispensable for evaluating the performance of variation detection algorithms. MVSC also outperforms other similar algorithms in memory requirement and running time.

3) The expression of functional genes is affected by tumor SV, which is closely related to tumor formation. Studying the SV of the tumor genome helps to elucidate tumorigenesis and provides a theoretical basis for tumor diagnosis and treatment. The human genome contains a large number of repetitive sequences, and SV detection in repeat regions has always been a challenge. This study proposes an algorithm, seeksv, to detect tumor SV; it uses a "rescue" strategy to detect SV in repeat regions. Sequencing reads that align to multiple positions in the reference genome can be used as signals to support SV detection, which improves the true positive rate of SV detection in repeat regions. Seeksv accepts different types of sequencing data, such as single-end or paired-end data. It detects deletions, insertions, inversions, translocations, and viral integration at single-base resolution. Soft-clipped reads that share the same clipping direction and coordinate must be assembled into contigs; seeksv does this without depending on any external assembly software and uses the Burrows-Wheeler Aligner (BWA) to align the assembled contigs back to the reference, which greatly reduces the computing resources the algorithm requires. Results on simulated data and on real data from the 1000 Genomes Project and esophageal squamous cell carcinoma (ESCC) samples show that seeksv has higher efficiency and precision than other similar algorithms in detecting SV. The detection of hepatitis B virus (HBV) integration also shows that more than 90% of the viral integration sequences detected by seeksv are true positives.

4) Because a tumor evolves continuously, its treatment should consider not only the current state of the tumor but also its predicted state after treatment. The evolutionary nature of tumor development allows researchers to infer tumor evolution from tumor SV. However, tumor evolution differs from species evolution, and there is no standard model or algorithm for it. Here, the inference of a tumor phylogenetic tree is treated as a minimum Steiner tree problem in a directed graph. Finding the Steiner nodes is an NP-hard problem, for which no polynomial-time algorithm is known. This study proposes an improved binary differential evolution algorithm, BDEP, to infer the tumor phylogenetic tree. By discretizing the continuous values of the original differential evolution algorithm and adopting a neighborhood learning strategy in the crossover operation to improve population diversity, BDEP provides a better approximate solution in acceptable running time. Experimental results show that the tree topology inferred by BDEP can reveal important driver genes for tumor development, and that its quantitative numerical characteristics can be used as key predictors of tumor evolution. A classification experiment further shows that the tree-based features captured by BDEP perform well in classifying tumor types, outperforming other similar algorithms and data-based features.
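
The discretization idea mentioned for BDEP in item 4) can be illustrated with a generic binary differential evolution loop. The sketch below is not the dissertation's implementation: the sigmoid mapping, the parameter values, the function name `binary_de`, and the toy objective `toy_cost` are all assumptions made for illustration, and the neighborhood learning strategy is omitted. Each bit could stand for "use this candidate Steiner node or not".

```python
import math
import random


def binary_de(cost, n_bits, pop_size=20, generations=50, f=0.8, cr=0.9, seed=0):
    """Minimise cost(bits) over 0/1 vectors with a sigmoid-discretised DE.

    Illustrative only: a plain DE/rand/1/bin scheme, not BDEP itself.
    Requires pop_size >= 4.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    costs = [cost(ind) for ind in pop]
    history = [min(costs)]  # best cost seen after each generation
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than i.
            r1, r2, r3 = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(n_bits)  # at least one gene is recombined
            trial = pop[i][:]
            for j in range(n_bits):
                if rng.random() < cr or j == j_rand:
                    # Continuous DE mutation value ...
                    v = pop[r1][j] + f * (pop[r2][j] - pop[r3][j])
                    # ... discretised: a sigmoid centred at 0.5 gives P(bit = 1).
                    p_one = 1.0 / (1.0 + math.exp(-4.0 * (v - 0.5)))
                    trial[j] = 1 if rng.random() < p_one else 0
            trial_cost = cost(trial)
            if trial_cost <= costs[i]:  # greedy selection keeps the better vector
                pop[i], costs[i] = trial, trial_cost
        history.append(min(costs))
    best = min(range(pop_size), key=costs.__getitem__)
    return pop[best], costs[best], history


# Hypothetical toy objective: every selected candidate node costs 1, and every
# required group that no selected node touches costs 10.
GROUPS = [{0, 1}, {1, 2}, {3}, {4, 5}]


def toy_cost(bits):
    chosen = {i for i, b in enumerate(bits) if b}
    return len(chosen) + 10 * sum(1 for g in GROUPS if not (g & chosen))


best, best_cost, history = binary_de(toy_cost, n_bits=6)
print(best, best_cost)
```

Because selection is greedy, the best cost in the population never increases from one generation to the next; a real tree-inference objective would replace `toy_cost` with the weight of the tree built from the selected Steiner nodes.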
Keywords/Search Tags: Tumor genome, Variation simulation, Structural variation detection, Tumor phylogenetic tree, Differential evolution, Steiner tree