Research On Cancer Copy Number Variation Detection Methods For Next-Generation Sequencing Data

Posted on: 2022-12-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: G J Liu
GTID: 1524306608973329
Subject: Computer application technology
Abstract/Summary:
The occurrence of human cancer is closely associated with copy number variation (CNV), and accurate detection of CNVs is of great significance for the prevention and treatment of cancer. Genome projects launched in many countries have driven the rapid development of genome sequencing technology, from the traditional dideoxy chain-termination method to next-generation sequencing (NGS). Compared with traditional sequencing, NGS offers high speed, high resolution, and low cost, and is therefore widely used in clinical treatment, drug research and development, and bioinformatics. However, NGS data are characterized by high noise, high bias, and high complexity, which can seriously affect the accuracy of detection results. How to detect CNVs quickly and effectively from NGS data has therefore become a challenge in many application fields. In response, a large number of NGS-based CNV detection tools have been developed, which has promoted the application and development of genome sequencing technology in related fields. Nevertheless, existing methods still have shortcomings: inaccurate CNV detection models, low accuracy on samples with different tumor purity, insensitivity to insignificant CNVs, imprecise CNV boundaries, and low robustness to CNVs at different length levels. This dissertation addresses these problems and develops corresponding CNV detection methods. The work consists of three parts:

(1) A cross-model-based CNV detection method is proposed. It explicitly accounts for model overfitting and the asymmetry of the read depth (RD) distribution, and thereby improves the accuracy of the CNV detection model. The method preprocesses the RDs by removing abnormal sliding windows, correcting GC content bias, balancing the RD distribution, and removing RD signal noise. The preprocessed RD sequence is then divided into contiguous, non-overlapping segments. Each time, one segment is selected as the test set and all remaining segments are fitted to a Gaussian model, which yields multiple test sets and Gaussian models. The probability value of each RD is evaluated, and CNVs are predicted by hypothesis testing. To evaluate the method, experiments on simulated and real data are designed, and several popular methods are selected for comparison. The experimental results show that the method is robust when detecting multiple sets of samples with different tumor purity.

(2) Existing methods that fit the detection model using prior knowledge introduce bias, have low accuracy on low-purity samples, are insensitive to insignificant CNVs, and cannot accurately predict CNV boundaries, which results in a large number of false positives. To address these issues, a local kernel density-based CNV boundary detection method is proposed, which does not need to assume the distribution of RDs in advance. The method preprocesses the RDs to generate read depth segments (RDSs) and uses kernel density estimation (KDE) to evaluate the density distribution of each RDS. From the ratio of RDSs and the difference of the ratio of RDSs, three types of neighbors are computed for each RDS to construct an extended neighbor set. Based on the extended neighbor set of each RDS, KDE is used to calculate the local kernel density of that RDS. A relative kernel density outlier score (RKDOS) is defined and assigned to each RDS, and candidate CNVs are predicted from these scores. The boundaries of each candidate CNV are then expanded to extract the effective split reads and predict the candidate CNV boundaries. The method is tested on simulated and real cancer samples and compared with several existing methods. On simulated cancer samples, it detects the largest number of insignificant CNVs and CNV boundaries among the compared methods, and its advantage is especially clear on samples with low tumor purity. On real cancer samples, its performance is consistent with that on simulated samples.

(3) To address the low robustness of existing methods in detecting CNVs at different length levels, a cluster-based CNV detection method for normal-cancer paired samples is proposed. First, the method preprocesses the RDs to generate RDSs, extracts the copy number corresponding to each RDS and the difference of the ratio of RDSs, and uses these two features to construct two-dimensional RDSs. Second, a clustering algorithm is applied to the preprocessed RDSs to generate clusters of different scales, and small clusters and large clusters are defined. Using these definitions, a cluster-based anomaly score (CBAS) is constructed for two cases: for an RDS in a large cluster, the score uses the distance between the RDS and the center of that large cluster; for an RDS in a small cluster, the score uses the distance between the RDS and the center of the large cluster closest to the small cluster. Finally, based on the CBAS of each RDS, Tukey's fences are used to predict CNVs. Compared with other methods, the proposed method achieves the best balance between sensitivity and false discovery rate when detecting CNVs at different length levels.
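The scoring step in part (3) can be illustrated with a minimal Python sketch. The abstract does not state which clustering algorithm is used, how large and small clusters are separated, or the exact CBAS formula, so everything below is an assumption made for illustration: cluster labels are taken as already computed, a cluster counts as "large" if it holds at least a fraction `min_large_fraction` of all RDSs, the score is a plain Euclidean distance, and only the upper Tukey fence with the conventional multiplier `k = 1.5` is applied. The function names `cbas_scores` and `tukey_upper_outliers` are hypothetical.

```python
import math
from collections import defaultdict
from statistics import quantiles


def cbas_scores(points, labels, min_large_fraction=0.2):
    """Distance-based anomaly score for each 2-D RDS, given cluster labels.

    ASSUMPTION: a cluster is "large" if it holds at least
    min_large_fraction of all points; the source does not give the rule.
    """
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)

    # Center of each cluster = coordinate-wise mean of its members.
    centers = {
        label: tuple(sum(axis) / len(members) for axis in zip(*members))
        for label, members in groups.items()
    }
    large = [
        label for label, members in groups.items()
        if len(members) >= min_large_fraction * len(points)
    ]
    if not large:
        raise ValueError("no large cluster under the chosen threshold")

    scores = []
    for point, label in zip(points, labels):
        if label in large:
            # Case 1: distance to the center of the point's own large cluster.
            target = centers[label]
        else:
            # Case 2: distance to the center of the large cluster nearest
            # to the point's small cluster (center-to-center distance).
            nearest = min(
                large, key=lambda big: math.dist(centers[big], centers[label])
            )
            target = centers[nearest]
        scores.append(math.dist(point, target))
    return scores


def tukey_upper_outliers(scores, k=1.5):
    """Flag scores above Q3 + k * IQR (upper Tukey fence only, an assumption)."""
    q1, _, q3 = quantiles(scores, n=4)
    upper_fence = q3 + k * (q3 - q1)
    return [score > upper_fence for score in scores]


# Toy data (made up): ten RDSs near copy number 2, two RDSs near copy number 4.
points = [(2.0 + 0.01 * i, 0.0) for i in range(-5, 5)] + [(4.0, 1.0), (4.1, 1.0)]
labels = [0] * 10 + [1] * 2
flags = tukey_upper_outliers(cbas_scores(points, labels))
print(flags)
```

On this toy input only the two RDSs in the small cluster are flagged as candidate CNVs. Real data would need the preprocessing and clustering steps described above, and the thresholds would have to be tuned rather than taken from this sketch.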
Keywords/Search Tags:Next-generation sequencing data, copy number variation, cross model, kernel density, clustering