Comprehensive Detection Method of Copy Number Variation and Its Boundary for Next-Generation Sequencing Data

Posted on: 2023-01-12
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Liao
GTID: 2530306908950709
Subject: Engineering
Abstract/Summary:
Copy number variation (CNV) is an important structural variation and is usually the most common type of variation in the genome. Studies have shown that whether a CNV region contains biologically significant genes is closely related to the origin and development of cancer cells. Therefore, a sound analysis of CNVs can provide important information and a scientific basis for studying cancer pathogenesis and for targeted, precise diagnosis and treatment. Accurate CNV detection is the foundation of such analysis. However, because CNVs span long regions and take complex forms, detecting them accurately is a great challenge.

For second-generation sequencing data, CNV detection strategies fall roughly into four types: the paired-end mapping approach, the split read approach, the read depth approach, and the de novo assembly approach. Among them, the read depth approach has a distinct advantage, and a large number of methods based on it have been developed. However, these methods are often not universal, their CNV boundary detection is not accurate enough, and they cannot reach the desired level of precision and recall at the same time. The split read approach is not good at detecting CNVs, but it can locate a CNV boundary to single-base resolution.

This thesis proposes CNVbd, a comprehensive CNV detection method that incorporates the core ideas of the read depth and split read approaches and combines their advantages, so that it can identify CNVs and find accurate CNV boundaries at the same time. Based on the read depth profile, the method designs and calculates four CNV-related features (read count, density, non-adjacent minimum distance, and the average depth in an isolation forest) and builds a backpropagation (BP) neural network model to identify CNVs. It then applies a boundary searching algorithm based on the idea of the split read approach, which uses bisection to process the read depth data of the region where the CNV boundary is located, so as to locate CNVs accurately.

The innovation of CNVbd lies in two points. (1) Compared with the usual statistical methods, which use a mathematical formula to compute a threshold for identifying CNVs, CNVbd extracts more features and uses a neural network to learn a more flexible threshold, so that CNVs are identified more accurately. (2) Because handling CNV boundaries has always been the main difficulty and weakness of the read depth approach, CNVbd borrows the strategy of the split read approach, dividing a read into smaller segments to analyze the read depth data further and thereby find the boundary of a CNV more accurately.

To verify the detection performance of CNVbd, experiments are carried out on simulated data sets and real data samples, and the results are compared with those of four other methods. In addition, the boundary searching algorithm is combined with other algorithms to further illustrate its advantages. On the simulated data, CNVbd achieves the highest F1 score on most data sets and also performs well in terms of CNV boundary deviation. On the real data, CNVbd achieves a better F1 score, accuracy, and ODS value. The results on both kinds of data support the effectiveness of the CNVbd method.
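The abstract does not give the details of the boundary searching algorithm, so the following is only a minimal Python sketch of the general idea of bisecting a read depth profile to find a breakpoint, not the CNVbd implementation. The function name `find_boundary`, its parameters, and the assumption that the search window contains exactly one step between two known depth levels are all hypothetical choices made for illustration.

```python
from statistics import mean


def find_boundary(depth, lo, hi, left_level, right_level, half_window=0):
    """Bisect [lo, hi] for the first position whose depth matches right_level.

    Illustrative assumptions (not taken from the thesis):
    - depth is a per-base read depth sequence;
    - the window holds exactly one step from left_level to right_level;
    - depth[hi] is already on the right-hand side of the step.
    """
    threshold = (left_level + right_level) / 2.0
    rising = right_level > left_level  # True for a gain, False for a loss

    def on_right_side(i):
        # Average a small neighbourhood to dampen per-base noise.
        start = max(0, i - half_window)
        end = min(len(depth), i + half_window + 1)
        local = mean(depth[start:end])
        return local > threshold if rising else local < threshold

    # Standard binary search for the first index where the predicate holds.
    while lo < hi:
        mid = (lo + hi) // 2
        if on_right_side(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


# Toy profile: normal depth 30 for 1000 bases, then a gain at depth 60.
toy_depth = [30] * 1000 + [60] * 500
gain_start = find_boundary(toy_depth, 900, 1100, left_level=30, right_level=60)
```

Each iteration halves the candidate interval, so a window of n bases needs about log2(n) depth comparisons. On real, noisy data a larger `half_window` would trade boundary resolution for robustness.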
Keywords/Search Tags:Next-generation sequencing technology, Copy number variation, BP neural network