
Integrated Detection Of Copy Number Variation Based On Next-Generation Sequencing Data

Posted on: 2019-07-23
Degree: Master
Type: Thesis
Country: China
Candidate: W W Liu
GTID: 2428330551457975
Subject: Software engineering
Abstract/Summary:
Although the reads produced by Next-Generation Sequencing (NGS) technology are shorter than those of third-generation sequencing, NGS offers higher sequencing accuracy and lower cost, so most current genomic research is based on NGS data. In addition, the 1000 Genomes Project provides researchers with abundant human NGS data as well as benchmark variant data. Many methods based on different strategies have been developed to detect genomic structural variation, and many statistical models detect Copy Number Variation (CNV) using a read depth strategy. However, because CNVs comprise multiple subtypes and can span long regions, conventional methods are limited: their precision and sensitivity are low, and their breakpoint accuracy is insufficient. Therefore, considering three CNV subtypes in NGS data, namely insertions, deletions, and duplications of at least 50 bp, this thesis proposes a new CNV detection method that integrates sequence assembly with a read depth strategy. Sequence assembly makes it possible to detect long CNVs, and combining it with read depth information not only detects deletions and duplications effectively but also locates insertion breakpoints accurately. The main work is as follows:

1. Real sequencing data and the corresponding benchmark variant data were obtained, analyzed, and preprocessed. Based on the characteristics of the real benchmark data, multiple sets of simulated CNV data with different variant types and coverage depths were generated. Several current mainstream tools were used to detect multiple CNV subtypes on these data sets; their results were analyzed and compared to evaluate each tool, and candidate variant breakpoints were obtained.

2. To detect longer CNVs, this thesis proposes a local sequence assembly method that uses a fault-tolerant OLC (overlap-layout-consensus) assembly algorithm and a path compatibility strategy. First, filtered high-quality sequences within a local region around each candidate variant breakpoint were segmented. Second, the sequences were assembled with an OLC algorithm that tolerates a given error rate. Then, the path compatibility strategy was used to select appropriate paths in the constructed directed graph. Finally, sequences meeting the specified conditions were retained as the assembly result.

3. To detect multiple CNV subtypes effectively, this thesis integrates read depth information with the assembly results. High-quality soft-clipped sites were extracted from the alignment file of the assembly results and then combined with read depth information from the original data to detect CNVs.

The experimental results showed that the proposed integrated method achieved high precision and sensitivity in detecting multiple CNV subtypes in both real and simulated sequencing data at low and high coverage, and that breakpoint accuracy was maintained across multiple groups of experiments.
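Step 1 evaluates each tool by comparing its calls with benchmark variants in terms of precision, sensitivity, and breakpoint accuracy. The abstract does not state the matching rule, so the following is only a minimal sketch under assumed criteria: a call counts as correct when chromosome and type agree and both breakpoints fall within a hypothetical tolerance `tol` (100 bp here) of an unmatched benchmark CNV. The `CNV` class and the `matches`/`evaluate` functions are illustrative names, not part of the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CNV:
    chrom: str
    start: int    # left breakpoint
    end: int      # right breakpoint (equal to start for an insertion site)
    svtype: str   # "DEL", "DUP" or "INS"


def matches(call: CNV, truth: CNV, tol: int) -> bool:
    """Assumed criterion: same chromosome and type, both breakpoints within tol bp."""
    return (call.chrom == truth.chrom
            and call.svtype == truth.svtype
            and abs(call.start - truth.start) <= tol
            and abs(call.end - truth.end) <= tol)


def evaluate(calls, truths, tol=100):
    """Return (precision, sensitivity); each benchmark CNV is matched at most once."""
    used = set()
    tp = 0
    for call in calls:
        for i, truth in enumerate(truths):
            if i not in used and matches(call, truth, tol):
                used.add(i)
                tp += 1
                break
    precision = tp / len(calls) if calls else 0.0
    sensitivity = tp / len(truths) if truths else 0.0
    return precision, sensitivity


truths = [CNV("chr1", 1000, 1500, "DEL"), CNV("chr1", 5000, 5000, "INS"),
          CNV("chr2", 200, 900, "DUP")]
calls = [CNV("chr1", 1020, 1490, "DEL"), CNV("chr1", 5300, 5300, "INS"),
         CNV("chr2", 210, 880, "DUP"), CNV("chr3", 10, 100, "DEL")]
print(evaluate(calls, truths))  # (0.5, 0.6666666666666666)
```

In this toy data the insertion call is 300 bp away from its benchmark site and the chr3 call has no counterpart, so two of four calls and two of three benchmark CNVs match. A smaller `tol` makes the score stricter about breakpoint accuracy.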
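Step 2 assembles reads around candidate breakpoints with an OLC algorithm that tolerates a given error rate. The sketch below shows only that general idea. Suffix-prefix overlaps are accepted when their mismatch fraction is at most `max_err` (Hamming distance only, no indels). Reads become nodes in a directed overlap graph. The layout greedily follows the longest overlap to an unvisited read. This greedy walk is a stand-in, not the thesis's path compatibility strategy, whose rules the abstract does not give. There is also no consensus voting: the contig copies each read's bases beyond the overlap. All function names and thresholds are hypothetical.

```python
def overlap_len(a: str, b: str, min_olap: int, max_err: float) -> int:
    """Longest suffix of a matching a prefix of b with mismatch fraction <= max_err."""
    for olen in range(min(len(a), len(b)), min_olap - 1, -1):
        mismatches = sum(x != y for x, y in zip(a[-olen:], b[:olen]))
        if mismatches <= max_err * olen:
            return olen
    return 0


def build_overlap_graph(reads, min_olap=5, max_err=0.2):
    """Directed graph: graph[i][j] = overlap length from read i into read j."""
    graph = {i: {} for i in range(len(reads))}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                olen = overlap_len(a, b, min_olap, max_err)
                if olen:
                    graph[i][j] = olen
    return graph


def greedy_layout(reads, graph):
    """Start at a read with the fewest incoming edges, then repeatedly follow
    the longest overlap to an unvisited read, appending its non-overlapping tail."""
    indegree = {j: 0 for j in graph}
    for i in graph:
        for j in graph[i]:
            indegree[j] += 1
    node = min(graph, key=lambda j: indegree[j])
    contig, visited = reads[node], {node}
    while True:
        candidates = [(olen, j) for j, olen in graph[node].items() if j not in visited]
        if not candidates:
            return contig
        olen, node = max(candidates)
        visited.add(node)
        contig += reads[node][olen:]


# Three 10-bp reads tiling "ATGCGTACCTTAGGCATCGA"; the middle read carries
# one sequencing error (C->G at its third base), which the 0.2 error rate tolerates.
reads = ["ATGCGTACCT", "TAGCTTAGGC", "TAGGCATCGA"]
graph = build_overlap_graph(reads)
print(greedy_layout(reads, graph))  # ATGCGTACCTTAGGCATCGA
```

With `max_err=0` the erroneous middle read would no longer overlap the first read. The walk would then start from it and reconstruct only part of the sequence. This shows why an error tolerance matters for noisy reads.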
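Step 3 combines high-quality soft-clipped sites with read depth from the original data. The abstract does not describe the decision rules, so here is one simple, assumed way to integrate the two signals. Soft-clip positions supported by at least `min_support` reads are kept as breakpoints. Each pair of adjacent breakpoints at least `min_len` bp apart is labeled a deletion or duplication when its mean depth, relative to the median genome depth, falls below or above hypothetical ratios (0.5 and 1.5). Breakpoints with no depth change are reported as insertion candidates. The function names, thresholds, and toy data are all illustrative.

```python
from collections import Counter
from statistics import mean, median


def clip_sites(clip_positions, min_support=3):
    """Keep soft-clip positions supported by at least min_support reads."""
    counts = Counter(clip_positions)
    return sorted(pos for pos, n in counts.items() if n >= min_support)


def depth_label(depth, start, end, baseline, del_ratio=0.5, dup_ratio=1.5):
    """Label [start, end) by its mean depth relative to the baseline depth."""
    ratio = mean(depth[start:end]) / baseline
    if ratio <= del_ratio:
        return "DEL"
    if ratio >= dup_ratio:
        return "DUP"
    return None


def call_cnvs(depth, clip_positions, min_support=3, min_len=50):
    """Pair adjacent clip sites into DEL/DUP calls by depth; leftover sites become INS."""
    baseline = median(depth)
    sites = clip_sites(clip_positions, min_support)
    calls, used = [], set()
    for left, right in zip(sites, sites[1:]):
        if right - left < min_len:
            continue
        label = depth_label(depth, left, right, baseline)
        if label:
            calls.append((left, right, label))
            used.update((left, right))
    calls += [(pos, pos, "INS") for pos in sites if pos not in used]
    return calls


# Toy per-base depth: a heterozygous deletion (depth 15) at [100, 200)
# and a duplication (depth 60) at [250, 350) on a depth-30 background.
depth = [30] * 100 + [15] * 100 + [30] * 50 + [60] * 100 + [30] * 50
clips = [100] * 4 + [200] * 4 + [250] * 4 + [350] * 4 + [380] * 4 + [50]
print(call_cnvs(depth, clips))
# [(100, 200, 'DEL'), (250, 350, 'DUP'), (380, 380, 'INS')]
```

Using the median rather than the mean as the baseline keeps large CNVs from shifting the reference depth. The single read clipped at position 50 is discarded as noise by `min_support`.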
Keywords/Search Tags: next-generation sequencing, copy number variation, assembly, read depth, integrated detection