Research On Detection Of Copy Number Variation Based On Next Generation Sequencing Technology

Posted on: 2017-07-28
Degree: Master
Type: Thesis
Country: China
Candidate: S Li
Full Text: PDF
GTID: 2404330488479873
Subject: Software engineering
Abstract/Summary:
Copy number variations (CNVs) are a form of structural variation that is widespread in the human genome. In recent years, CNVs have been confirmed to be associated with many complex human mental diseases, and they are expected to carry far more information than single nucleotide polymorphisms (SNPs). Research on copy number variation has grown steadily since 2004. In particular, the rapid development of next-generation sequencing (NGS) technology has greatly reduced the cost of CNV detection, and a large number of detection methods based on high-throughput sequencing have been proposed, most of which perform well. However, CNV detection still has many shortcomings, such as heavy computation and high false positive and false negative rates. This thesis makes the following two contributions.

(1) Building on the traditional GC correction method, this thesis uses local polynomial regression (LOESS) to improve GC correction. In addition, it proposes CNVforest, a CNV detection method for next-generation sequencing data based on the isolation forest algorithm. CNVforest improves recognition accuracy and reduces the false positive rate through multi-sample analysis, in which each sample is treated as a feature. It then uses the isolation forest algorithm to identify CNVs: because isolation forest is efficient and handles high-dimensional and large-scale data effectively, it improves CNVforest's performance in CNV identification. The experimental results show that CNVforest performs relatively well overall.

(2) To address the shortcomings of the CNV-CH algorithm in handling outliers, this thesis proposes CNV-CLU, a CNV detection method based on the Gaussian mixture model. Like CNVforest, CNV-CLU is based on NGS data and uses a multi-sample analysis strategy, but it identifies CNVs by Gaussian mixture model clustering instead. The experimental results show that CNV-CLU decreases in precision but increases in sensitivity. They also show that CNV-CLU handles outliers better than CNV-CH.
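The multi-sample isolation forest idea in point (1) can be sketched as follows. This is a minimal illustration under stated assumptions, not the CNVforest implementation: each row is one genomic window, each column is one sample's GC-corrected read-depth ratio (so every sample acts as a feature), and windows that are isolated by few random splits receive a high anomaly score. The read-depth ratios are synthetic, and the function names (`c_factor`, `build_tree`, `path_length`, `anomaly_scores`) and all parameter values are hypothetical choices made for this example.

```python
import math
import random


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Split recursively on a random feature (sample) at a random cut point."""
    if depth >= max_depth or len(rows) <= 1:
        return ("leaf", len(rows))
    feature = rng.randrange(len(rows[0]))
    values = [r[feature] for r in rows]
    lo, hi = min(values), max(values)
    if lo == hi:  # nothing to split on for this feature
        return ("leaf", len(rows))
    cut = rng.uniform(lo, hi)
    left = [r for r in rows if r[feature] < cut]
    right = [r for r in rows if r[feature] >= cut]
    return ("node", feature, cut,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(row, tree, depth=0):
    """Depth at which a row lands, plus the expected depth of the unsplit leaf."""
    if tree[0] == "leaf":
        return depth + c_factor(tree[1])
    _, feature, cut, left, right = tree
    child = left if row[feature] < cut else right
    return path_length(row, child, depth + 1)


def anomaly_scores(matrix, n_trees=100, sample_size=64, seed=0):
    """Return one score per row; scores closer to 1 mean easier to isolate."""
    rng = random.Random(seed)
    size = min(sample_size, len(matrix))
    if size < 2:
        raise ValueError("need at least two windows to build a tree")
    max_depth = math.ceil(math.log2(size))
    trees = [build_tree(rng.sample(matrix, size), 0, max_depth, rng)
             for _ in range(n_trees)]
    scores = []
    for row in matrix:
        mean_path = sum(path_length(row, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_path / c_factor(size)))
    return scores


# Synthetic demo data (an assumption, not data from the thesis):
# 100 genomic windows x 4 samples of read-depth ratios scattered around 1.0.
data_rng = random.Random(42)
ratios = [[data_rng.gauss(1.0, 0.03) for _ in range(4)] for _ in range(100)]
planted = 40
ratios[planted][0] = 0.5  # deletion-like drop in sample 0
ratios[planted][1] = 0.5  # the same drop in sample 1

scores = anomaly_scores(ratios, n_trees=200, sample_size=100, seed=1)
top = max(range(len(scores)), key=lambda i: scores[i])
print("highest-scoring window:", top, "score:", round(scores[top], 3))
```

In practice a library implementation such as scikit-learn's `IsolationForest` would normally replace the hand-written trees. The pure-Python version is used here only to keep the sketch dependency-free and to make the "each sample is a feature" layout explicit.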
Keywords/Search Tags: copy number variation, next-generation sequencing, isolation forest algorithm, Gaussian mixture model, Wilcoxon test