Detection of Copy Number Variation Based on Statistical Examination

Posted on: 2022-01-08 | Degree: Master | Type: Thesis
Country: China | Candidate: K Chen
GTID: 2480306605968569 | Subject: Master of Engineering
Abstract/Summary:
Thanks to improvements in, and the widespread use of, second-generation sequencing technology, genetic testing has advanced considerably, which has greatly accelerated research on human diseases. Copy number variation (CNV) is a major class of genomic structural variation, manifested as deletions and duplications at the submicroscopic level. This type of variation has been shown to be closely associated with human cancers and genetic diseases, so accurate CNV detection is of great significance for understanding cancer mechanisms and for targeted drug discovery. However, most existing CNV detection methods have limited accuracy when the sequencing data have low coverage. To address this, this thesis proposes a CNV detection algorithm based on statistical testing theory. The work consists of the following three parts.

In the first part, this thesis proposes a new fixed-step, partially overlapping sliding window for the sample preprocessing step, as an improvement on the traditional non-overlapping sliding window. Compared with using the read depth (RD) of every single base as input, the partially overlapping sliding window greatly reduces the amount of data and therefore consumes fewer computing resources. At the same time, because adjacent windows overlap, each unit length of the base sequence is covered by more windows. The degree of abnormality of a sliding window is determined jointly by all of the windows that overlap it, which reduces the influence of an abnormal signal in any single window. Compared with the existing non-overlapping sliding window, the partially overlapping fixed-step window therefore increases the sample size per unit distance and, while preserving accuracy, greatly reduces the effect of noise on the detection results.

In the second part, this thesis proposes a CNV detection method based on statistical testing. The method applies hypothesis testing theory to CNV detection: the mean RD of each sliding window is taken as a hypothesis sample, and the plausibility of the null hypothesis at a given significance level is assessed through the distribution of the test statistic, so as to determine whether the window lies in a CNV interval. Because a single statistical test has low detection accuracy, this thesis adopts a new secondary (two-pass) iterative statistical test. The CNV regions found by the first test are removed to isolate the normal, non-mutated copy number segments; the copy number mean and variance are then recalculated from these normal segments and used in place of the mean and variance of the entire sequence to build the statistical test model again. This greatly reduces the error that CNV regions introduce into the hypothesis test model and thereby effectively improves the sensitivity of CNV detection.

In the third part, taking into account the positional correlation of adjacent sliding windows during detection, a new method is designed to narrow the variation region by comparing the RD values of the sliding windows adjacent to it. This eliminates the effect that errors from small normal regions at the edges of a CNV have on the experimental results, thereby improving the accuracy and sensitivity of CNV detection.
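The fixed-step, partially overlapping sliding window from the first part can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the thesis's implementation: the function name `overlapping_windows`, the parameters `width` and `step`, and the choice to drop a trailing partial window are all assumptions. Setting `step` equal to `width` reproduces the traditional non-overlapping window.

```python
from statistics import fmean


def overlapping_windows(rd, width, step):
    """Return (start, mean_rd) for windows of `width` bases advanced by `step`.

    With step < width, consecutive windows share (width - step) bases;
    step == width gives the traditional non-overlapping scheme.
    A trailing partial window shorter than `width` is dropped (assumption).
    """
    if not 0 < step <= width:
        raise ValueError("require 0 < step <= width")
    windows = []
    for start in range(0, len(rd) - width + 1, step):
        windows.append((start, fmean(rd[start:start + width])))
    return windows


if __name__ == "__main__":
    # Synthetic per-base read depths with a duplicated stretch in the middle.
    rd = [10, 10, 10, 10, 20, 20, 20, 20, 10, 10, 10, 10]
    print(overlapping_windows(rd, width=4, step=2))  # five overlapping windows
    print(overlapping_windows(rd, width=4, step=4))  # three non-overlapping windows
```

On this 12-base toy profile the overlapping windows yield five window means (10, 15, 20, 15, 10) instead of twelve per-base values, and two more samples than the three non-overlapping windows of the same width.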
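The idea that a window's abnormality is judged jointly by all of the windows overlapping it can be illustrated by spreading window-level scores back onto individual bases and averaging them. Averaging is an assumption made for this sketch; the abstract does not state the combination rule, and `per_base_score` is a hypothetical helper.

```python
def per_base_score(window_scores, width, step, length):
    """Average, for every base, the scores of all windows that cover it.

    Window k is assumed to start at base k * step and span `width` bases,
    matching a fixed-step sliding window. Bases covered by no window get None.
    """
    totals = [0.0] * length
    counts = [0] * length
    for k, score in enumerate(window_scores):
        start = k * step
        for pos in range(start, min(start + width, length)):
            totals[pos] += score
            counts[pos] += 1
    return [t / c if c else None for t, c in zip(totals, counts)]


if __name__ == "__main__":
    # Five windows (width 4, step 2) over 12 bases; only window 2 looks abnormal.
    print(per_base_score([0, 0, 3, 0, 0], width=4, step=2, length=12))
    # [0.0, 0.0, 0.0, 0.0, 1.5, 1.5, 1.5, 1.5, 0.0, 0.0, 0.0, 0.0]
```

A score of 3 from one isolated window is diluted to 1.5 on every base it covers, because each of those bases is also covered by a neighboring window with a score of 0; a genuine CNV, where consecutive windows are all abnormal, would keep a high score.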
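A single-pass version of the hypothesis test in the second part might look like the sketch below. It assumes that, under the null hypothesis of normal copy number, window mean RDs are approximately normally distributed, estimates the mean and standard deviation from all windows, and applies a two-sided z-test at level `alpha`. The statistic, distribution, and threshold actually used in the thesis are not given in the abstract, and `z_test_windows` is a hypothetical name.

```python
from statistics import NormalDist, fmean, pstdev


def z_test_windows(means, alpha=0.01):
    """Return indices of windows whose mean RD rejects the null hypothesis.

    Null hypothesis (assumed): a window's mean RD is drawn from N(mu, sigma^2),
    with mu and sigma estimated from all window means.
    """
    mu = fmean(means)
    sigma = pstdev(means, mu)
    if sigma == 0:
        return []  # no spread at all: no window can be an outlier
    cdf = NormalDist().cdf  # standard normal CDF
    flagged = []
    for i, x in enumerate(means):
        z = (x - mu) / sigma
        p_value = 2.0 * (1.0 - cdf(abs(z)))  # two-sided p-value
        if p_value < alpha:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    means = [30] * 18 + [90, 90]  # synthetic window means, two duplicated windows
    print(z_test_windows(means))  # [18, 19]
```

Here mu = 36 and sigma = 18, so the two duplicated windows have z = 3 and p of about 0.0027, below `alpha`, while the normal windows have z of about -0.33.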
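The secondary (two-pass) test can be sketched by re-estimating the null model from the windows that survive the first pass and then re-testing every window. This keeps the same normal-approximation assumption as a single-pass z-test; `flag_windows` and `two_pass_test` are hypothetical names, and the toy data are synthetic.

```python
from statistics import NormalDist, fmean, pstdev


def flag_windows(means, mu, sigma, alpha):
    """Indices whose two-sided z-test p-value under N(mu, sigma^2) is below alpha."""
    if sigma == 0:
        return []
    cdf = NormalDist().cdf
    return [i for i, x in enumerate(means)
            if 2.0 * (1.0 - cdf(abs(x - mu) / sigma)) < alpha]


def two_pass_test(means, alpha=0.01):
    """Pass 1 models all windows; pass 2 re-estimates mu and sigma from the
    windows not flagged in pass 1 and re-tests every window."""
    first = flag_windows(means, fmean(means), pstdev(means), alpha)
    flagged = set(first)
    normal = [x for i, x in enumerate(means) if i not in flagged]
    if not normal:
        return first, first  # nothing left to model the normal copy number
    mu = fmean(normal)
    second = flag_windows(means, mu, pstdev(normal, mu), alpha)
    return first, second


if __name__ == "__main__":
    # Noisy normal windows, a strong gain (RD 60) and a weaker gain (RD 40).
    means = [29, 31] * 4 + [60, 60] + [29, 31] * 4 + [40, 40]
    print(two_pass_test(means))  # ([8, 9], [8, 9, 18, 19])
```

In this toy profile the strong gain inflates the first-pass standard deviation to about 9.2, which hides the weaker gain (z of about 0.65). Once the strong gain is removed, sigma falls to about 3.3 and the weaker gain is also flagged (z of about 2.7, p of about 0.007), mirroring the sensitivity gain the second part describes.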
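The boundary-narrowing step of the third part compares the windows at the edges of a detected region with their neighbors. The abstract does not give the exact comparison, so the sketch below uses a hypothetical rule: an edge window is trimmed while its mean RD is closer to the baseline `mu` than to the mean RD of its inner neighbor. The function name `refine_region` and the rule itself are assumptions.

```python
def refine_region(means, lo, hi, mu):
    """Shrink the inclusive window-index range [lo, hi] from both ends.

    Hypothetical rule: drop an edge window while its mean RD is closer to the
    baseline mu than to the mean RD of its inner neighbor. Works for gains and
    losses because only absolute differences are compared.
    """
    while lo < hi and abs(means[lo] - mu) < abs(means[lo + 1] - means[lo]):
        lo += 1
    while hi > lo and abs(means[hi] - mu) < abs(means[hi - 1] - means[hi]):
        hi -= 1
    return lo, hi


if __name__ == "__main__":
    # Windows 2 and 6 only partly overlap the gain, so their means sit near baseline.
    means = [30, 31, 36, 60, 61, 59, 33, 30]
    print(refine_region(means, 2, 6, mu=30))  # (3, 5)
```

With overlapping windows, edge windows that only partly cover a CNV have intermediate RD values; trimming them narrows the called region to windows 3 through 5 in this toy example.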
Keywords/Search Tags: second-generation sequencing technology, copy number variation (CNV), overlapping sliding, hypothesis testing