
An Algorithm for Detecting Single Nucleotide Variants from Next Generation Sequencing Data

Posted on: 2019-08-03
Degree: Master
Type: Thesis
Country: China
Candidate: P Z Fan
Full Text: PDF
GTID: 2428330572952114
Subject: Computer application technology
Abstract/Summary:
With the rapid development of next generation sequencing technology, research on the development and evolution of cancer using sequencing data is advancing rapidly. The factors that cause cancer are often numerous and complicated, but gene mutations are the root cause of cancer occurrence and development: germline mutations are associated with hereditary cancer, while somatic mutations are considered one of the main factors in acquired cancer. This thesis studies the detection of single nucleotide variants (SNVs) in somatic cells. Methods for detecting somatic SNVs fall roughly into two categories. The first detects SNPs and compares the results with existing databases to find meaningful mutations. Its advantage is that it finds sites that are already verified and clinically meaningful, but it is not suitable for exploring the relationship between tumor sites and other information such as other diseases, so it is mainly suited to clinical application. The second detects somatic mutations from two samples; it must consider the relationship between the two samples and the characteristics of the sample data, and uses a Bayesian approach. However, existing methods give widely differing results on low-purity tumor samples.

Based on an analysis of sample characteristics and practical application demands, this thesis proposes an improved SNV detection algorithm for single tumor samples that takes the quality of the samples into account and recalculates the threshold for candidate SNVs. The performance of the algorithm is compared with that of other methods, and its validity is verified. The method is also applied to real breast cancer data, and the detection results are annotated. On the EGA1 data, 64 of the detected mutations are recorded as meaningful in the COSMIC and ClinVar databases; notably, two genes, ANF280 D and AKAP9, are associated with ductal breast cancer. Analysis of the EGA2 data shows that the genes associated with ductal breast cancer are TCE3 and PRKC.

For normal-tumor paired samples, this thesis constructs a feature vector of 52 features and uses simulated data to train four classification algorithms (Bayesian, SVM, logistic regression, and random forest). Analysis of the four classification models on the simulated data shows that random forest performs best. The four models are also compared with other SNV detection methods based on tumor-normal paired samples, and an importance analysis of the 52 constructed features finds that the reverse-strand bases in the tumor sample and the quality of the normal sample have a great influence on SNV detection. Finally, the four trained models are applied to three groups of real breast cancer sequencing samples for SNV detection, and the SNVs detected in common by all four models are annotated. On the EGA1 data the results show two genes, PRKCQ and CRYZ, related to intrauterine growth retardation; on the EGA2 data they show two genes related to ductal breast cancer, TCE3 and PRKCZ.
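The abstract does not spell out the single-sample algorithm, so the following is only a minimal sketch of the general idea of quality-aware thresholding of candidate SNVs, not the thesis's method. The function name `call_candidate_snvs`, the pileup layout, and the cutoffs (`min_base_quality`, `min_alt_reads`, `min_vaf`) are all hypothetical choices made for illustration.

```python
from collections import Counter


def call_candidate_snvs(pileup, min_base_quality=20, min_alt_reads=3, min_vaf=0.05):
    """Return (pos, ref, alt, vaf) for sites passing simple, assumed filters.

    `pileup` is a list of dicts with keys "pos", "ref" and "reads", where
    "reads" is a list of (base, phred_quality) pairs. This layout is an
    assumption for the sketch, not a real file format.
    """
    calls = []
    for site in pileup:
        # Keep only bases whose Phred quality meets the cutoff.
        bases = [base for base, qual in site["reads"] if qual >= min_base_quality]
        if not bases:
            continue
        counts = Counter(bases)
        # Count the non-reference bases that survived the quality filter.
        alt_counts = {b: n for b, n in counts.items() if b != site["ref"]}
        if not alt_counts:
            continue
        # Take the most frequent non-reference base as the candidate allele.
        alt, alt_n = max(alt_counts.items(), key=lambda kv: kv[1])
        # Variant allele fraction among the quality-filtered bases.
        vaf = alt_n / len(bases)
        if alt_n >= min_alt_reads and vaf >= min_vaf:
            calls.append((site["pos"], site["ref"], alt, vaf))
    return calls


# Toy data: site 101 has a well-supported G, site 102 has only
# low-quality T bases, and site 103 matches the reference everywhere.
example_pileup = [
    {"pos": 101, "ref": "A", "reads": [("A", 35)] * 16 + [("G", 32)] * 4},
    {"pos": 102, "ref": "C", "reads": [("C", 30)] * 19 + [("T", 8)] * 5},
    {"pos": 103, "ref": "T", "reads": [("T", 33)] * 20},
]
calls = call_candidate_snvs(example_pileup)
print(calls)
```

In this toy input only site 101 is reported: the alternative bases at site 102 are discarded by the quality filter before the threshold is applied. The fixed cutoffs here stand in for whatever recalculated threshold the thesis actually uses, and a method aimed at low-purity tumors would presumably adapt them to the sample.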
Keywords/Search Tags: NGS, SNV, Single Sample, Case-control Sample, Annotation