
Research And Implementation Of Copy Number Significance Detection Algorithms Based On CNA

Posted on: 2019-04-30 | Degree: Master | Type: Thesis
Country: China | Candidate: M Sun | Full Text: PDF
GTID: 2404330596965398 | Subject: Electronic Science and Technology
Abstract/Summary:
In human cancers, somatic copy number aberration (CNA) is a common genetic event and an important feature of cancer cells that can affect the occurrence and development of tumors. Significant copy number aberrations (SCAs) are significantly recurrent CNAs that affect the same region of the genome in multiple tumor samples. SCAs are widely regarded as “driver” mutations in the occurrence and development of tumors; they may help pinpoint novel oncogenes and tumor suppressor genes and provide a basis for clinical cancer prevention and treatment. CNA-based copy number significance detection aims to identify SCAs by testing the significance of segmented copy number data. This study examines CNA-based copy number significance detection algorithms, implements them in Java, and compares them. The main contributions are as follows:

(1) This thesis proposes an overall framework for CNA-based copy number significance detection algorithms, which provides a general solution for studying such algorithms. The framework was derived by analyzing the basic principle of these algorithms and summarizing the important steps they share, together with the implementation methods available for each step.

(2) This thesis studies five typical CNA-based copy number significance detection algorithms, GISTIC, JISTIC, GISTIC 2.0, SAIC, and RUBIC, and analyzes the principles, characteristics, advantages, disadvantages, and implementation steps of each. Four of them, GISTIC, JISTIC, SAIC, and RUBIC, were reimplemented in Java and compared through measurement and analysis.

(3) The SAIC algorithm uses the Pearson correlation coefficient to determine CNA units, and this step can place breakpoints incorrectly. To address this error, this thesis improves SAIC by drawing on the five typical algorithms above and incorporating the Bagging technique used in random forests. The resulting algorithm, BSAIC, was implemented in Java, and its improvement was verified on both simulated and real data.

(4) This thesis designs and implements an algorithm tool set for copy number significance detection and uses it to analyze and compare the algorithms. Based on the characteristics of real data and the requirements of each algorithm, it proposes a simulated data set generation algorithm with 16 adjustable parameters, along with corresponding evaluation criteria. For real data, each algorithm was tested on the sample data set released with the latest official RUBIC R code, and its results were compared against those produced by the official RUBIC code. This comparison shows that BSAIC improves significantly on SAIC.
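As a rough illustration of the idea in item (3), the following Java sketch places a breakpoint between two adjacent probes when their Pearson correlation across tumor samples falls below a threshold, then stabilizes that decision by majority vote over bootstrap resamples of the samples (the resampling-and-voting idea behind Bagging). It is not the thesis's BSAIC implementation: the abstract does not give the algorithm's details, so the class name `CnaUnitSketch`, the threshold, the number of rounds, and the voting rule are all assumptions made for this example.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch; names and parameters are assumptions, not the thesis's code.
public class CnaUnitSketch {

    /** Pearson correlation of x and y restricted to the sample indices in idx. */
    public static double pearson(double[] x, double[] y, int[] idx) {
        double sx = 0, sy = 0;
        for (int i : idx) { sx += x[i]; sy += y[i]; }
        double mx = sx / idx.length, my = sy / idx.length;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i : idx) {
            double dx = x[i] - mx, dy = y[i] - my;
            sxy += dx * dy; sxx += dx * dx; syy += dy * dy;
        }
        // A constant profile has no defined correlation; treat it as uncorrelated.
        if (sxx == 0 || syy == 0) return 0.0;
        return sxy / Math.sqrt(sxx * syy);
    }

    /** result[p] is true when a breakpoint is placed between probe p and probe p + 1. */
    public static boolean[] breakpoints(double[][] data, int[] idx, double threshold) {
        boolean[] cut = new boolean[data.length - 1];
        for (int p = 0; p < cut.length; p++) {
            cut[p] = pearson(data[p], data[p + 1], idx) < threshold;
        }
        return cut;
    }

    /** Majority vote over bootstrap resamples (with replacement) of the tumor samples. */
    public static boolean[] baggedBreakpoints(double[][] data, double threshold,
                                              int rounds, long seed) {
        int nSamples = data[0].length;
        int[] votes = new int[data.length - 1];
        Random rng = new Random(seed);
        for (int r = 0; r < rounds; r++) {
            int[] idx = new int[nSamples];
            for (int i = 0; i < nSamples; i++) idx[i] = rng.nextInt(nSamples);
            boolean[] cut = breakpoints(data, idx, threshold);
            for (int p = 0; p < cut.length; p++) if (cut[p]) votes[p]++;
        }
        boolean[] result = new boolean[votes.length];
        for (int p = 0; p < votes.length; p++) result[p] = 2 * votes[p] > rounds;
        return result;
    }

    public static void main(String[] args) {
        // Toy data, data[probe][sample]: probes 0-1 move together, as do probes 2-3.
        double[][] data = {
            { 0.1,  0.5,  0.9,  0.2,  0.7,  0.4},
            { 0.2,  1.0,  1.8,  0.4,  1.4,  0.8},
            {-0.1, -0.5, -0.9, -0.2, -0.7, -0.4},
            {-0.3, -1.5, -2.7, -0.6, -2.1, -1.2}
        };
        System.out.println(Arrays.toString(baggedBreakpoints(data, 0.9, 101, 42L)));
    }
}
```

Voting across resamples means a single noisy sample is less likely to move a breakpoint than it would be with one correlation computed on the full data, which is the motivation the abstract gives for combining SAIC with Bagging.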
Keywords/Search Tags: copy number aberrations, SCA, significance detection, DNA, cancer gene