Pathogenic Gene Prediction Based On Split-and-conquer Method

Posted on: 2019-02-06
Degree: Master
Type: Thesis
Country: China
Candidate: X R Lan
GTID: 2334330569479759
Subject: Statistics
Abstract/Summary:
In recent years, medical technology has developed rapidly, but many problems remain unsolved, and many genetic diseases still cannot be identified. Medical scientists believe that the emergence of disease-causing genes is related to both genetic and environmental factors. It is therefore particularly important for researchers to carry out in-depth exploration in order to understand the underlying pathology, and equally important to predict accurately whether a person will develop a given disease. The maturation of information technology and of genome projects in the biological sciences has made gene data much easier to obtain, producing large volumes of data whose dimension is also very high. To address this problem, this thesis first proposes K-split and split-and-conquer methods built on the Lasso, which process the dataset by splitting it into blocks and then reintegrating the results. The test results show that the K-split Lasso and the split-and-conquer Lasso not only save time when selecting associated variables from massive gene data, but also that the selected genes pass the correlation test. The split-and-conquer methods perform better when the number of dimensions is small, that is, far smaller than the sample size. The final experimental results show that using the split-and-conquer Lasso for feature selection not only selects the relevant pathogenic genes but also removes redundancy and saves a large amount of time. On the one hand, when predicting disease outcomes, SVM, neural networks, random forests, and XGBoost achieve better prediction results after variable selection than before it; on the other hand, the selected factors were tested statistically and passed the hypothesis test.
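The abstract does not spell out the exact block-reintegration rule, so the following is only a minimal sketch under stated assumptions. The samples are split at random into K blocks, a Lasso is fitted on each block by cyclic coordinate descent, a gene is kept if it is selected in a majority of blocks, and the block estimates of the kept genes are simply averaged. The function names `lasso_cd` and `split_and_conquer_lasso`, the voting threshold `vote`, and the plain averaging step are hypothetical illustrations of a generic split-and-conquer scheme, not the thesis's implementation.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)


def lasso_cd(X, y, lam, n_iter=200, tol=1e-6):
    """Lasso by cyclic coordinate descent, minimizing
    (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1.
    No intercept is fitted, so X and y are assumed to be centered."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n          # (1/n) * X_j^T X_j
    resid = np.asarray(y, dtype=float).copy()  # y - X @ beta, with beta = 0
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            old = beta[j]
            # correlation of feature j with the partial residual (feature j added back)
            rho = X[:, j] @ resid / n + col_sq[j] * old
            new = soft_threshold(rho, lam) / col_sq[j]
            if new != old:
                resid -= X[:, j] * (new - old)  # keep the residual in sync
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return beta


def split_and_conquer_lasso(X, y, K, lam, vote=0.5, seed=0):
    """Hypothetical split-and-conquer Lasso (assumed scheme, not the thesis's):
    split samples into K blocks, fit a Lasso per block, keep features selected
    in more than `vote` * K blocks, and average the block coefficients."""
    rng = np.random.default_rng(seed)
    blocks = np.array_split(rng.permutation(X.shape[0]), K)
    counts = np.zeros(X.shape[1], dtype=int)
    coefs = []
    for b in blocks:
        Xb = X[b] - X[b].mean(axis=0)  # center within each block
        yb = y[b] - y[b].mean()
        beta = lasso_cd(Xb, yb, lam)
        counts += (beta != 0).astype(int)
        coefs.append(beta)
    selected = np.flatnonzero(counts > vote * K)  # majority vote across blocks
    combined = np.zeros(X.shape[1])
    combined[selected] = np.mean(coefs, axis=0)[selected]  # simple averaging (assumption)
    return selected, combined


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    n, p = 600, 50
    X = rng.standard_normal((n, p))
    true_beta = np.zeros(p)
    true_beta[[0, 1, 2]] = [3.0, -2.0, 1.5]
    y = X @ true_beta + 0.5 * rng.standard_normal(n)
    selected, combined = split_and_conquer_lasso(X, y, K=3, lam=0.2)
    print("selected features:", selected)
    print("combined coefficients:", np.round(combined[selected], 2))
```

Fitting each Lasso on only n/K samples keeps every sub-problem small, which is the source of the time saving, while the voting step discards variables that are picked up in isolated blocks only. A weighted combination of block estimates could replace the plain average used here.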
Keywords/Search Tags: Lasso, K-split, Split-and-conquer, Associated gene