
The Research Of Colorectal Cancer Prediction Model Based On Feature Selection And Ensemble Learning

Posted on: 2018-12-16
Degree: Master
Type: Thesis
Country: China
Candidate: T Li
GTID: 2334330536973566
Subject: Computer application technology
Abstract/Summary:
Colorectal cancer (CRC) is one of the most common and dangerous malignant tumors in the world. Its high-incidence areas are concentrated mainly in developed Western regions such as Europe, America, New Zealand, and Australia. However, as living standards improve and lifestyles and diets become increasingly westernized, CRC rates are rising rapidly in China, which seriously threatens human health and reduces quality of life. It is well known that CRC is harmful to health, but its etiology and pathogenesis are still not fully understood. A large number of epidemiological studies have shown that the occurrence of CRC is a complicated process, affected not only by environmental, genetic, and other factors individually but also by their interaction. Which factors affect the occurrence of CRC, however, is still not clear. The main work of this paper is therefore to study the role of environmental and dietary factors, genetic susceptibility, and their interaction in CRC risk. Based on CRC data provided by the Institute of Toxicology, Third Military Medical University, a CRC prediction model is established that can provide a reliable basis for early diagnosis and prediction of CRC. The main work of this paper includes:

1. A multi-aspect feature selection method is proposed. Because CRC data is high-dimensional, effective dimensionality reduction is necessary. Here we combine the Relief algorithm with correlation analysis to reduce the dimension of the CRC data. First, feature weights are calculated with the Relief algorithm; features with small weights are removed and features with large weights are retained. This yields a feature subset that benefits classification. However, it is not known whether the features in this subset are redundant, so we then perform correlation analysis on the subset. Among highly correlated features, the one with the larger weight is retained as an optimal feature and the one with the smaller weight is removed; weakly correlated features are all treated as optimal features. Together, these two steps yield the optimal feature subset.

2. Based on the AdaBoost algorithm, we propose a Hybrid Ensemble Learning Method (HELM). To improve the generalization ability of AdaBoost, we study its base classifiers and propose the HELM model, which combines homogeneous and heterogeneous ensembles. First, an AdaBoost classifier is trained for each type of weak classifier. Then each AdaBoost classifier is assigned a weight according to its classification accuracy. Finally, the AdaBoost classifiers are fused by weight to form the HELM model.

3. A CRC prediction model is established. The model includes four parts:
(1) Data collection and preprocessing, in two steps. First, the data is cleaned, which includes handling missing values, noisy values, and abnormal values. Second, the CRC data is divided into four categories (SNPs, food, lifestyle, demographic) from a biological point of view.
(2) Feature selection. Relief feature selection and correlation analysis are used to obtain the optimal features.
(3) Classification prediction. The HELM algorithm is used to classify the data.
(4) Comparative analysis. To verify the HELM algorithm, we compare it with related algorithms.

This work investigates the mechanism of CRC by proposing a robust CRC prediction model. The experimental study finds that the CRC prediction model is stable and efficient. It can also be applied to the study of other complex diseases.
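
The two-stage feature selection described in item 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: it uses a basic two-class Relief variant with range-scaled differences, Pearson correlation as the correlation measure, and hypothetical thresholds (`weight_threshold`, `corr_threshold`). The function names and the toy data are invented for illustration.

```python
import random
from math import sqrt


def relief_weights(X, y, n_iter=100, seed=0):
    """Basic two-class Relief: a feature gains weight when it differs at the
    nearest miss (other class) and loses weight when it differs at the
    nearest hit (same class)."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    # Per-feature value range, used to scale differences into [0, 1].
    span = [(max(r[j] for r in X) - min(r[j] for r in X)) or 1.0
            for j in range(d)]

    def diff(a, b, j):
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(d))

    w = [0.0] * d
    for _ in range(n_iter):
        i = rng.randrange(n)
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue
        hit = min(hits, key=lambda k: dist(X[i], X[k]))
        miss = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            w[j] += (diff(X[i], X[miss], j) - diff(X[i], X[hit], j)) / n_iter
    return w


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / sqrt(va * vb) if va and vb else 0.0


def select_features(X, y, weight_threshold=0.0, corr_threshold=0.8):
    """Return (selected feature indices, Relief weights)."""
    w = relief_weights(X, y)
    # Stage 1: keep features with large weights, heaviest first.
    kept = sorted((j for j in range(len(w)) if w[j] > weight_threshold),
                  key=lambda j: -w[j])
    # Stage 2: within a highly correlated group, keep only the heaviest one.
    cols = {j: [row[j] for row in X] for j in kept}
    selected = []
    for j in kept:
        if all(abs(pearson(cols[j], cols[s])) < corr_threshold
               for s in selected):
            selected.append(j)
    return sorted(selected), w


# Toy data (invented): feature 1 is a linear copy of feature 0,
# and feature 2 is unrelated to the label.
X_demo = [[0.0, 1.0, 0.5], [0.1, 1.2, 0.9], [0.2, 1.4, 0.1], [0.3, 1.6, 0.7],
          [0.7, 2.4, 0.6], [0.8, 2.6, 0.0], [0.9, 2.8, 0.8], [1.0, 3.0, 0.2]]
y_demo = [0, 0, 0, 0, 1, 1, 1, 1]

if __name__ == "__main__":
    print(select_features(X_demo, y_demo))
```

Because features are visited in order of decreasing weight, the redundancy check always compares a candidate against features that weigh at least as much, which matches the rule of keeping the heavier feature of a correlated pair.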
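
The fusion step of HELM described in item 2 can be sketched as an accuracy-weighted vote. This is a sketch under assumptions rather than the thesis's method: the abstract does not say on which data the accuracy is measured or how the weights are scaled, so the code assumes a held-out validation set and weights normalized to sum to 1. The three simple rule functions are hypothetical stand-ins for AdaBoost classifiers that would already have been trained on different weak-learner types; AdaBoost training itself is not shown.

```python
def accuracy(clf, X, y):
    """Fraction of samples that the classifier labels correctly."""
    return sum(clf(x) == t for x, t in zip(X, y)) / len(y)


def fuse_by_accuracy(classifiers, X_val, y_val):
    """Build a fused predictor whose vote weights are the classifiers'
    validation accuracies, normalized to sum to 1 (an assumption)."""
    accs = [accuracy(c, X_val, y_val) for c in classifiers]
    total = sum(accs)
    if total:
        weights = [a / total for a in accs]
    else:
        weights = [1.0 / len(accs)] * len(accs)

    def predict(x):
        votes = {}
        for clf, w in zip(classifiers, weights):
            label = clf(x)
            votes[label] = votes.get(label, 0.0) + w
        # The label with the largest total weight wins.
        return max(votes, key=votes.get)

    return predict, weights


# Hypothetical stand-ins for trained AdaBoost classifiers.
def clf_a(x):
    return int(x[0] > 0.5)


def clf_b(x):
    return int(x[1] > 0.5)


def clf_c(x):
    return int(x[0] + x[1] > 1.0)


# Toy validation data (invented for illustration).
X_val = [[0.1, 0.2], [0.9, 0.8], [0.7, 0.2], [0.2, 0.6], [0.8, 0.9], [0.3, 0.1]]
y_val = [0, 1, 1, 0, 1, 0]

helm_predict, helm_weights = fuse_by_accuracy([clf_a, clf_b, clf_c], X_val, y_val)

if __name__ == "__main__":
    print(helm_weights)
    print([helm_predict(x) for x in X_val])
```

Measuring accuracy on held-out data rather than on the training set keeps an overfitted member from receiving an inflated weight, which is one plausible reading of the goal of improving generalization.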
Keywords/Search Tags: colorectal cancer, feature selection, ensemble learning, HELM algorithm