Feature Selection And Multi Classification On The Cancer Genome Atlas

Posted on: 2022-11-14 | Degree: Master | Type: Thesis
Country: China | Candidate: M X Li | Full Text: PDF
GTID: 2480306782977459 | Subject: Oncology
Abstract/Summary:
Public attention to disease, and especially to cancer prevention and detection, is growing along with the desire for good health. Thanks to advances in technology, The Cancer Genome Atlas (TCGA) project has collected gene data for a wide range of cancers, with the aim of finding approaches to cancer detection based on those data. As an essential step in handling high-dimensional data, feature selection has a great influence on subsequent variable interpretation and model fitting. In this thesis, data from The Cancer Genome Atlas are used to classify patients with five kinds of cancer: breast cancer, colon adenocarcinoma, kidney renal clear cell carcinoma, lung adenocarcinoma and prostate adenocarcinoma. First, the random forest algorithm is applied to score all features, and 19 different feature subsets are obtained according to different levels of the cumulative contribution rate. Then, on each feature subset, the redundant feature deletion algorithm based on dual neighborhood (ERFTN) is applied to delete redundant features. Finally, the variable selection algorithm for clustering and classification (VSCC) is applied to eliminate multicollinearity among the features. Through these three steps, the original data undergo three consecutive dimensionality reductions, yielding 19 purified feature subsets that can be used in classification models. The optimal subset and the optimal model are then selected based on the indicators obtained by different multi-classification models on the different feature subsets. In the end, three recommended models are given according to different levels of the specificity and sensitivity indices, and the key genes for identifying each cancer are given for users to select and monitor.
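The first reduction step described above (ranking features by random forest score, then cutting at a cumulative contribution rate) can be illustrated with a minimal sketch. This is not the thesis's code: the function name `select_by_cumulative_contribution`, the gene names, the scores, and the threshold values are all hypothetical, and the sketch assumes that "cumulative contribution rate" means the running share of total importance over features sorted from most to least important. In practice the scores might come from something like scikit-learn's `RandomForestClassifier.feature_importances_`; here they are hard-coded so the example stays self-contained.

```python
def select_by_cumulative_contribution(importances, threshold):
    """Return the top-ranked features whose combined importance share
    first reaches `threshold` (a fraction in (0, 1]).

    `importances` maps feature name -> non-negative importance score.
    This is an illustrative reading of "cumulative contribution rate",
    not the thesis's exact procedure.
    """
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")
    total = sum(importances.values())
    if total <= 0:
        raise ValueError("importance scores must sum to a positive value")

    # Rank features from most to least important.
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)

    selected = []
    cumulative = 0.0
    for name, score in ranked:
        selected.append(name)
        cumulative += score / total
        if cumulative >= threshold:
            break
    return selected


# Hypothetical importance scores for four genes (made-up numbers).
scores = {"gene_a": 50, "gene_b": 30, "gene_c": 15, "gene_d": 5}

# One feature subset per threshold; the thesis builds 19 such subsets,
# but its threshold values are not stated, so these are placeholders.
thresholds = [0.5, 0.6, 0.9, 1.0]
subsets = {t: select_by_cumulative_contribution(scores, t) for t in thresholds}
```

Each resulting subset would then be passed to the later redundancy-removal steps (ERFTN and VSCC in the thesis), which are not sketched here because the abstract does not describe their internals.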
Keywords/Search Tags:Cancer Genome Atlas Data, Feature Selection, Algorithm for Deleting Redundant Features Based on Dual Neighborhood, VSCC Algorithm, Multi Classification