
Feature Selection And Classification For Imbalanced Medical Data

Posted on: 2019-06-29 | Degree: Master | Type: Thesis
Country: China | Candidate: S Li
GTID: 2348330545993347 | Subject: Control Engineering
Abstract/Summary:
In imbalanced medical data, the number of samples differs greatly between classes, so minority-class samples are easily misclassified. As a result, models built with traditional classification algorithms generally fail to achieve satisfactory classification performance. In terms of data dimensionality and feature type, high-dimensional continuous imbalanced data contain many irrelevant and redundant features, which readily lead to the curse of dimensionality and to overfitting, whereas some low-dimensional discrete imbalanced data suffer from weak correlation between the features and the class labels. To improve model performance, this thesis studies feature selection as a way of addressing the imbalanced classification problem. The main research contents are as follows.

1. For high-dimensional continuous imbalanced gene expression data, this thesis proposes an imbalanced feature selection method based on a modified ReliefF algorithm and support vector machine recursive feature elimination (SVM-RFE). First, the modified ReliefF algorithm is applied to remove irrelevant features. Then, SVM-RFE is used to search for the best feature subset. Experimental results on the Kent Ridge Biomedical Dataset demonstrate the effectiveness of the proposed method in comparison with the All-SVM, ReliefF-SVM, ReliefF_M-SVM, and SVM-RFE models.

2. For the same kind of high-dimensional continuous imbalanced gene expression data, this thesis also proposes an imbalanced feature selection method based on multi-objective optimization (MOP). With SVM as the classification model, the modified ReliefF algorithm is first used for preliminary feature screening. Feature selection is then formulated as a MOP problem, and the NSGA-II algorithm is used to optimize model performance and feature-subset size simultaneously, yielding a set of Pareto-optimal feature subsets. Experiments on the Kent Ridge Biomedical Dataset validate the good performance of the proposed method, and comparisons with the ReliefF_M-All-SVM and ReliefF_M+SVM-RFE models are also presented.

3. Owing to the severe class imbalance and high feature sparsity of the Cerebral Hyperperfusion Syndrome (CHS) dataset collected after Carotid Endarterectomy (CEA), it is difficult to build an SVM classification model that successfully identifies the minority class. For this type of problem, this thesis further proposes using chi-square statistics for feature selection, followed by anomaly detection and cost-sensitive methods at the algorithm level and by resampling combined with ensemble techniques at the data level. Experimental results show that the proposed method performs well on the CHS dataset.
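The first filtering step in methods 1 and 2 relies on ReliefF, which scores each feature by how well it separates a sample from its nearest neighbours of other classes versus its nearest neighbours of the same class. The abstract does not describe the thesis's modification, so the sketch below is the standard ReliefF weighting only (numeric features, range-normalised Manhattan distance, `k` nearest hits and misses, misses weighted by class prior). The function name `relieff` and its parameters are hypothetical.

```python
import random


def relieff(X, y, k=1, n_iter=None, seed=0):
    """Standard ReliefF feature weights (NOT the thesis's modified variant).

    X: list of numeric feature vectors; y: list of class labels.
    Higher weight = feature separates classes better.
    """
    n, d = len(X), len(X[0])
    # Per-feature range so every attribute difference lies in [0, 1]
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]
    span = [(hi[j] - lo[j]) or 1.0 for j in range(d)]

    def diff(j, a, b):
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(d))

    classes = sorted(set(y))
    prior = {c: y.count(c) / n for c in classes}
    m = n_iter if n_iter is not None else n
    rng = random.Random(seed)
    # By default visit every sample once; otherwise draw m samples at random
    samples = list(range(n)) if m == n else [rng.randrange(n) for _ in range(m)]

    w = [0.0] * d
    for i in samples:
        xi, ci = X[i], y[i]
        # k nearest neighbours within each class, excluding the sample itself
        neigh = {}
        for c in classes:
            cand = sorted((r for r in range(n) if r != i and y[r] == c),
                          key=lambda r: dist(xi, X[r]))
            neigh[c] = cand[:k]
        for j in range(d):
            hits = neigh[ci]
            near_hit = sum(diff(j, xi, X[r]) for r in hits) / max(len(hits), 1)
            # Misses from every other class, weighted by that class's prior
            near_miss = 0.0
            for c in classes:
                if c == ci or not neigh[c]:
                    continue
                weight = prior[c] / (1.0 - prior[ci])
                near_miss += weight * sum(diff(j, xi, X[r]) for r in neigh[c]) / len(neigh[c])
            w[j] += (near_miss - near_hit) / m
    return w
```

A typical use is to keep only features whose weight exceeds some threshold, e.g. `[j for j, wj in enumerate(w) if wj > 0]`; the threshold of 0 is an illustrative assumption, not the thesis's setting.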
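Method 2 treats feature selection as a two-objective problem and keeps the Pareto-optimal subsets. NSGA-II itself also uses fast non-dominated sorting, crowding distance, and genetic operators. The sketch below shows only the Pareto-dominance filter that defines the final front. It assumes both objectives are minimised, taking a classification error rate and the number of selected features as stand-ins; the thesis may use a different, imbalance-aware performance measure. The helper names and the candidate values are hypothetical, not results from the thesis.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep the (subset, objectives) pairs that no other candidate dominates."""
    return [
        (subset, f) for subset, f in candidates
        if not any(dominates(g, f) for _, g in candidates)
    ]


# Hypothetical candidates: (selected feature indices, (error rate, subset size))
candidates = [
    ((3,), (0.20, 1)),
    ((3, 7), (0.10, 2)),
    ((1, 3, 7), (0.10, 3)),      # dominated by (3, 7): same error, more features
    ((1, 2, 3, 7), (0.05, 4)),
    ((2, 5), (0.25, 2)),         # dominated by (3, 7)
]
front = pareto_front(candidates)
```

Each subset on the front is a different trade-off between accuracy and compactness, which is why the method returns a series of subsets rather than a single one.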
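For the discrete CHS data in method 3, features are selected with chi-square statistics. A minimal sketch, assuming each feature takes discrete (categorical or binary) values: compute the Pearson chi-square statistic of the feature-versus-class contingency table and rank features by it. The function names are hypothetical, and the sketch omits p-values and any cut-off the thesis may use.

```python
from collections import Counter


def chi_square(feature, labels):
    """Pearson chi-square statistic of a discrete feature vs. the class label."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    f_count = Counter(feature)
    c_count = Counter(labels)
    stat = 0.0
    for v in f_count:
        for c in c_count:
            # Expected cell count under independence of feature and class
            expected = f_count[v] * c_count[c] / n
            observed = joint.get((v, c), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def rank_features(columns, labels):
    """Feature indices sorted from strongest to weakest class association."""
    return sorted(range(len(columns)),
                  key=lambda j: chi_square(columns[j], labels),
                  reverse=True)
```

A larger statistic means the feature's distribution differs more across classes; keeping the top-ranked features is one way to address the weak feature-class correlation mentioned above.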
Keywords/Search Tags: imbalanced classification, feature selection, support vector machine, recursive feature elimination, ReliefF algorithm, multi-objective optimization, nondominated sorting genetic algorithm