Research On Feature Selection Algorithm For High-dimensional Imbalanced Class Data

Posted on:2018-09-20

Degree:Master

Type:Thesis

Country:China

Candidate:G Q Wang

Full Text:PDF

GTID:2428330566998750

Subject:Computer Science and Technology

Abstract/Summary:


In recent years, data that are both high-dimensional and class-imbalanced have become increasingly common in emerging fields such as bioinformatics and satellite imaging, and their complex characteristics pose a serious challenge to data mining research. The class imbalance problem arises when the number of samples differs greatly between classes: a classifier trained on such data is biased toward the majority class and tends to ignore the minority-class samples, which often contain important information. The high-dimensional problem arises because a large feature space increases classifier complexity and the risk of overfitting, which degrades classification results. When preprocessing high-dimensional data, it is therefore important to select a low-dimensional feature subset that is highly relevant to the classification target and has minimal redundancy, so as to improve learning efficiency and classification accuracy. However, when the data are also class-imbalanced, traditional feature selection methods tend to choose feature subsets that favor the majority class, which leads to poor classification performance on minority-class samples.

We first introduce SVM-RFE, a traditional wrapper feature selection algorithm based on the support vector machine, analyze its problems on class-imbalanced data, and propose an improved algorithm, SSVM-RFE, based on the structural support vector machine, which optimizes the F-measure instead of accuracy so that class imbalance is taken into account. A feature ranking based on SVM weights reflects only the relevance between features and class labels; it cannot resolve redundancy among features. Therefore, after SSVM-RFE has removed a large number of irrelevant features, we construct a series of balanced subsets using a class decomposition framework and apply the Hilbert-Schmidt independence criterion (HSIC) on these balanced subsets to obtain an unbiased measure of the correlation between features. We then propose CBMBFS, an improved approximate Markov blanket feature selection method for the feature combination problem, to remove redundant features.

The resulting two-stage method, SSVM-RFE-CBMBFS, takes the imbalanced data distribution into account and selects a feature set with high discriminative ability and minimal redundancy among features. A series of experiments was then carried out in which the classification results were evaluated with several performance criteria for imbalanced data and compared with recent algorithms to demonstrate the effectiveness of the proposed method.
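The redundancy-removal idea of the second stage can be illustrated with a minimal NumPy sketch. It is not the thesis's CBMBFS implementation, whose details the abstract does not give, and it covers only the second stage (it assumes the first stage has already pruned irrelevant features). Several choices are assumptions made for illustration: binary labels; a class decomposition that splits the majority class into chunks about the size of the minority class and pairs each chunk with all minority samples; a Gaussian kernel of width `sigma` on each feature and a delta kernel on labels; a normalized empirical HSIC (kernel alignment) averaged over the balanced subsets; and the approximate Markov blanket rule used in FCBF-style filters, with normalized HSIC substituted for symmetric uncertainty. All function names are hypothetical.

```python
import numpy as np


def rbf_gram(x, sigma=1.0):
    """Gaussian (RBF) kernel matrix for one feature column x of shape (n,)."""
    d = x[:, None] - x[None, :]
    return np.exp(-(d ** 2) / (2.0 * sigma ** 2))


def label_gram(y):
    """Delta kernel on class labels: 1 if two samples share a label, else 0."""
    return (y[:, None] == y[None, :]).astype(float)


def nhsic(K, L):
    """Normalized empirical HSIC (kernel alignment) between two kernel matrices.

    Based on trace(K H L H); the 1/(n-1)^2 factor of the biased HSIC estimator
    cancels in the normalization, so the value lies in [0, 1].
    """
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    Kc, Lc = H @ K @ H, H @ L @ H
    den = np.sqrt(np.sum(Kc * Kc) * np.sum(Lc * Lc))
    return float(np.sum(Kc * Lc) / den) if den > 0 else 0.0


def balanced_subsets(y, rng):
    """Assumed class decomposition for binary labels: split the shuffled majority
    class into chunks about the minority size; pair each chunk with all minority samples."""
    labels, counts = np.unique(y, return_counts=True)
    minority = labels[np.argmin(counts)]
    min_idx = np.flatnonzero(y == minority)
    maj_idx = rng.permutation(np.flatnonzero(y != minority))
    k = max(1, len(maj_idx) // len(min_idx))
    return [np.concatenate([min_idx, chunk]) for chunk in np.array_split(maj_idx, k)]


def avg_nhsic(X, y, subsets, i, j=None, sigma=1.0):
    """Mean normalized HSIC over the balanced subsets: feature i vs. the label
    (when j is None) or feature i vs. feature j."""
    vals = []
    for s in subsets:
        K = rbf_gram(X[s, i], sigma)
        L = label_gram(y[s]) if j is None else rbf_gram(X[s, j], sigma)
        vals.append(nhsic(K, L))
    return float(np.mean(vals))


def approx_markov_blanket_filter(X, y, subsets, sigma=1.0):
    """Greedy redundancy removal with an approximate Markov blanket rule (assumed):
    feature j covers feature i if rel[j] >= rel[i] and dependence(i, j) >= rel[i]."""
    rel = np.array([avg_nhsic(X, y, subsets, f, sigma=sigma) for f in range(X.shape[1])])
    remaining = [int(f) for f in np.argsort(-rel)]  # most relevant first
    selected = []
    while remaining:
        fj = remaining.pop(0)  # rel[fj] >= rel[fi] for every fi still remaining
        selected.append(fj)
        # keep only features that fj does not cover
        remaining = [fi for fi in remaining
                     if avg_nhsic(X, y, subsets, fi, fj, sigma) < rel[fi]]
    return selected, rel


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = np.array([1] * 20 + [0] * 100)          # 20 minority vs. 100 majority samples
    f0 = 2.0 * y + rng.normal(0, 0.5, y.size)   # informative feature
    f1 = f0 + rng.normal(0, 0.01, y.size)       # near-duplicate of f0 (redundant)
    f2 = rng.normal(0, 1, y.size)               # pure noise
    X = np.column_stack([f0, f1, f2])
    subsets = balanced_subsets(y, rng)
    selected, rel = approx_markov_blanket_filter(X, y, subsets)
    print("relevance:", np.round(rel, 3), "selected:", selected)
```

On this synthetic data the near-duplicate pair is expected to collapse to a single selected feature, because the dependence between the two copies exceeds either copy's relevance to the label. Averaging over balanced subsets keeps the 100 majority samples from dominating the relevance estimate, which is the motivation the abstract gives for the class decomposition step.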

Keywords/Search Tags:

feature selection, imbalanced data, structural SVM, F-measure optimization, Markov blanket


Related items

1	Research On Feature Selection Technology Based On Markov Blanket Representative Set
2	Research On Feature Selection Methods Of Imbalanced Date
3	Research On Confidence Measure Of Speech Recognition
4	Feature Selection And Classification For Imbalanced Medical Data
5	Research On The Algorithm Of Discovery Of Markov Blanket Based On Logistic Regression And Its Application
6	Research On Feature Selection Algorithm On Imbalanced Data Classification
7	Research On Feature Selection And Classification Algorithms Based On Information Theory
8	Local structured learning for feature selection and causal discovery
9	Improved Methods Of Oversampling And Feature Selection Based On Imbalanced Data
10	Research On IPTV User's Complaint Prediction Strategy Based On Imbalanced Data Processing