
The Feature Selection Algorithms Based On Category Overlapping Ratio And Feature's Overlapping Area

Posted on: 2017-07-25
Degree: Master
Type: Thesis
Country: China
Candidate: M Fan
GTID: 2348330488958748
Subject: Computer application technology
Abstract/Summary:
Big data is a product of technological development in the information-rich era. Beyond the sheer dimensionality of the data, identifying the valuable information in it at low cost matters more for the development of society, and data mining techniques have therefore attracted growing attention. Feature selection is one of the main data analysis techniques in data mining: the critical question is how to filter out, from high-dimensional data, the features that have discriminative ability and practical value. Feature selection has been widely used in applications such as intrusion detection, biomedicine and environmental science.

SVM-RFE (Support Vector Machine-Recursive Feature Elimination) is a common feature selection method that ranks features according to the order in which they are deleted. This paper studies the evaluation criteria used to select a discriminative feature subset during the iterative deletion process of SVM-RFE. To evaluate a feature subset more accurately and select the most discriminative one, this paper combines accuracy and the degree of category overlap into a comprehensive evaluation of each feature subset during the backward iterative feature selection process. A feature subset with better discriminative ability should have a higher accuracy and a lower degree of category overlap. On this basis, this paper proposes the feature selection algorithm SVM-RFE-COA. In addition, during SVM-RFE feature selection the SVM builds its model from the current feature set and the training samples, so the quality of the training samples influences the calculation of the feature weights. If the training samples have a high degree of category overlap in the current feature space, overfitting may occur.
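The backward elimination loop and the combined evaluation described above can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the abstract does not define how the degree of category overlap is measured for SVM-RFE-COA or how it is combined with accuracy, so `overlap_degree` (share of other-class samples among each sample's nearest neighbours), the score `accuracy - overlap`, the use of training accuracy, and the tie-breaking rule are all assumptions, as is the small hand-written linear SVM trainer.

```python
from math import dist

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the L2-regularised hinge loss.
    Labels must be +1 / -1. A stand-in for a real SVM solver."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) < 1:
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def accuracy(X, y, w, b):
    """Training accuracy of the linear model (ASSUMPTION: no held-out data)."""
    correct = 0
    for xi, yi in zip(X, y):
        pred = 1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else -1
        correct += pred == yi
    return correct / len(y)

def overlap_degree(X, y, k=1):
    """ASSUMPTION: mean share of other-class samples among each
    sample's k nearest neighbours (Euclidean distance)."""
    total = 0.0
    for i, xi in enumerate(X):
        others = [j for j in range(len(X)) if j != i]
        nearest = sorted(others, key=lambda j: dist(xi, X[j]))[:k]
        total += sum(y[j] != y[i] for j in nearest) / k
    return total / len(X)

def svm_rfe_coa(X, y):
    """Return (features ranked best-first, best-scoring feature subset)."""
    remaining = list(range(len(X[0])))
    eliminated = []
    best_score, best_subset = float("-inf"), list(remaining)
    while remaining:
        Xs = [[row[j] for j in remaining] for row in X]
        w, b = train_linear_svm(Xs, y)
        # ASSUMED combination: reward accuracy, penalise category overlap.
        score = accuracy(Xs, y, w, b) - overlap_degree(Xs, y)
        if score >= best_score:  # ASSUMPTION: ties favour the smaller subset
            best_score, best_subset = score, list(remaining)
        # SVM-RFE criterion: drop the feature with the smallest squared weight.
        worst = min(range(len(remaining)), key=lambda p: w[p] ** 2)
        eliminated.append(remaining.pop(worst))
    return eliminated[::-1], best_subset

# Toy data: feature 0 separates the classes, feature 1 carries no class signal.
X = [[2.0, 0.1], [1.5, -0.1], [-2.0, 0.1], [-1.5, -0.1]]
y = [1, 1, -1, -1]
ranking, best_subset = svm_rfe_coa(X, y)
print(ranking, best_subset)
```

The ranking is simply the reversed deletion order, which matches how SVM-RFE ranks features; only the subset-selection score is specific to this sketch.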
To limit the influence of heavily overlapping training samples, the samples whose degree of category overlap is higher than in the original feature space are temporarily excluded during the backward iterative process, which may help select a more discriminative feature subset. On this basis, this paper presents M-SVM-RFE-COA, a feature selection method built on SVM-RFE-COA. The results on 11 public data sets show that SVM-RFE-COA, using the comprehensive evaluation, selects more discriminative feature subsets than SVM-RFE, and that M-SVM-RFE-COA, by temporarily excluding those samples during the backward iterative process, further improves the performance of SVM-RFE-COA.

ERGS is a feature selection algorithm based on the effective ranges of features. It calculates each feature's overlapping area between every pair of categories and uses it to evaluate the feature's discriminative ability: the larger a feature's overlapping area, the weaker its discriminative ability. However, ERGS does not consider the proportion of each category's effective range that is occupied by the overlapping area of every two categories, which may affect the calculated discriminative ability of the feature. This paper therefore proposes MERGS, an improved algorithm based on ERGS. For each feature, MERGS computes the proportion of each category's effective range occupied by the overlapping area of every two categories, and uses it to evaluate the feature's overlapping area based on effective ranges; MERGS also calculates the overall overlap of the training samples on this feature from the proportion of heterogeneous samples among each sample's nearest neighbors. The results on eight public data sets show that MERGS performs better than ERGS, and a comparison on liver disease serum data shows that MERGS outperforms ERGS there as well.
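The difference between scoring a feature by raw overlapping area and by the proportion of each category's effective range that the overlap occupies can be illustrated with a short sketch. The abstract gives no formulas, so everything here is assumed for illustration: the effective range is taken as class mean plus or minus `gamma` times the class standard deviation, and the per-category proportions are aggregated with a plain mean. The function names are hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev

def effective_range(values, gamma=1.732):
    """ASSUMPTION: effective range = class mean +/- gamma * class std."""
    m, s = mean(values), pstdev(values)
    return m - gamma * s, m + gamma * s

def overlap_length(r1, r2):
    """Length of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(r1[1], r2[1]) - max(r1[0], r2[0]))

def feature_overlap(column, labels):
    """For one feature, return
    (total pairwise overlapping area, mean overlap proportion)."""
    ranges = {
        c: effective_range([v for v, l in zip(column, labels) if l == c])
        for c in sorted(set(labels))
    }
    total, proportions = 0.0, []
    for a, b in combinations(ranges, 2):
        ov = overlap_length(ranges[a], ranges[b])
        total += ov  # raw overlapping area, as in the ERGS idea
        for lo, hi in (ranges[a], ranges[b]):
            width = hi - lo
            # Share of this category's effective range covered by the
            # overlap; ASSUMED aggregation over pairs is a plain mean.
            proportions.append(ov / width if width > 0 else 0.0)
    return total, (mean(proportions) if proportions else 0.0)

labels = [0, 0, 0, 1, 1, 1]
separated = [0, 1, 2, 10, 11, 12]   # class ranges do not touch
overlapping = [0, 1, 2, 1, 2, 3]    # class ranges largely coincide
print(feature_overlap(separated, labels))
print(feature_overlap(overlapping, labels))
```

A feature whose categories have wide effective ranges can show a large raw overlap even when most of each range is clean; dividing by each range's width, as the proportion does, makes features with different spreads comparable.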
Keywords/Search Tags: Data Mining, Feature Selection, Category Overlapping Ratio, Feature's Overlapping Area, SVM-RFE