Research on Feature Selection Techniques Based on Category Overlap Areas and the Effective Range of Features

Posted on: 2016-11-01
Degree: Master
Type: Thesis
Country: China
Candidate: X M Wang
GTID: 2308330461976518
Subject: Computer application technology
Abstract/Summary:
With the development of science and technology, high-dimensional data is appearing constantly, which places higher demands on knowledge discovery and poses new challenges for it. Extracting meaningful information from massive amounts of data is important and has become a focus of attention in many fields. Feature selection is an effective way to reduce data dimensionality. Selecting discriminative and significant features from high-dimensional data not only reduces the dimensionality and shortens running time, but can also improve classification performance and reveal potentially valuable information.

Data quality affects classification performance, and selecting informative features improves both. A feature unrelated to the problem carries little information and has little influence on the data distribution, so the change in the data distribution before and after a feature is permuted reflects the amount of information that feature contains. Based on feature permutation and the R-value of category overlap areas, this paper proposes an ensemble unsupervised feature selection method, EUFSPR, which combines clustering, ensemble, and data evaluation techniques. The R-value measures the proportion of overlap areas among categories. Clustering is used to obtain sample groups and thereby discover the hidden data structure. The ensemble technique improves the stability of feature selection. Test results on ten public datasets, in terms of both clustering and classification performance, show the effectiveness of the proposed method. It is a good data preprocessing method when no class label information is available, and it can effectively improve data quality and classification performance.

The overlapping area of a feature among samples of different categories reflects its discriminative ability: discriminative features separate samples of different categories better and therefore produce a small overlap area. This paper also proposes FFS-ER, an algorithm for forward feature selection and classifier aggregation based on the effective ranges of features and the distribution density of samples in different categories. The algorithm builds a single classifier for each feature. During forward feature selection, the single classifier with the best classification performance and the minimum redundancy is chosen. The fused classification model is built from the selected single classifiers through weighted voting. Classification accuracy on eight public datasets shows that FFS-ER performs better than FIM and SVM-RFE, and a comparison of standard deviations shows the stability of the method.
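
The link between a feature's overlap area and its discriminative ability can be illustrated with a small sketch. The abstract does not give the thesis's definition of an effective range, so the code below assumes a simple one: the class mean plus or minus `gamma` standard deviations. The function names `effective_range` and `overlap_ratio`, and the parameter `gamma`, are hypothetical and are not taken from FFS-ER.

```python
from statistics import mean, pstdev


def effective_range(values, gamma=1.0):
    """Assumed effective range of one class on one feature:
    [mean - gamma * std, mean + gamma * std] (not the thesis's definition)."""
    m = mean(values)
    s = pstdev(values)
    return m - gamma * s, m + gamma * s


def overlap_ratio(values_a, values_b, gamma=1.0):
    """Length of the intersection of the two class ranges divided by the
    total span they cover. 0.0 means no overlap, 1.0 means identical ranges."""
    lo_a, hi_a = effective_range(values_a, gamma)
    lo_b, hi_b = effective_range(values_b, gamma)
    overlap = max(0.0, min(hi_a, hi_b) - max(lo_a, lo_b))
    span = max(hi_a, hi_b) - min(lo_a, lo_b)
    # A zero span means both classes collapse to the same point: full overlap.
    return overlap / span if span > 0 else 1.0


# Toy data: two classes observed on two features.
separating = overlap_ratio([1.0, 1.2, 0.8], [5.0, 5.2, 4.8])
mixed = overlap_ratio([1.0, 3.0, 5.0], [2.0, 4.0, 6.0])

# The feature with the smaller ratio separates the two classes better.
ranking = sorted([("separating", separating), ("mixed", mixed)], key=lambda t: t[1])
```

Under this assumed definition, a feature whose class ranges do not intersect gets a ratio of zero and ranks first, which matches the statement that discriminative features produce a small overlap area. The sketch ignores the distribution density of samples, which the abstract says FFS-ER also takes into account.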
Keywords/Search Tags:Feature Permutation, Ensemble Technique, Overlapping Area, Effective Range, Feature Selection