
Research On New Feature Selection Algorithm

Posted on: 2012-01-18
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Feng
Full Text: PDF
GTID: 2178330332491511
Subject: Computer software and theory
Abstract/Summary:
Feature selection is one of the core problems in pattern recognition and data mining. Depending on how the selection process is combined with the classifier, feature selection methods fall into three categories: the Filter, Wrapper, and Embedded models. The Filter model's generality and comparatively low time cost have made it the most widely studied of the three in recent years. However, in the Filter model the selection process is independent of the classification process, so the selected feature subset is biased away from what the classifier actually needs: the classifier's error rate is unsatisfactory, and because feature-relevance criteria are not taken into account, the final subset contains redundant features. We therefore propose improved algorithms that lower the error rate of the Filter model and eliminate feature redundancy, ultimately obtaining a subset that both fits the classifier's needs and meets the feature-relevance criteria. Based on these ideas, this thesis consists of three parts:

1. The trace ratio criterion is a classical Filter-model algorithm. It uses an iterative procedure to directly optimize a subset-level score and find the globally optimal feature subset, producing a relatively good subset in a short time. However, owing to the inherent shortcoming of the Filter model, its classification accuracy is unsatisfactory. To achieve better accuracy, we propose a new feature selection algorithm that combines the trace ratio criterion with error-region selection, adding features that correct misclassified samples one at a time, so that the classifier's error rate is lowered effectively.
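To make the trace ratio criterion concrete, the following is a minimal sketch (not the thesis's implementation) of the standard iterative subset-level optimization: each feature is scored by b_i − λ·w_i, where b_i and w_i are its between-class and within-class scatter, the top-k features are kept, and λ is updated to the trace ratio of the selected subset until it converges. All function and variable names here are illustrative.

```python
import numpy as np

def trace_ratio_select(X, y, k, n_iter=50, tol=1e-6):
    """Iterative trace-ratio feature selection (illustrative sketch).

    Scores each feature by b_i - lam * w_i, where b_i and w_i are the
    between-class and within-class scatter of that single feature, keeps
    the top-k features, then updates lam = sum(b) / sum(w) over the
    selected subset until lam converges.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape
    mean_all = X.mean(axis=0)
    b = np.zeros(d)  # per-feature between-class scatter
    w = np.zeros(d)  # per-feature within-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        b += len(Xc) * (mc - mean_all) ** 2
        w += ((Xc - mc) ** 2).sum(axis=0)
    w = np.maximum(w, 1e-12)  # guard against zero within-class scatter

    lam = 0.0
    for _ in range(n_iter):
        scores = b - lam * w                  # subset-level score per feature
        idx = np.argsort(scores)[::-1][:k]    # keep the top-k features
        new_lam = b[idx].sum() / w[idx].sum() # trace ratio of the selection
        if abs(new_lam - lam) < tol:
            lam = new_lam
            break
        lam = new_lam
    return np.sort(idx), lam
```

On a toy dataset where only the first feature separates the classes, the procedure selects that feature and reports a large trace ratio for it.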
We propose two error-region selection strategies: sequential forward selection (SFS) and plus-L-minus-R (+L-R) selection.

2. To address feature redundancy, the thesis analyses feature-feature and feature-class relevance criteria and, building on research into relevance-based feature selection, proposes an improved algorithm that uses KNN-based selection to refine the trace ratio algorithm. By focusing on feature-feature relevance to eliminate redundant features, the method lowers the dimensionality of the selected subset while the classifier still maintains high accuracy. Experiments on the UCI databases, using the ORL, wine, and Australian datasets, confirm the algorithm's effectiveness.

3. The thesis also studies the Relief algorithm and its extension ReliefF, both classical Filter-model feature selection methods whose classification accuracy is likewise unsatisfactory. We first apply error-region feature selection to improve Relief and ReliefF, effectively lowering the classifier's error rate. Relief and ReliefF focus on feature-class relevance, so the feature subsets they produce are effective for classification but may contain redundancy; we therefore apply the KNN method based on feature-feature relevance to improve them further. The improved algorithms effectively reduce the dimensionality of the subset while the classifier still performs well. Experiments on UCI databases confirm the algorithms' effectiveness.
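The feature-class relevance weighting that Relief and ReliefF compute can be sketched as follows. This is an illustrative implementation of the standard ReliefF weighting scheme, not the thesis's improved algorithm: each instance rewards features on which its nearest misses (nearest neighbors of other classes, weighted by class prior) differ, and penalises features on which its nearest hits (same-class neighbors) differ. Names and the choice of Manhattan distance are assumptions for this sketch.

```python
import numpy as np

def relieff(X, y, n_neighbors=3):
    """ReliefF feature weighting (illustrative sketch).

    A feature gets a high weight when nearby same-class instances agree
    on it (small hit differences) while nearby other-class instances
    disagree on it (large miss differences).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0  # guard constant features
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes, counts / n))

    weights = np.zeros(d)
    for i in range(n):  # deterministic: use every instance as a sample
        diffs = np.abs(X - X[i]) / span  # normalised per-feature diffs
        dist = diffs.sum(axis=1)         # Manhattan distance
        for c in classes:
            idx_c = np.where(y == c)[0]
            idx_c = idx_c[idx_c != i]    # never pick the instance itself
            k = min(n_neighbors, len(idx_c))
            nearest = idx_c[np.argsort(dist[idx_c])[:k]]
            contrib = diffs[nearest].mean(axis=0)
            if c == y[i]:
                weights -= contrib       # nearest hits: penalise differences
            else:                        # nearest misses: reward, prior-weighted
                weights += prior[c] / (1.0 - prior[y[i]]) * contrib
    return weights / n
```

On the same kind of toy data as before, a feature that separates the classes receives a clearly positive weight, while an irrelevant feature scores near or below zero; redundant features could then be pruned from the top-weighted subset with a feature-feature relevance check, as the thesis's KNN-based improvement does.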
Keywords/Search Tags:Feature Selection, Trace Ratio Criterion, Feature Relevance, KNN feature selection, Relief feature selection