Research On Feature Selection For Pattern Classification

Posted on: 2010-04-28
Degree: Master
Type: Thesis
Country: China
Candidate: W Y Sun
GTID: 2178360278966961
Subject: Pattern Recognition and Intelligent Systems

Abstract/Summary:
Feature selection plays an important role in data analysis and pre-processing. It eliminates irrelevant and redundant information, reduces the dimension of the training samples and the complexity of the algorithm, and limits the disturbance caused by noise. As a result, the generalization performance and classification precision of the model can be effectively improved. In principle, feature selection can be seen as a combinatorial optimization process: selecting a subset of features that optimizes a certain evaluation criterion.

Firstly, the feature selection framework consists of four steps: a generation procedure that produces the next candidate subset, an evaluation function that evaluates the subset under examination, a stopping criterion that decides when to stop, and a validation procedure that checks whether the subset is valid. Search strategies and evaluation functions are summarized in the thesis on the basis of this framework.

Secondly, several search strategies are studied, including the Branch & Bound algorithm, the sequential selection algorithms, the plus-l-take-away-r algorithm, and the floating search algorithm. Using the inter-intra class distance as the evaluation criterion, the performance of these search algorithms is compared on the same dataset, and the results verify the theoretical analysis.

Thirdly, mutual information for feature selection is introduced in detail, and its computation based on non-parametric density estimation is described. A mutual-information-based feature selection algorithm is implemented on several artificial and real datasets, and its experimental results are analysed. The mutual information criterion is also compared with other criteria.

Finally, the thesis studies the relevance and redundancy of features. Based on the correlations between features and class labels and among the features themselves, a feature selection method based on correlation analysis is proposed. It can greatly reduce the dimension of the feature space and thereby the computational complexity.
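
To make the four-step framework described in the abstract concrete, the following is a minimal sketch of sequential forward selection driven by an inter-intra class distance criterion. It is not the thesis's implementation: the function names (`inter_intra_ratio`, `sequential_forward_selection`), the trace-ratio form of the criterion, the fixed-size stopping rule, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def inter_intra_ratio(X, y):
    """Assumed criterion: between-class scatter over within-class scatter
    (trace ratio), computed on the feature columns passed in."""
    overall_mean = X.mean(axis=0)
    between = 0.0
    within = 0.0
    for label in np.unique(y):
        Xc = X[y == label]
        class_mean = Xc.mean(axis=0)
        # Between-class scatter: class mean vs. overall mean, weighted by class size.
        between += len(Xc) * np.sum((class_mean - overall_mean) ** 2)
        # Within-class scatter: samples vs. their own class mean.
        within += np.sum((Xc - class_mean) ** 2)
    return between / (within + 1e-12)  # small constant avoids division by zero


def sequential_forward_selection(X, y, k):
    """Greedily grow a feature subset until it holds k features."""
    selected = []
    remaining = list(range(X.shape[1]))
    # Stopping criterion (assumed): a fixed subset size k.
    while len(selected) < k and remaining:
        # Generation procedure: extend the current subset by one candidate feature.
        # Evaluation function: score each extended subset with the criterion.
        scores = [inter_intra_ratio(X[:, selected + [j]], y) for j in remaining]
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical two-class data: columns 0 and 2 carry class information,
# columns 1 and 3 are pure noise.
rng = np.random.default_rng(0)
n = 100
class0 = rng.normal(0.0, 1.0, size=(n, 4))
class1 = rng.normal(0.0, 1.0, size=(n, 4))
class1[:, 0] += 5.0
class1[:, 2] += 3.0
X = np.vstack([class0, class1])
y = np.array([0] * n + [1] * n)

selected = sequential_forward_selection(X, y, k=2)
print(selected)
```

On this synthetic data the procedure should pick the two informative columns and ignore the noise columns. The validation step of the framework (checking the chosen subset with a classifier on held-out data) is left out to keep the sketch short.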
Keywords/Search Tags: feature selection, search strategy, evaluation measure, mutual information, relevance