Research On Filter Feature Selection Algorithm

Posted on: 2016-06-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y Xu
Full Text: PDF
GTID: 2308330473457029
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the information age, large volumes of data have emerged from various industries. These data carry important industry information, but they are very large, with tens of thousands of samples and high dimensionality, which poses a major challenge for data mining. As one of the main approaches to dimensionality reduction, feature selection can reduce the data dimension, shrink the data scale, and substantially improve mining performance. Compared with other feature selection methods, filter feature selection is intuitive to apply and easy to understand, and it also performs well in dimensionality reduction.

This dissertation studies filter feature selection for single-label and multi-label data. The main contributions are as follows:

(1) The dissertation gives a general overview of feature selection and introduces its research background and significance. Filter-based feature selection for single-label and multi-label data is also described in detail.

(2) For single-label data, a filter feature selection algorithm based on a group policy, the MRMRE (MRMR Extension) algorithm, is proposed. It addresses the problem that related algorithms do not adequately consider the correlation among feature sets. The algorithm uses mutual information to capture the relationships between feature attributes. Using a canonical (linear) correlation measure of these relationships, it builds on the framework of the maximum-relevance minimum-redundancy (MRMR) algorithm and further orders feature groups to obtain an appropriate attribute subset. Experiments indicate that the proposed algorithm has clear advantages in feature selection and in stability on the data.

(3) For multi-label data, a filter feature selection algorithm called ML-MRMR (Multi-Label MRMR) is proposed. It redefines the feature evaluation function for multi-label data in terms of feature redundancy, feature-label correlation, and the degree of importance of labels within the label set. Under the MRMR framework, the algorithm ranks all feature attributes; it also proposes two new feature selection criteria for this ranking and finally obtains the optimal feature subset. Experiments show that the features selected by the proposed algorithm are superior to those selected by state-of-the-art algorithms under different evaluation criteria.
Keywords/Search Tags: Filter Feature Selection, Group Policy, Canonical Correlation Analysis, Mutual Information, Maximum-Relevance Minimum-Redundancy
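
Both proposed algorithms are described as building on the maximum-relevance minimum-redundancy (MRMR) framework with mutual information. As a rough illustration of that baseline idea only, the sketch below ranks discrete features greedily by relevance to the label minus mean redundancy with the features already chosen (the common "difference" form of the criterion). It is not the MRMRE or ML-MRMR algorithm from the dissertation, whose group ordering and multi-label criteria are not specified in the abstract. The function names `mutual_information` and `mrmr_rank`, the toy data, and the alphabetical tie-breaking are all assumptions made for this example.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information in bits between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum p(x, y) * log2(p(x, y) / (p(x) * p(y))) over observed pairs.
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_rank(features, labels, k):
    """Return the names of k features in greedy MRMR selection order.

    features: dict mapping a feature name to its list of discrete values.
    labels:   list of discrete class labels (single-label case).
    """
    relevance = {
        name: mutual_information(column, labels)
        for name, column in features.items()
    }
    selected = []
    remaining = set(features)

    def score(name):
        # Relevance minus mean redundancy with the already selected features.
        if not selected:
            return relevance[name]
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        # Assumption: ties are broken alphabetically to keep the result deterministic.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: the label encodes two independent bits a and b,
# and "a_copy" duplicates "a", so it is relevant but fully redundant.
toy_features = {
    "a": [0, 0, 0, 0, 1, 1, 1, 1],
    "b": [0, 0, 1, 1, 0, 0, 1, 1],
    "a_copy": [0, 0, 0, 0, 1, 1, 1, 1],
}
toy_labels = [0, 0, 1, 1, 2, 2, 3, 3]

print(mrmr_rank(toy_features, toy_labels, 3))  # ['a', 'b', 'a_copy']
```

In the toy data all three features have the same relevance to the label, so a ranking by relevance alone could not separate them; the redundancy term is what pushes the duplicated feature to the end of the ranking.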