
Research On Feature Selection Algorithms Using Information Granulation

Posted on: 2017-06-25    Degree: Master    Type: Thesis
Country: China    Candidate: J H Liu    Full Text: PDF
GTID: 2348330485456507    Subject: Computer application technology
Abstract/Summary:
Feature selection, as a means of data preprocessing, has become one of the hot research topics in fields such as data mining, pattern recognition, and machine learning. Its main purpose is to remove irrelevant and redundant features and to find a feature subset that preserves all or most of the classification information carried by the original feature space. Following the idea of information granulation, a large granule can be decomposed into a family of smaller granules whose distributions resemble that of the original one, so that data can be analyzed at multiple levels and/or from multiple views. Accordingly, the characterization mechanism of information granulation is applied to feature selection, and a series of feature selection models based on information granulation are constructed.

This paper first reviews related work on feature selection, with emphasis on neighborhood granulation, large margin, and local subspace. The study then focuses on removing redundant or irrelevant features from the original feature space. From the viewpoint of information granulation, covering both sample granulation and feature granulation, a series of studies are carried out to verify the proposed methods. Specifically, the research work of this dissertation is organized along the following aspects:

1) From the perspective of sample granulation, combined with the quality of features, a feature selection algorithm based on quality of information (MCE) is proposed. It defines feature quality in terms of information entropy and employs the large margin to induce the concept of nearest neighbor for granulating samples (a minimal margin sketch is given after this list). Experiments evaluate the model in terms of subset compactness and classification accuracy, and the results show that MCE can effectively select a compact feature subset.

2) From the perspective of sample granulation and based on the neighborhood relation, a feature selection algorithm based on maximal nearest-neighbor rough approximation (MNNRS) is proposed. The algorithm takes feature selection under the neighborhood rough set (NRS) model as its core framework, employs the margin to define nearest neighbors for granulating samples, and then corrects the computation of the positive region (the second sketch after this list illustrates the NRS dependency framework). Experiments show that MNNRS preserves the advantages of the NRS algorithm, reduces computational complexity, and improves classification performance.

3) From the perspective of feature granulation, considering that the feature space of multi-label datasets is high-dimensional and that each label is assumed to possess specific features of its own, a multi-label feature selection algorithm based on local subspace is proposed. The algorithm combines the local subspace model with information entropy theory to handle the situation in which some features are only weakly related to the whole label set yet should not be discarded. Experimental results show that the proposed algorithm can effectively reduce computational complexity, improve classification accuracy, and enhance the flexibility of the feature selection strategy.

4) From the perspectives of both sample granulation and feature granulation, and because high-dimensional, small-sample data suffer from high dimensionality and over-fitting, a heuristic local random feature selection algorithm is proposed. The algorithm employs the local subspace model to granulate features and the neighborhood to granulate samples simultaneously, which improves classification accuracy, reduces computational cost, and alleviates over-fitting to some extent.
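To make the large-margin sample granulation mentioned in items 1) and 2) concrete, the following Python sketch computes Relief-style hypothesis margins: for each sample, the nearest hit (same class) and nearest miss (different class) define a margin that can be used to decide which neighbors form the sample's granule. This is only a generic illustration; the Euclidean distance, the function names, and the way MCE couples these margins with its entropy-based feature quality are assumptions, not the dissertation's exact formulation.

    # Minimal sketch of large-margin sample granulation (Relief-style).
    # Assumption: Euclidean distance and numeric features; not the exact MCE procedure.
    import numpy as np

    def hypothesis_margins(X, y):
        """Return margin[i] = d(x_i, nearmiss_i) - d(x_i, nearhit_i)."""
        n = len(y)
        D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
        np.fill_diagonal(D, np.inf)           # a sample is not its own neighbor
        margins = np.empty(n)
        for i in range(n):
            nearhit = D[i, y == y[i]].min()   # closest sample with the same label
            nearmiss = D[i, y != y[i]].min()  # closest sample with a different label
            margins[i] = nearmiss - nearhit
        return margins

A large positive margin indicates that a sample's same-class neighbors lie well inside its granule, which is the property the margin-based granulation relies on.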
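The neighborhood rough set framework that MNNRS builds on can likewise be sketched as greedy forward selection driven by the dependency function gamma = |POS|/|U|, i.e. the fraction of samples whose entire delta-neighborhood agrees on one decision label. The sketch below is a minimal, standard NRS baseline, not the MNNRS algorithm itself: the neighborhood radius delta, the stopping rule, and all identifiers are assumptions, and the corrected positive-region computation introduced by MNNRS is not reproduced.

    # Minimal sketch of NRS-style forward feature selection (assumed baseline, not MNNRS).
    import numpy as np

    def neighborhood(X, subset, delta):
        """Boolean matrix N[i, j]: sample j lies in the delta-neighborhood of i
        under the candidate feature subset."""
        Z = X[:, subset]
        D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
        return D <= delta

    def dependency(X, y, subset, delta):
        """gamma(subset) = |POS| / |U|: fraction of samples whose whole
        delta-neighborhood shares one decision label (the positive region)."""
        if not subset:
            return 0.0
        N = neighborhood(X, subset, delta)
        pos = sum(np.all(y[N[i]] == y[i]) for i in range(len(y)))
        return pos / len(y)

    def nrs_forward_selection(X, y, delta=0.2):
        """Greedily add the feature that most increases the dependency;
        stop when no candidate improves it."""
        remaining, selected, best = list(range(X.shape[1])), [], 0.0
        while remaining:
            gamma, f = max((dependency(X, y, selected + [f], delta), f) for f in remaining)
            if gamma <= best:
                break
            best, selected = gamma, selected + [f]
            remaining.remove(f)
        return selected

The greedy loop is quadratic in the number of features and cubic-ish in samples, which is precisely the cost that MNNRS's revised positive-region calculation and margin-induced nearest neighbors aim to reduce.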
Keywords/Search Tags: feature selection, information granulation, large margin, local subspace, neighborhood rough set