
Research On Feature Selection Algorithms Under The Supervision Of Class Information

Posted on: 2018-04-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Wang
Full Text: PDF
GTID: 1368330596957931
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of big data, sampled data sets keep growing exponentially in both their size and their dimensionality. As a result, data processing and data analysis face tremendous challenges. High dimensionality gives rise to many problems, such as degraded performance or even outright failure of classical learning algorithms in practice, and dramatically increased time and space complexity. Feature selection has therefore attracted increasing attention in recent years because of its effectiveness in dimensionality reduction. Feature selection aims to select a discriminative optimal feature subset by means of a criterion that evaluates individual features or feature subsets, and then to construct a reduced recognition space. Various feature selection approaches exist; they exploit different feature evaluation criteria and feature search strategies, evaluate different kinds of features, and suit different learning scenarios. In practice, these approaches have achieved promising dimensionality-reduction performance.

Although the existing feature selection approaches fall into several well-established categories and achieve good selection performance, three difficult problems remain unsolved. (1) Redundancy information and new classification information cannot be balanced appropriately during feature evaluation. The existing approaches, especially the mutual information-based ones, focus either on alleviating the redundancy information of the feature subset or on enhancing its new classification information; none of them assigns proper importance weights to these two information terms. (2) Partially predominant features are difficult to select. The existing approaches commonly employ univariate evaluation criteria and assign each feature a single evaluation score. Partially predominant features, which are superior in discriminating a part of the target classes
or even only one class, receive low scores and are eliminated from the optimal feature subset. (3) Single-label and multi-label recognition tasks cannot be accomplished simultaneously. Single-label recognition tasks require a reduced space with little feature redundancy information, while multi-label recognition tasks need a reduced space rich in class correlation information. None of the existing feature selection approaches can construct reduced feature subspaces with both characteristics and thus suit the two recognition tasks simultaneously.

To address these three problems, this dissertation analyzes how different kinds of class information can supervise the feature selection process, and on that basis designs and implements three effective feature selection models: a model based on maximizing independent classification information, a model based on preserving class separability information, and a model based on preserving class correlation information.

First, a new feature selection model based on maximizing independent classification information is designed to tackle the improper balance between redundancy information and new classification information in the existing approaches. The model defines a conditional mutual information-based term, denoted independent classification information, which unifies the new classification information provided by a candidate feature with the effective classification information preserved by the already selected features. This mechanism lets redundancy information and new classification information exert properly balanced effects on feature evaluation. On the basis of the independent classification information term, a max-relevance and max-independence criterion is proposed; it can effectively select features that are highly discriminative yet lowly redundant. Comprehensive experiments testify to the excellent selection performance of the max-relevance and max-independence criterion on data sets with noisy instances, data sets with noisy features, and both low-dimensional and high-dimensional data sets.

Second, a new feature selection model based on preserving class separability information is proposed to remedy the existing approaches' neglect of partially predominant features. By analyzing classification bias and classification performance, the model measures a feature's ability to preserve class separability in order to assess its classification performance with respect to each target class. This strategy yields a vectorized representation of a feature's discriminative ability, which makes partially predominant features easy to find. Furthermore, a preserving-class-separability-based feature selection approach is proposed within the model. It endeavors to build an optimal feature subset consisting of partially predominant features with minimal class-relevant redundancy; such a subset is well suited to constructing a reduced subspace with superior recognition performance. Comprehensive experiments demonstrate that the proposed approach achieves good selection results under various evaluation metrics.

Last, to address the difficulty of accomplishing single-label and multi-label recognition tasks simultaneously with the existing approaches, a new feature selection model based on preserving class correlation information is introduced. The model quantitatively measures the pairwise correlation information between target classes and uses this information to supervise the feature selection process. The reduced subspace constructed by the new model maximally
preserves the class correlation information and, in addition, attains minimal class-relevant feature redundancy; the new model therefore suits both single-label and multi-label recognition tasks. Moreover, an efficient algorithm realizing the new model is implemented with sparse multi-task learning technology, which gives the new approach a fast convergence rate toward optimal solutions. Experimental evaluations on artificial data sets, single-label data sets, and multi-label data sets validate the nontrivial performance of the new feature selection approach.
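The general idea behind conditional mutual information-driven selection, as discussed above, can be illustrated with a small sketch. This is not the dissertation's exact max-relevance and max-independence criterion; it is a generic greedy forward selection on discrete data that scores a candidate feature f by the minimum of I(f; y | s) over the already selected features s (a simplification in the style of the well-known CMIM rule). All function names and the scoring rule are illustrative assumptions.

```python
# Sketch of greedy forward feature selection on discrete data, driven by
# conditional mutual information. NOT the dissertation's exact criterion:
# the min-over-selected CMI score is a swapped-in CMIM-style simplification.
import numpy as np
from collections import Counter

def entropy(*cols):
    # Joint Shannon entropy (in bits) of one or more discrete columns.
    joint = Counter(zip(*cols))
    n = sum(joint.values())
    p = np.array([c / n for c in joint.values()])
    return -np.sum(p * np.log2(p))

def mi(x, y):
    # I(X; Y) = H(X) + H(Y) - H(X, Y)
    return entropy(x) + entropy(y) - entropy(x, y)

def cmi(x, y, z):
    # I(X; Y | Z) = H(X, Z) + H(Y, Z) - H(X, Y, Z) - H(Z)
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)

def select_features(X, y, k):
    """Greedily pick k columns of the discrete matrix X that are relevant
    to y and carry classification information not already provided by
    earlier picks (low redundancy, high new information)."""
    selected = []
    for _ in range(k):
        best, best_score = None, -np.inf
        for f in range(X.shape[1]):
            if f in selected:
                continue
            if not selected:
                # First pick: plain relevance I(f; y).
                score = mi(X[:, f], y)
            else:
                # Later picks: information about y that survives
                # conditioning on each already selected feature.
                score = min(cmi(X[:, f], y, X[:, s]) for s in selected)
            if score > best_score:
                best, best_score = f, score
        selected.append(best)
    return selected
```

On a toy data set where the label is determined by two bits, the rule first picks one informative feature, then skips its exact duplicate (zero conditional information) in favor of the complementary bit, which is precisely the redundancy-versus-new-information trade-off the abstract describes.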
Keywords/Search Tags:feature selection, feature redundancy, independent classification information, class separability information, class correlation information