
Research On Multi-label Feature Selection Algorithms Based On Random Search Strategy

Posted on: 2017-04-09 | Degree: Master | Type: Thesis
Country: China | Candidate: L Zhao
GTID: 2308330488997606 | Subject: Computer application technology
Abstract/Summary:
The classification problem in pattern recognition is to train a classification model on instances with known labels and then predict the labels of unseen instances. According to the number of labels, classification can be divided into two categories: single-label classification and multi-label classification. In traditional single-label classification, each instance is associated with exactly one label; in multi-label classification, by contrast, an instance usually carries several labels simultaneously.

Because multi-label data sets are often high-dimensional, many of their features are redundant or irrelevant to a given classification task. High-dimensional data make classifiers harder to build, increase computational complexity, and can degrade classification performance. Feature selection eases this problem by removing redundant and irrelevant features, which reduces the dimensionality of multi-label data, can yield a more compact classification model with better generalization, and improves classification performance.

In this thesis, we propose two multi-label feature selection methods that combine the filter approach with a random search strategy:
(1) a multi-label feature selection method based on correlation and a genetic algorithm (CFS-GA);
(2) a multi-label feature selection method based on high-order mutual information and particle swarm optimization (HMI-PSO).

CFS-GA uses a heuristic evaluation criterion based on correlation and information gain to measure feature-feature redundancy and feature-label relevance simultaneously. A genetic algorithm is then used as the optimizer to search for the globally optimal feature subset under this criterion. In the experiments, we compared CFS-GA with three multi-label feature selection methods (CFS-SFS, CFS-SBS and ReliefF-ML) on twelve benchmark data sets.
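The CFS-GA idea can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: features and labels are assumed to be already discretized, the information-gain-based correlation is assumed to be symmetric uncertainty (SU), feature-label relevance is assumed to be averaged over all labels, and the classic CFS merit k * r_cf / sqrt(k + k(k - 1) * r_ff) is used as the evaluation criterion. The genetic algorithm encodes a subset as a bit mask and uses binary tournament selection, one-point crossover, bit-flip mutation and elitism; these operators and all parameter values (`pop_size`, `generations`, `p_mut`, `seed`) are illustrative choices, and the names `symmetric_uncertainty`, `cfs_merit` and `ga_select` are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(xs):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(xs)
    return -sum(c / n * math.log2(c / n) for c in Counter(xs).values())


def symmetric_uncertainty(xs, ys):
    """SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)), a correlation in [0, 1]."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:
        return 0.0
    info_gain = hx + hy - entropy(list(zip(xs, ys)))  # IG = H(X) + H(Y) - H(X, Y)
    return 2.0 * info_gain / (hx + hy)


def cfs_merit(subset, X_cols, Y_cols):
    """CFS merit k * r_cf / sqrt(k + k(k - 1) * r_ff) of a feature subset.

    Assumption: r_cf is the mean SU between the selected features and every
    label; r_ff is the mean SU over all pairs of selected features.
    """
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(symmetric_uncertainty(X_cols[f], y)
               for f in subset for y in Y_cols) / (k * len(Y_cols))
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = (sum(symmetric_uncertainty(X_cols[a], X_cols[b]) for a, b in pairs)
            / len(pairs)) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def ga_select(X_cols, Y_cols, pop_size=20, generations=30, p_mut=None, seed=0):
    """Search bit-mask feature subsets with a simple GA (needs >= 2 features)."""
    rng = random.Random(seed)
    d = len(X_cols)
    p_mut = 1.0 / d if p_mut is None else p_mut
    cache = {}

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:  # memoize: evaluating the merit dominates the cost
            cache[key] = cfs_merit([i for i, b in enumerate(mask) if b], X_cols, Y_cols)
        return cache[key]

    def tournament(pop):
        a, b = rng.sample(pop, 2)  # binary tournament selection
        return a if fitness(a) >= fitness(b) else b

    pop = [[rng.randint(0, 1) for _ in range(d)] for _ in range(pop_size)]
    for _ in range(generations):
        children = [max(pop, key=fitness)[:]]  # elitism: carry over the best mask
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            cut = rng.randint(1, d - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # bit-flip mutation with per-bit probability p_mut
            children.append([1 - g if rng.random() < p_mut else g for g in child])
        pop = children
    best = max(pop, key=fitness)
    return [i for i, b in enumerate(best) if b], fitness(best)


if __name__ == "__main__":
    # Toy data over 8 instances: feature 0 copies label 0, feature 1 copies
    # label 1, feature 2 duplicates feature 0, feature 3 is independent of both labels.
    X = [[0, 0, 1, 1, 0, 1, 0, 1], [0, 1, 0, 1, 1, 0, 0, 1],
         [0, 0, 1, 1, 0, 1, 0, 1], [1, 1, 1, 1, 0, 0, 0, 0]]
    Y = [[0, 0, 1, 1, 0, 1, 0, 1], [0, 1, 0, 1, 1, 0, 0, 1]]
    print(ga_select(X, Y))
```

On this toy data the complementary pair {0, 1} has merit 1/sqrt(2) ≈ 0.707, while the redundant pair {0, 2} scores only 0.5 because the r_ff term in the denominator penalizes the duplicate; the search should therefore return {0, 1} or the equivalent {1, 2}. A GA is a stochastic search, so on larger problems it is not guaranteed to reach the global optimum.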
Our experiments demonstrate that CFS-GA finds the globally optimal feature subset efficiently and significantly improves classification performance.

HMI-PSO adopts a heuristic evaluation criterion based on high-order mutual information to evaluate the dependencies between features and label combinations. A binary particle swarm optimization algorithm is used as the optimizer to search for the globally optimal feature subset under this criterion. In the experiments, we compared HMI-PSO with three multi-label feature selection methods (PPT-CHI, PMU and FIMF) on seven benchmark data sets, and HMI-PSO performed better than the other methods.
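A comparable Python sketch for HMI-PSO follows, again under loudly stated assumptions. The exact high-order criterion of the thesis is not given here, so the hypothetical `hmi_fitness` uses a simple stand-in: the joint mutual information I(X_S; Y) between the selected features, taken together, and the label combination (each instance's full label vector treated as one symbol), minus a size penalty `penalty * |S|` that keeps the subset from growing without bound. The search is a common binary PSO variant in which a sigmoid of each velocity component gives the probability that the corresponding bit is set; the inertia weight `w`, the coefficients `c1` and `c2`, the clamp `v_max` and the other parameters are illustrative values, and `binary_pso` is a hypothetical name.

```python
import math
import random
from collections import Counter


def entropy(xs):
    """Shannon entropy (in bits) of a sequence of hashable discrete values."""
    n = len(xs)
    return -sum(c / n * math.log2(c / n) for c in Counter(xs).values())


def mutual_info(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y), estimated from empirical counts."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def hmi_fitness(mask, X_cols, Y_cols, penalty=0.1):
    """Hypothetical stand-in criterion: I(X_S; Y_combo) - penalty * |S|."""
    subset = [j for j, bit in enumerate(mask) if bit]
    if not subset:
        return 0.0
    # Each instance's selected feature values, taken jointly, form one symbol.
    joint_features = list(zip(*(X_cols[j] for j in subset)))
    # Each instance's whole label vector is one symbol (a label combination).
    label_combo = list(zip(*Y_cols))
    return mutual_info(joint_features, label_combo) - penalty * len(subset)


def binary_pso(X_cols, Y_cols, n_particles=15, iters=40,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Binary PSO: a sigmoid of each velocity component gives P(bit = 1)."""
    rng = random.Random(seed)
    d = len(X_cols)

    def fit(mask):
        return hmi_fitness(mask, X_cols, Y_cols)

    pos = [[rng.randint(0, 1) for _ in range(d)] for _ in range(n_particles)]
    vel = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fit(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(iters):
        for i in range(n_particles):
            for j in range(d):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][j]
                     + c1 * r1 * (pbest[i][j] - pos[i][j])  # pull toward personal best
                     + c2 * r2 * (gbest[j] - pos[i][j]))    # pull toward global best
                vel[i][j] = max(-v_max, min(v_max, v))      # clamp so the sigmoid does not saturate
                prob_one = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob_one else 0
            f = fit(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return [j for j, bit in enumerate(gbest) if bit], gbest_fit


if __name__ == "__main__":
    # Toy data over 8 instances: feature 0 copies label 0, feature 1 copies
    # label 1, feature 2 duplicates feature 0, feature 3 is independent of the labels.
    X = [[0, 0, 1, 1, 0, 1, 0, 1], [0, 1, 0, 1, 1, 0, 0, 1],
         [0, 0, 1, 1, 0, 1, 0, 1], [1, 1, 1, 1, 0, 0, 0, 0]]
    Y = [[0, 0, 1, 1, 0, 1, 0, 1], [0, 1, 0, 1, 1, 0, 0, 1]]
    print(binary_pso(X, Y))
```

Because features 0 and 1 together determine the whole label combination, the pair reaches the maximum joint information H(Y) = 2 bits, for a penalized score of 1.8, whereas the redundant pair {0, 2} carries only 1 bit. Joint mutual information is estimated from empirical counts, so with many selected features and few instances the estimate quickly becomes unreliable; this sparsity is one reason practical high-order criteria often rely on approximations rather than the full joint distribution.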
Keywords/Search Tags: multi-label classification, multi-label feature selection, correlation-based feature selection, high-order mutual information, random search strategy, genetic algorithm, particle swarm optimization