Streaming Feature Selection Algorithm Research For Multi-label Classification

Posted on: 2019-02-13
Degree: Master
Type: Thesis
Country: China
Candidate: X Z Guo
GTID: 2428330548995253
Subject: Computer application technology
Abstract/Summary:
In multi-label classification problems, a single instance can belong to several classes at the same time, so the classes are not mutually exclusive. In traditional multi-label feature selection, the feature space, the number of samples, and the label space are all known in advance. In multi-label streaming feature selection, the number of samples and the label space are fixed, but the full feature space is unknown in advance: candidate features are generated dynamically and arrive sequentially, one at a time. The goal of streaming feature selection is to maintain the best subset found so far from the candidate features as they arrive. By removing irrelevant and redundant features, streaming feature selection can reduce the dimensionality of the data, speed up the learning process, simplify the learned model, and improve its performance.

In this thesis, we propose two types of multi-label streaming feature selection algorithms, one based on the alpha-investing method and one based on mutual information. The first type includes two methods: Multi-Label Streaming Feature Selection using Alpha-Investing based on Binary Relevance (MLSFSAI-BR) and Multi-Label Streaming Feature Selection using Alpha-Investing based on Multi-Output Regression (MLSFSAI-MOR). We compare our algorithms experimentally with three other multi-label feature selection methods on eleven datasets. The results show that MLSFSAI-BR performs better than the three comparison algorithms. Compared with MLSFSAI-BR, MLSFSAI-MOR selects fewer features in less time and can perform a preliminary mining of the correlation between labels.

The second type includes three methods: Multi-Label Streaming Feature Selection based on Mutual Information (MLSFS-MI), Fast Multi-Label Streaming Feature Selection based on Mutual Information (MLSFS-Fast), and Multi-Label Streaming Feature Selection based on Max-Relevance and Min-Redundancy (MLSFS-MRMR). Experiments show that, among these three algorithms, MLSFS-MRMR has the best performance and selects the fewest features. It considers not only the correlation between each streaming feature and the labels, but also the redundancy between features. MLSFS-MRMR also performs better than the two algorithms of the first type and the three comparison algorithms.
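
The max-relevance and min-redundancy idea described above can be illustrated with a minimal Python sketch. It is not the MLSFS-MRMR algorithm from the thesis, whose details are not given here. The scoring rule (mean mutual information with the labels minus mean mutual information with the features already selected), the `threshold` parameter, and the names `mutual_information` and `stream_select` are all assumptions made for illustration, and features and labels are assumed to be discrete.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * log(c * n / (px[x] * py[y]))
    return mi


def stream_select(feature_stream, labels, threshold=0.05):
    """Keep a feature if relevance minus redundancy exceeds `threshold`.

    feature_stream: iterable of (name, column) pairs, seen one at a time.
    labels: list of label columns (one column per label).
    The rule and the threshold are illustrative assumptions.
    """
    selected = {}
    for name, col in feature_stream:
        # Relevance: average mutual information with every label.
        relevance = sum(mutual_information(col, y) for y in labels) / len(labels)
        # Redundancy: average mutual information with features kept so far.
        if selected:
            redundancy = sum(
                mutual_information(col, s) for s in selected.values()
            ) / len(selected)
        else:
            redundancy = 0.0
        if relevance - redundancy > threshold:
            selected[name] = col
    return selected


# Toy data: 8 samples, 2 labels, 4 features arriving in order.
y1 = [0, 0, 0, 0, 1, 1, 1, 1]
y2 = [0, 0, 1, 1, 0, 0, 1, 1]
stream = [
    ("f1", [0, 0, 0, 0, 1, 1, 1, 1]),  # informative about y1
    ("f2", [0, 0, 0, 0, 1, 1, 1, 1]),  # exact duplicate of f1 (redundant)
    ("f3", [0, 0, 1, 1, 0, 0, 1, 1]),  # informative about y2
    ("f4", [0, 1, 0, 1, 0, 1, 0, 1]),  # independent of both labels
]
chosen = stream_select(iter(stream), [y1, y2])
print(list(chosen))  # ['f1', 'f3']
```

In this toy run the duplicate `f2` is rejected because its redundancy with `f1` outweighs its relevance, and `f4` is rejected because it carries no information about either label. A real method would also need to handle continuous features and decide whether earlier selections should be revisited when a new feature arrives.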
Keywords/Search Tags: multi-label classification, multi-label streaming feature selection, binary relevance, multi-output regression, max-relevance and min-redundancy