Feature selection is an important step in machine learning and data mining. Its aim is to find a small feature subset that describes the whole data set. It removes irrelevant and redundant features, reduces dimensionality, mitigates the curse of dimensionality and increases classification accuracy. Depending on the number of labels a sample carries, classification can be divided into single-label and multi-label classification. Feature selection has been widely applied to single-label classification, and single-label feature selection algorithms have become increasingly mature. Research on multi-label feature selection, however, remains comparatively scarce. As more and more application areas need to handle multi-label problems, multi-label feature selection has become an active research topic. Multi-label classification is usually more complex than single-label classification, so handling high feature dimensionality and feature redundancy is also more difficult. Research on multi-label feature selection algorithms therefore has both theoretical significance and practical value.

This paper proposes three multi-label feature selection methods: (1) MLRF, a multi-label feature selection algorithm based on ReliefF; (2) MML-RF, an algorithm based on MLRF and mutual information; and (3) MLRF-GA, an algorithm based on MLRF and a modified genetic algorithm. Because the traditional ReliefF algorithm cannot be applied to multi-label problems, MLRF extends it: it improves the way neighboring samples are searched within the same class and defines a feature weight formula that adds multi-label contribution parameters. MLRF was evaluated experimentally on three public multi-label data sets. Since MLRF cannot remove redundant features, MML-RF introduces mutual information as a redundancy measure and uses SBS (Sequential Backward Search) to remove redundant features. Experiments against similar algorithms show that MML-RF removes redundant features while maintaining the performance of the selected feature subset. Finally, the ML-GA algorithm modifies the genetic algorithm to suit the characteristics of multi-label data, and MLRF-GA combines ML-GA with MLRF into a hybrid filter-wrapper method for multi-label feature selection. The main idea of MLRF-GA is to use the results of MLRF to guide ML-GA. This reduces the number of generations and the computing time of ML-GA while still yielding a near-globally-optimal feature subset.
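The MLRF weight formula and its multi-label contribution parameters are not spelled out here, so the following is only a minimal sketch of the general idea MLRF builds on: a ReliefF-style weight update in which near hits and near misses are defined separately for each label and the per-label updates are averaged. The label-wise neighbor definition, the uniform averaging over labels and the name `labelwise_relieff` are assumptions for illustration, not the MLRF algorithm itself.

```python
import random


def _diff(a, b, ranges):
    # Range-normalised Manhattan distance between two numeric feature vectors.
    return sum(abs(x - y) / r for x, y, r in zip(a, b, ranges))


def labelwise_relieff(X, Y, n_iter=None, k=1, seed=0):
    """Hypothetical sketch: binary ReliefF run per label, averaged over labels.

    X: list of numeric feature vectors; Y: list of 0/1 label vectors.
    Uniform averaging stands in for MLRF's (unspecified) contribution parameters.
    """
    rng = random.Random(seed)
    n, d, q = len(X), len(X[0]), len(Y[0])
    ranges = [(max(col) - min(col)) or 1.0 for col in zip(*X)]
    # Visit every sample once by default, otherwise sample n_iter instances.
    idx = list(range(n)) if n_iter is None else [rng.randrange(n) for _ in range(n_iter)]
    m = len(idx)
    w = [0.0] * d
    for i in idx:
        # All other samples, ordered by distance to X[i].
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: _diff(X[i], X[j], ranges))
        for l in range(q):
            hits = [j for j in others if Y[j][l] == Y[i][l]][:k]    # same value of label l
            misses = [j for j in others if Y[j][l] != Y[i][l]][:k]  # different value of label l
            for f in range(d):
                for j in hits:    # features that differ among near hits lose weight
                    w[f] -= abs(X[i][f] - X[j][f]) / ranges[f] / (m * q * len(hits))
                for j in misses:  # features that separate near misses gain weight
                    w[f] += abs(X[i][f] - X[j][f]) / ranges[f] / (m * q * len(misses))
    return w


X = [[0.0, 0.3], [0.1, 0.9], [0.9, 0.2], [1.0, 0.8]]
Y = [[0, 1], [0, 1], [1, 0], [1, 0]]
weights = labelwise_relieff(X, Y)
print(weights)
```

In the toy data, feature 0 tracks both labels and feature 1 is noise, so feature 0 receives a positive weight and feature 1 a negative one. A real multi-label variant would replace the uniform average with label-specific contribution terms.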
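To make the redundancy step concrete, here is a hedged sketch of mutual information used as a redundancy measure inside a simple backward elimination pass. It assumes discrete feature values, a feature ranking produced by some relevance filter, and a normalized-MI threshold of 0.8. The threshold, the normalization by the smaller entropy and the function names are illustrative choices, not the MML-RF procedure.

```python
from collections import Counter
from math import log


def mutual_info(a, b):
    # Empirical mutual information (nats) between two discrete sequences.
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * log(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())


def normalized_mi(a, b):
    # MI scaled by the smaller entropy (H(a) == I(a; a)), so values lie in [0, 1].
    h = min(mutual_info(a, a), mutual_info(b, b))
    return mutual_info(a, b) / h if h > 0 else 0.0


def backward_redundancy_removal(columns, ranking, threshold=0.8):
    """Hypothetical sketch: walk from the weakest feature upwards and drop a
    feature when it is highly redundant with a better-ranked feature still kept.

    columns: dict mapping feature name -> list of discrete values.
    ranking: feature names ordered best-first (e.g. by ReliefF-style weights).
    """
    kept = list(ranking)
    for f in reversed(ranking):
        better = kept[:kept.index(f)]  # better-ranked features not yet removed
        if any(normalized_mi(columns[f], columns[g]) >= threshold for g in better):
            kept.remove(f)
    return kept


columns = {"a": [0, 0, 1, 1], "b": [0, 0, 1, 1], "c": [0, 1, 0, 1]}
selected = backward_redundancy_removal(columns, ["a", "c", "b"])
print(selected)  # ['a', 'c']
```

Feature `b` is an exact copy of `a`, so it is dropped, while `c` shares no information with `a` and is kept. Starting from the weakest feature means that, of two redundant features, the one with the lower relevance weight is the one removed.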
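One plausible way for filter weights to guide a genetic algorithm, sketched here under stated assumptions, is to bias the initial population toward highly weighted features so the search starts closer to good subsets. Features are encoded as 0/1 bitmasks. The mapping of weights to inclusion probabilities in [0.1, 0.9], binary tournament selection, one-point crossover, bit-flip mutation, elitism and the user-supplied `fitness` function are all hypothetical choices. They do not describe the actual ML-GA modifications.

```python
import random


def weight_seeded_ga(weights, fitness, pop_size=20, generations=30,
                     p_mut=0.05, seed=0):
    """Hypothetical sketch of a GA over feature bitmasks whose initial
    population is biased by filter weights (stand-in for MLRF guidance)."""
    rng = random.Random(seed)
    d = len(weights)
    lo, hi = min(weights), max(weights)
    # Map each weight to an inclusion probability in [0.1, 0.9].
    probs = [0.1 + 0.8 * (w - lo) / ((hi - lo) or 1.0) for w in weights]
    pop = [[int(rng.random() < p) for p in probs] for _ in range(pop_size)]

    def pick():
        # Binary tournament selection on the current population.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    best = max(pop, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, d) if d > 1 else 0  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best


# Toy fitness (hypothetical): reward features 0 and 1, penalise features 2 and 3.
def toy_fitness(mask):
    return mask[0] + mask[1] - mask[2] - mask[3]


best_mask = weight_seeded_ga([0.9, 0.8, -0.5, -0.7], toy_fitness)
print(best_mask)
```

In a real wrapper setting `fitness` would train and evaluate a multi-label classifier on the selected features, which is exactly the cost that a weight-biased starting population is meant to reduce.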