Classification is one of the most active research areas in data mining. Multi-label classification was proposed to handle the growing volume of multi-label data and has been widely applied in fields such as the detection of gene function and the automatic labeling of multimedia content. Multi-label feature selection aims to eliminate the large number of redundant and irrelevant features in a classification task, thereby reducing the number of features and improving classifier performance in multi-label classification. In this thesis, we design two feature selection algorithms, FI-ARML and Mult-ReliefF, which approach the problem from different angles.

First, we propose FI-ARML, a feature selection algorithm based on frequent itemsets that uses association rules to find associations between attributes in the data. The algorithm improves on the multi-label feature selection algorithm based on neighborhood rough sets. It proceeds in four steps: first, construct frequent k-itemsets from the class labels; second, partition the training samples according to the label sets; third, compute a feature subset for each sub-sample; fourth, merge all feature subsets to obtain the final feature set. Experiments show that FI-ARML greatly reduces feature selection time while achieving comparable classification performance.

Second, to overcome the restriction of the ReliefF algorithm to single-label data, we propose a multi-label feature selection algorithm named Mult-ReliefF. Mult-ReliefF redefines how within-class and out-of-class nearest neighbors are searched, and it updates the feature weight formula by adding a label contribution term. Experiments show that Mult-ReliefF improves classification accuracy and yields better feature subsets.
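The four FI-ARML steps listed above can be illustrated with a small sketch. This is not the thesis implementation: the function names, the `min_support` and `top_m` parameters, and the rule of grouping samples by the frequent label itemset they contain are all assumptions made for illustration. In particular, step 3 uses a simple mean-difference score as a stand-in for the neighborhood rough set selector that the thesis builds on.

```python
from itertools import combinations


def frequent_label_itemsets(Y, k, min_support):
    """Step 1: label k-itemsets whose support reaches min_support.

    Y is a list of binary label rows; support is the fraction of
    samples that carry every label in the itemset.
    """
    n_samples, n_labels = len(Y), len(Y[0])
    frequent = []
    for itemset in combinations(range(n_labels), k):
        count = sum(1 for row in Y if all(row[j] == 1 for j in itemset))
        if count / n_samples >= min_support:
            frequent.append(itemset)
    return frequent


def partition_by_itemsets(Y, itemsets):
    """Step 2 (assumed rule): group sample indices by contained itemset."""
    return {
        s: [i for i, row in enumerate(Y) if all(row[j] == 1 for j in s)]
        for s in itemsets
    }


def select_features_placeholder(X, inside, top_m):
    """Step 3 (placeholder, NOT the rough set method from the thesis).

    Scores each feature by the absolute difference between its mean
    inside the sub-sample and its mean over the remaining samples,
    then keeps the top_m features.
    """
    inside_set = set(inside)
    outside = [i for i in range(len(X)) if i not in inside_set]
    if not inside or not outside:
        return set()
    scores = []
    for f in range(len(X[0])):
        mean_in = sum(X[i][f] for i in inside) / len(inside)
        mean_out = sum(X[i][f] for i in outside) / len(outside)
        scores.append((abs(mean_in - mean_out), f))
    # Highest score first; ties broken by the lower feature index.
    scores.sort(key=lambda t: (-t[0], t[1]))
    return {f for _, f in scores[:top_m]}


def fi_style_feature_selection(X, Y, k=2, min_support=0.3, top_m=1):
    """Steps 1-4: itemsets, partition, per-group selection, union."""
    itemsets = frequent_label_itemsets(Y, k, min_support)
    groups = partition_by_itemsets(Y, itemsets)
    selected = set()
    for indices in groups.values():
        selected |= select_features_placeholder(X, indices, top_m)
    return sorted(selected)  # Step 4: merged final feature set


# Toy data: feature 0 marks labels {0, 1}, feature 1 marks labels {1, 2},
# feature 2 is constant and therefore irrelevant.
X = [[4.0, 0.0, 1.0], [4.0, 0.0, 1.0], [0.0, 4.0, 1.0],
     [0.0, 4.0, 1.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
Y = [[1, 1, 0], [1, 1, 0], [0, 1, 1],
     [0, 1, 1], [1, 0, 0], [0, 0, 1]]

print(fi_style_feature_selection(X, Y, k=2, min_support=0.3, top_m=1))
```

On this toy data the sketch keeps features 0 and 1 and drops the constant feature 2. Because each sub-sample is scored independently, the per-group selection in step 3 could in principle run in parallel, which is one plausible reading of why partitioning by label itemsets shortens selection time.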