With the advent of the big data era, massive data provide a wealth of information for applications in many fields. However, such data also exhibit two distinctive characteristics: high dimensionality and label incompleteness. These characteristics greatly increase the complexity of data analysis and processing; training and learning directly on such data leads to the "curse of dimensionality" and high algorithmic complexity. Supervised multi-label feature selection algorithms generally require high-quality, massive data with a complete label space, while unsupervised multi-label feature selection algorithms select relevant feature subsets directly from the data itself, without considering the information contained in the label space. Weakly supervised multi-label feature selection has therefore become an effective way to tackle these issues.

In recent years, feature selection in weakly supervised multi-label scenarios has attracted wide attention due to its broad range of applications, but many problems remain to be solved: (1) existing weakly supervised multi-label feature selection algorithms are generally vulnerable to interference from missing labels and noise, making it difficult to accurately select important features; (2) for weakly labeled data, existing feature selection algorithms fail to account for both the latent shared information between the feature space and the label space and the sparsity of the label space. To address these two problems, this paper proposes two effective feature selection algorithms:

(1) Weakly supervised multi-label feature selection based on weakly supervised contrastive learning. This method aims to select high-quality features from data sets with missing and noisy labels while mining latent inter-class contrastive patterns in a small amount of manually labeled data. It consists of three steps. First, a weakly supervised pre-training strategy is designed that uses instance similarity and sparse learning to obtain a class attribute for each class label, which is then used to recover missing labels. Second, a contrastive learning strategy is introduced to capture the contrastive patterns of a small amount of labeled data and reduce the influence of noisy data. Finally, experiments are conducted on ten multi-label data sets with four evaluation metrics; the results show that our approach outperforms other state-of-the-art multi-label feature selection algorithms.

(2) Weakly supervised multi-label feature selection based on a shared subspace. This method consists of four steps. First, coupled matrix factorization is used to exploit the low-dimensional shared information between the feature matrix and the label matrix, reducing the effect of incomplete label information. Second, non-negative matrix factorization is adopted to improve the interpretability of feature selection. Third, a consistency assumption is used to recover missing labels, and regularization terms remove redundant or irrelevant features. Finally, experiments are performed on eight multi-label data sets with five evaluation metrics; the results show that our approach outperforms other state-of-the-art multi-label feature selection algorithms.
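The first method's pre-training step, recovering missing labels from instance similarity, might be sketched roughly as follows. This is only an illustrative assumption of the idea, not the thesis's actual formulation: the function name, the cosine-similarity choice, the k-nearest-neighbor weighted vote, and the 0.5 threshold are all invented here for demonstration.

```python
import numpy as np

def recover_missing_labels(X, Y, k=3, threshold=0.5):
    """Illustrative sketch: fill missing entries of a multi-label matrix
    by a similarity-weighted vote over the k most similar instances.

    X : (n, d) feature matrix.
    Y : (n, q) label matrix with 1 (relevant), 0 (irrelevant),
        and -1 marking a missing entry.
    """
    n = X.shape[0]
    # Cosine similarity between all pairs of instances.
    norms = np.linalg.norm(X, axis=1, keepdims=True) + 1e-12
    Xn = X / norms
    S = Xn @ Xn.T
    np.fill_diagonal(S, -np.inf)          # exclude self-similarity

    Y_rec = Y.astype(float).copy()
    for i in range(n):
        missing = np.where(Y[i] == -1)[0]
        if missing.size == 0:
            continue
        nbrs = np.argsort(S[i])[-k:]      # k most similar instances
        w = S[i, nbrs]
        w = w / (np.abs(w).sum() + 1e-12)  # normalize vote weights
        for j in missing:
            known = Y[nbrs, j] != -1       # only vote with observed entries
            if known.any():
                score = np.dot(w[known], Y[nbrs[known], j]) / (np.abs(w[known]).sum() + 1e-12)
                Y_rec[i, j] = 1.0 if score >= threshold else 0.0
            else:
                Y_rec[i, j] = 0.0          # no evidence: treat label as absent
    return Y_rec
```

An instance whose features closely match those of an instance carrying a given label thus inherits that label, which is the intuition behind using instance similarity for label recovery.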
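The abstract names only the ingredients of the second method; a minimal sketch of the shared-subspace idea, coupled non-negative factorization of the feature and label matrices through one shared representation, could look like the following. The factor names, the multiplicative update rule, and scoring features by column norms are assumptions for illustration, not the thesis's exact algorithm (which additionally recovers missing labels and adds sparsity regularization).

```python
import numpy as np

def coupled_nmf_feature_scores(X, Y, r=5, alpha=1.0, iters=200, seed=0):
    """Illustrative sketch: factor the feature matrix X (n, d) and label
    matrix Y (n, q) through one shared non-negative representation V (n, r):

        minimize ||X - V Wx||_F^2 + alpha * ||Y - V Wy||_F^2,

    then score each feature by the column norm of Wx: a feature that loads
    heavily on the shared subspace is considered important.
    Assumes X and Y are non-negative.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    q = Y.shape[1]
    V = rng.random((n, r)) + 0.1
    Wx = rng.random((r, d)) + 0.1
    Wy = rng.random((r, q)) + 0.1
    eps = 1e-9
    for _ in range(iters):
        # Standard multiplicative updates for the coupled objective.
        V *= (X @ Wx.T + alpha * (Y @ Wy.T)) / (V @ (Wx @ Wx.T + alpha * (Wy @ Wy.T)) + eps)
        Wx *= (V.T @ X) / (V.T @ V @ Wx + eps)
        Wy *= (V.T @ Y) / (V.T @ V @ Wy + eps)
    return np.linalg.norm(Wx, axis=0)    # one importance score per feature
```

Because V is shared between both reconstructions, the subspace is forced to explain the labels as well as the features, which is how coupling transfers label information into the feature scores.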