In the era of big data, researchers with interdisciplinary knowledge are dedicated to leveraging knowledge from various fields to enhance the performance and efficiency of machine learning algorithms. However, high-dimensional data, especially high-dimensional multi-label data with complex spatial structure, not only weakens the performance of machine learning models but also places greater demands on the optimization and computational capabilities of computer systems. To address these issues, this paper proposes a series of multi-label feature selection models tailored to such problems, aiming to improve algorithm performance, enhance optimization efficiency, and reduce system storage costs.

Amid the explosive growth of data volume, traditional methods face the dual constraints of low effectiveness and low efficiency when dealing with complex high-dimensional data. Multi-label data not only inherits the assorted problems of high-dimensional data but also involves binary attributes, multi-class attributes, and label-correlated multi-label attributes. The emergence of multi-label data makes it harder for algorithms to extract useful information accurately and escalates the system's computational costs, so multi-label data has become a challenging and popular research focus. As an integral component of machine learning, feature selection plays a vital role in extracting effective feature information from multi-label data and reducing the storage costs of the system. The main strategies for applying feature selection to multi-label data include: mining the feature subset that provides the most effective information for all labels; using Bayesian and Markov-blanket theory to treat the feature set and label set as causal variables when selecting a feature subset; and selecting a feature subset based on the correlation between features and labels. The first two strategies fall under the filter category of multi-label feature selection. Methods that employ the last strategy can effectively explore the optimal feature subset while remaining efficient, since they leverage learning models. However, these methods still encounter significant issues during feature-subset exploration, issues that not only undermine the performance of machine learning models but also reduce the optimization efficiency of computer systems and increase storage costs. To address these issues, this paper designs the following targeted multi-label feature selection methods:

(1) Multi-label feature selection via robust flexible sparse regularization (RFSFS). Feature selection is the cornerstone of handling high-dimensional multi-label data, effectively boosting the performance of subsequent models and reducing system storage overhead. Currently popular multi-label feature selection methods pursue optimal feature subsets through two properties: sparsity and redundancy. Most methods employ a low-sparsity LASSO to obtain all class-related features, while others explore shared common features with high redundancy. Low sparsity and high redundancy not only hinder feature selection algorithms from choosing high-quality features but also weaken the classification performance of subsequent models. To address these challenges, we design a robust and flexible sparse regularizer and propose the RFSFS method for multi-label feature selection based on it. Compared with previous multi-label feature selection methods, RFSFS holds two major advantages. First, RFSFS exploits the inner-product regularization property inherent in the sparse regularizer, enabling it to capture class-related features with sparsity potentially higher than that of LASSO. Second, RFSFS can identify low-redundancy shared features through the designed sparse regularizer.

(2) Robust sparse and low-redundant multi-label feature selection with dynamic local and global structure preservation (SLMDS). Existing multi-label feature selection methods suffer from three issues. First, they consider either only local or only global label correlation information. Second, they use a constant Laplacian graph to uncover local label correlation information, which degrades the classification performance of subsequent models. Third, traditional norm-based multi-label feature selection methods lack effective handling of feature redundancy. To overcome these challenges, SLMDS employs an improved graph structural model to preserve both the dynamic local label structure and the global label structure. Meanwhile, the l2,1-norm and an inner-product regularization term are applied to the objective function to ensure row-sparse robustness and select low-redundancy features. Finally, all these components are integrated into a unified learning framework for joint optimization.

(3) Label correlations variation for robust multi-label feature selection (LCVFS). Methods based on the original label space do not handle redundant and irrelevant label information, which can interfere with the algorithm's feature selection. In addition, existing methods fail to dynamically coordinate second-order and high-order label correlation information. To address this, we propose LCVFS, a multi-label feature selection method that comprehensively considers both types of label correlation. First, LCVFS uses a self-representation model to uncover high-order label correlation information within the data. It then applies the l2,1-norm to eliminate redundant and noisy information. Building on this, LCVFS employs a label-level regularizer to capture precise second-order label correlation information. By utilizing the recorded label information flow and propagation mechanisms, it elucidates how different hierarchical label relationships and label classifications feed back onto features, thereby overcoming the constraints posed by the "curse of dimensionality" in high-dimensional complex data.

(4) Robust multi-label feature selection with shared label enhancement (RLEFS). Traditional multi-label learning methods measure the correlation between features and labels using logical labels, which fail to accurately reflect the importance of the corresponding labels. RLEFS addresses this issue: it reconstructs logical labels into numerical labels using a designed label-enhancement term and applies the l2,1-norm to that term, thereby obtaining robust label enhancement. Furthermore, RLEFS uses the enhanced labels to capture the underlying semantic structure between the feature set and the label set. Finally, leveraging local information in the data, RLEFS maintains structural consistency between the reconstructed labels and the original labels during feature selection.

This paper conducts experiments on a large number of benchmark multi-label datasets. The results and accompanying analysis across different types of experiments show that, each targeting a different problem, the four proposed multi-label feature selection methods achieve significant improvements in classification performance over existing benchmark methods across various evaluation criteria.
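Several of the methods above (SLMDS, LCVFS, RLEFS) apply the l2,1-norm to a feature weight matrix W to obtain row-sparse, robust solutions, after which features are ranked by the l2-norms of the rows of W. The sketch below illustrates only that shared mechanism under a plain least-squares loss, solved by standard iteratively reweighted least squares; the actual objectives in this thesis add further structure, redundancy, and label-enhancement terms, and all function names and parameters here are illustrative assumptions.

```python
import numpy as np

def l21_feature_scores(X, Y, lam=1.0, n_iter=50):
    """Illustrative sketch: min_W ||XW - Y||_F^2 + lam * ||W||_{2,1},
    solved by iteratively reweighted least squares. Rows of W for
    irrelevant features shrink toward zero; each feature is scored
    by the l2-norm of its row."""
    d = X.shape[1]
    D = np.eye(d)  # reweighting matrix, D_ii = 1 / (2 * ||w_i||_2)
    for _ in range(n_iter):
        # closed-form update for the current reweighting
        W = np.linalg.solve(X.T @ X + lam * D, X.T @ Y)
        row_norms = np.maximum(np.linalg.norm(W, axis=1), 1e-8)
        D = np.diag(1.0 / (2.0 * row_norms))
    return np.linalg.norm(W, axis=1)

# toy multi-label problem: only the first 3 of 20 features matter
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
Y = (X[:, :3] @ rng.standard_normal((3, 4)) > 0).astype(float)
scores = l21_feature_scores(X, Y)
```

In this toy setting the informative features retain large row norms while the rest are driven toward zero, which is the role the l2,1 penalty plays inside each of the three objectives.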
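RFSFS and SLMDS both stress selecting low-redundancy features. As a generic illustration of that goal, the greedy procedure below trades relevance against correlation with already-selected features; note this is not the inner-product regularizer designed in the thesis, merely a simple stand-in for the redundancy idea, with hypothetical names throughout.

```python
import numpy as np

def greedy_low_redundancy(X, scores, k):
    """Pick k features, balancing relevance (given scores) against
    redundancy, measured as mean absolute correlation with the
    features already selected. A generic sketch, not the thesis method."""
    d = X.shape[1]
    C = np.abs(np.corrcoef(X, rowvar=False))  # feature-feature correlation
    selected = [int(np.argmax(scores))]       # start from the top-scored feature
    while len(selected) < k:
        remaining = [j for j in range(d) if j not in selected]
        penalty = C[np.ix_(remaining, selected)].mean(axis=1)
        j = remaining[int(np.argmax(scores[remaining] - penalty))]
        selected.append(j)
    return selected

# feature 1 duplicates feature 0, so a redundancy-aware pick skips it
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 4))
X[:, 1] = X[:, 0]
scores = np.array([1.0, 0.9, 0.6, 0.2])
picked = greedy_low_redundancy(X, scores, k=2)  # → [0, 2]
```

Even though feature 1 has the second-highest relevance score, its perfect correlation with feature 0 makes it redundant, so the less relevant but uncorrelated feature 2 is chosen instead.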