
Research On Acquisition And Application Of Label Correlation In Multi-label Learning

Posted on: 2022-02-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Y Che
GTID: 1488306338975939
Subject: System analysis, operations and control
Abstract/Summary:
With the in-depth development of artificial intelligence and the rapid advancement of technology, traditional supervised learning can no longer cope with increasingly complex learning problems and diverse data. In many real-world scenarios, a learning task involves training and predicting multiple output variables from a common set of input variables, and multi-label learning has recently emerged as a solution to such problems. The rapid expansion of the output space, however, raises two challenges. On the one hand, storage and time constraints make it unrealistic to train a learner for every possible subset of label variables. On the other hand, the proliferation of label variables raises the cost of acquiring labeled data, resulting in a serious shortage of multi-label instances with known responses. Acquiring and applying label correlation is the core breakthrough for improving the prediction accuracy, efficiency and generalization performance of multi-label learning algorithms, and it is also the biggest challenge. Existing methods, however, either rely on external knowledge or estimate label relationships from the co-occurrence and mutual-exclusion frequencies of labels in the output space. This research considers only the interactions among label variables; it neither provides a complete, convincing theoretical framework for quantitatively measuring how feature variables influence label variables and label correlation, nor proposes an effective way to use label correlation. To explore the acquisition and application of label correlation in multi-label data, this thesis provides the following solutions and results:

(1) For multi-label data with a discrete input space, the essential elements of each label are extracted. Based on the essential elements of different labels, a relevance judgement matrix is proposed to characterize how the features in the input space influence each label and the label correlation. The labels in the output space are then divided into several relevant label groups. A multi-label feature selection method, CLSF, is designed to keep the discernibility of the original input space for each relevant label group unchanged. For strongly correlated labels, CLSF deletes redundant or interfering features and extracts descriptive ones, thereby achieving two-way dimensionality reduction of both the input space and the output space.

(2) On the one hand, to avoid losing the discriminative information contained in feature variables when the input data are discretized, we aim to retain all the information in the dataset; on the other hand, to reduce the computational complexity of essential-element-based label correlation, we construct a more reasonable metric that characterizes the binary importance of a feature to a label. For multi-label data with numerical features, the crucial features of each local label class are selected. From the overlap of the crucial features of different local label classes, the local label correlation and a global inter-label relevance matrix are constructed. Depending on the judgment parameter ?, the labels are transformed into several independent label-related subsets. For each label-related subset, a local scoring function that highlights the local characteristics of the labels is designed to integrate the local classes. Finally, a multi-label local feature selection method, LRFS-?, is established that performs targeted local feature selection on each label-related subset, which effectively improves the learning and prediction performance of multi-label classification.

(3) To avoid the information loss caused by extracting only the binary importance of a feature to a label, we measure the feature distribution on each label; in addition, to reduce the loss of correlation information caused by fitting strongly correlated labels, the inter-label relevance matrix is applied directly in the predictive learner. The feature distribution on a label accurately quantifies the discriminating effectiveness of all feature variables in the input space for any label variable. By combining two different aggregation strategies, the formal concept and the measurement function of feature distribution-based label correlation are established. The resulting label relevance matrix objectively reflects which label variables in the output space are strongly correlated and which are weakly correlated or uncorrelated. Based on this label correlation, the FL-MLC method is proposed to adjust the distance between the parameters of different labels.

(4) To address the shortage of labeled multi-output data and data heterogeneity, label correlation is extended to a more complex and practical scenario: a semi-supervised problem with multi-output regression tasks. For the different output variables, fuzzy rules are first extracted in the auxiliary domain (i.e., the source domain) to describe the characteristics shared by different outputs and to capture the uniqueness of each. In the homogeneous scenario, the proposed method FMOT modifies and transfers these fuzzy rules according to the similarities and differences between the auxiliary data and the current data (i.e., the target domain), improving performance on new but similar regression tasks in the target domain. Building on this, the more complex heterogeneous scenario is handled by learning a latent input space that reduces the discrepancy of variables between domains.

For different types of complex data with multiple outputs, this thesis establishes a relatively complete framework for constructing the importance of features to labels. The feature importance-based label relevance matrix is then applied to multi-label feature selection and classification. Finally, label correlation is extended to a real application scenario, namely multi-output regression transfer learning. Compared with existing multi-label classification, multi-label feature selection, multi-output regression and transfer learning methods, the proposed algorithms achieve good experimental results on multiple real-world datasets.
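The label-grouping step described in (2) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's actual method: the absolute Pearson correlation stands in for the feature-to-label importance measure, the Jaccard overlap of two labels' crucial-feature sets stands in for the inter-label relevance, and the hypothetical `threshold` argument plays the role of the judgment parameter. The function names `crucial_features`, `label_relevance_matrix` and `group_labels` are hypothetical. Labels whose relevance reaches the threshold are linked, and each connected component becomes one label-related subset.

```python
from itertools import combinations
from statistics import mean, pstdev


def crucial_features(X, y, k):
    """Return the indices of the k features most related to binary label y.

    Assumption: |Pearson r| is used as a stand-in importance score.
    """
    sy, my = pstdev(y), mean(y)
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        sx = pstdev(col)
        if sx == 0 or sy == 0:
            r = 0.0  # a constant column (or label) carries no information
        else:
            mx = mean(col)
            cov = mean((a - mx) * (b - my) for a, b in zip(col, y))
            r = cov / (sx * sy)
        scores.append((abs(r), j))
    # Highest score first; ties broken by the smaller feature index.
    scores.sort(key=lambda t: (-t[0], t[1]))
    return {j for _, j in scores[:k]}


def label_relevance_matrix(feature_sets):
    """Symmetric label-by-label matrix of crucial-feature overlap.

    Assumption: Jaccard overlap stands in for the thesis's relevance measure.
    """
    q = len(feature_sets)
    R = [[1.0 if i == j else 0.0 for j in range(q)] for i in range(q)]
    for i, j in combinations(range(q), 2):
        union = feature_sets[i] | feature_sets[j]
        r = len(feature_sets[i] & feature_sets[j]) / len(union) if union else 0.0
        R[i][j] = R[j][i] = r
    return R


def group_labels(R, threshold):
    """Split labels into subsets: connected components of R[i][j] >= threshold."""
    q = len(R)
    parent = list(range(q))

    def find(a):
        # Union-find root lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for i, j in combinations(range(q), 2):
        if R[i][j] >= threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(q):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    # Toy data: 4 samples, 3 numerical features, one binary label.
    X = [[1, 0, 5], [2, 0, 5], [3, 1, 5], [4, 1, 5]]
    y = [0, 0, 1, 1]
    print(crucial_features(X, y, 2))  # {0, 1}
    # Hypothetical crucial-feature sets for four labels.
    sets = [{0, 1, 2}, {1, 2, 3}, {7, 8}, {8, 9}]
    R = label_relevance_matrix(sets)
    print(group_labels(R, 0.3))  # [[0, 1], [2, 3]]
    print(group_labels(R, 0.4))  # [[0, 1], [2], [3]]
```

Raising the threshold splits the labels into more, smaller subsets: at 0.4, labels 2 and 3 separate because their crucial-feature sets share only one of three features. Because grouping uses connected components, two labels can end up in the same subset through an intermediate label even when their own overlap is below the threshold; a stricter clique-based grouping would be one alternative design.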
Keywords/Search Tags: Label correlation, multi-label classification, feature selection, multi-output regression, transfer learning