Research On Label Distribution Learning Algorithm Based On Data Correlation Mining

Posted on: 2021-05-29
Degree: Master
Type: Thesis
Country: China
Candidate: T T Ren
Full Text: PDF
GTID: 2518306512487374
Subject: Computer application technology
Abstract/Summary:
In recent years, label ambiguity has become a hot topic in machine learning and data mining. The traditional machine learning framework offers two main paradigms for addressing it: single-label learning and multi-label learning. The former assumes that an instance is associated with exactly one label, while the latter allows an instance to have multiple labels. Multi-label learning can therefore handle more label ambiguity problems than single-label learning. However, both paradigms only answer the question "Which labels are related to the instance?" They cannot answer the question "How much does each label describe the instance?" The label distribution learning paradigm was proposed to address this. Label distribution learning directly learns the importance of each label to a specific instance, which makes it suitable for more complex label ambiguity problems. It is well known that the key to label distribution learning is to mine and exploit the correlations in the data. On this basis, this thesis studies label distribution learning from the perspective of sample correlations and label correlations.

Firstly, this thesis proposes a label distribution learning algorithm that exploits global and local label correlations simultaneously. Current methods consider either the global or the local label correlation alone to improve performance. In reality, the correlation among labels is complex, and it is more reasonable to consider both. Specifically, a label can be represented as a linear combination of its related labels, which implies that the label distribution matrix has a low-rank structure. This thesis therefore learns the global label correlation through a low-rank approximation. To capture the local label correlation, this thesis clusters the samples into groups. Samples in the same group share similar label correlations, while samples in different groups may have different ones. Experimental results show that using global and local label correlations together improves performance more than using either alone.

Secondly, this thesis proposes a label distribution learning algorithm based on label-specific feature selection and common feature selection. Existing methods commonly assume that all features are shared by all labels. This assumption is too coarse: in fact, some labels may be determined by only a subset of the features. To describe the relationship between features and labels more accurately, this thesis learns the relevant features for each label. It also learns the features common to all labels, so that shared information is not lost. In addition, label correlation is exploited to further improve performance. Experimental results demonstrate that the proposed method learns the relationship between features and labels better and performs remarkably better than state-of-the-art methods.

Thirdly, this thesis proposes a weakly supervised label distribution learning algorithm based on transductive matrix completion with sample correlations. Most existing algorithms are designed for data with strong supervision information. Real-world data, however, are often incomplete and may be inaccurately labeled, so a weakly supervised method is needed. In particular, information extracted from the test data is helpful when label distributions are missing. This thesis therefore uses matrix completion to introduce the distribution information of the test data. Moreover, it uses manifold regularization to learn the sample correlation. When supervised information is lacking, using sample correlation can improve the accuracy of the algorithm to a certain extent. Experimental results on multiple real datasets verify the effectiveness of the proposed algorithm.
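
As an illustration of the low-rank idea behind the first contribution, the following is a minimal sketch, not the algorithm proposed in the thesis. It assumes a label distribution matrix with one row per sample and one column per label, where each row sums to 1, and it uses a plain truncated SVD in place of whatever low-rank model the thesis actually uses. The function name `low_rank_label_distributions` and the toy data are hypothetical.

```python
import numpy as np


def low_rank_label_distributions(D, rank):
    """Approximate a label distribution matrix D (n_samples x n_labels)
    by its best rank-`rank` approximation, then renormalize the rows.

    Assumption: truncated SVD stands in for the low-rank model; the
    thesis abstract does not specify the exact formulation.
    """
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    # Truncation can produce tiny negative entries, so clip them and
    # rescale each row to be a valid distribution again.
    approx = np.clip(approx, 1e-12, None)
    return approx / approx.sum(axis=1, keepdims=True)


# Toy data (hypothetical): 8 samples, 5 labels. Every row is a convex
# combination of 2 prototype distributions, so D has rank at most 2.
rng = np.random.default_rng(0)
prototypes = rng.dirichlet(np.ones(5), size=2)   # shape (2, 5)
weights = rng.dirichlet(np.ones(2), size=8)      # shape (8, 2)
D = weights @ prototypes                         # shape (8, 5)

D_hat = low_rank_label_distributions(D, rank=2)
```

Because each row of the toy matrix mixes only two prototype distributions, a rank-2 approximation reproduces it up to floating-point error. On real data, the chosen rank controls how much label correlation structure is retained.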
Keywords/Search Tags: Label distribution learning, weakly supervised learning, label correlations, sample correlations, feature selection