
Research And Applications Of Multi-label Learning In Noisy Environment

Posted on: 2022-02-18
Degree: Master
Type: Thesis
Country: China
Candidate: T T Yu
Full Text: PDF
GTID: 2518306734454524
Subject: Computer application technology

Abstract/Summary:
Multi-label learning is a fundamental and important task in supervised learning. With the rapid development of machine learning and deep learning, multi-label learning has been successfully applied in information retrieval, recommender systems, protein function prediction, and other areas. Although great progress has been made, current research on multi-label learning is usually based on the strong supervision assumption that every training sample is labeled completely and correctly. This assumption is unrealistic in the real world: collecting perfectly annotated training samples is time-consuming and expensive, and the annotation process is affected by data characteristics, differences among annotators, the external environment, and so on. To better address these practical problems, this thesis studies multi-label learning in noisy environments.

In recent years, partial multi-label learning (PML) has become the main paradigm for multi-label learning with noise. In PML, the label set of each training sample consists of the ground-truth labels and noisy labels. Existing PML methods mainly focus on reducing the impact of noisy labels on the performance of the multi-label classifier. To this end, some methods aim to identify the ground-truth labels of each training sample, while others model the PML problem mainly through low-rank and sparsity assumptions. However, current PML methods have the following problems: (1) they ignore the negative information between features and labels; (2) they generally incur high computational cost when learning with large label spaces, or cannot handle such label spaces at all; (3) the PML assumption that every training sample must contain noisy labels is still too strong, since some training samples collected from the real world may contain no noisy labels. In view of the
shortcomings of existing work, this thesis studies multi-label learning with noisy labels from the following three aspects:

(1) Existing PML methods ignore the negative information between features and labels. Specifically, if two instances have largely overlapping candidate label sets, their ground-truth labels should be similar irrespective of their feature similarity; conversely, if they are dissimilar in both the feature and candidate label spaces, their ground-truth labels should be dissimilar. To address this, we propose a novel PML method called partial multi-label learning with label and feature collaboration (PML-LFC). PML-LFC estimates the confidence values of the relevant labels of each training sample using similarity in both the label and feature spaces, and trains the desired predictor with the estimated confidence values. It couples the predictor and the latent label confidence matrix in a mutually reinforcing manner within a unified model, and develops an alternating optimization procedure to solve for both. Extensive experimental results show that PML-LFC makes good use of sample feature information and labeling information to estimate the label confidence matrix, and learns a robust multi-label classifier with improved classification performance.

(2) When the label space is very large, existing PML methods generally incur high computational cost or cannot handle the problem at all. To address this, this thesis proposes a PML method called partial multi-label learning using label compression (PML-LCom). PML-LCom first splits the observed label matrix into a latent relevant label matrix and an irrelevant one, and then factorizes the relevant label matrix into the product of two low-rank matrices: one encodes the compressed labels of samples, and the other captures the underlying label correlations. Next, it optimizes the coefficient matrix of the multi-label predictor with respect to the
compressed label matrix. In addition, it regularizes the compressed label matrix with respect to the feature similarity of samples, and optimizes the label matrix and predictor in a coherent manner. Experimental results on both semi-synthetic and real-world PML datasets show that label compression improves both effectiveness and efficiency, and that PML-LCom outperforms state-of-the-art solutions at predicting the labels of unlabeled samples in large label spaces.

(3) To relax the PML assumption that every training sample must contain noisy labels, which is not quite consistent with the real world, this thesis proposes a novel method called multi-label text classification with label correction under noise (LCN). Unlike PML, which assumes the labels of each training sample are given as a candidate label set, LCN assumes that each irrelevant label of a training sample has some probability of being transformed into a candidate label, i.e., a noisy label. LCN combines label correction and multi-label classifier learning in an end-to-end manner and contains two modules: a label correction module and a classification module. In the label correction module, a group of prototypes for each class is learned with the help of label semantics and feature information; these prototypes are then compared with the extracted deep features to correct the labels of each training sample. In the classification module, the classifier combines the original noisy labels and the corrected labels of each sample as supervision to guide training. The two modules are combined in a unified framework and trained in an alternating manner. Extensive experiments on two multi-label text benchmark datasets show that LCN effectively reduces the impact of noisy labels on classifier performance, demonstrating its advantages over state-of-the-art methods.
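To make the label-compression idea behind PML-LCom concrete, the following is a minimal sketch, not the thesis' actual algorithm: it factorizes a toy observed label matrix with a plain truncated SVD (in place of the regularized alternating optimization described above, and without the relevant/irrelevant split), fits a linear predictor against the compressed labels, and decodes predictions back into the full label space. All variable names and sizes here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n samples, d features, q candidate labels, k compressed dims (k << q).
n, d, q, k = 100, 20, 50, 5
X = rng.normal(size=(n, d))                        # feature matrix
Y = (rng.random(size=(n, q)) < 0.1).astype(float)  # observed (noisy) label matrix

# Low-rank factorization of the label matrix: Y ~ U @ V, where
# U (n x k) encodes the compressed labels of samples and
# V (k x q) captures the underlying label correlations.
Uf, s, Vt = np.linalg.svd(Y, full_matrices=False)
U = Uf[:, :k] * s[:k]   # compressed label representation
V = Vt[:k, :]           # label-correlation decoder

# Fit the predictor's coefficient matrix W against the compressed labels
# (ridge regression as a stand-in for the thesis' regularized objective).
lam = 1.0
W = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ U)

# Prediction: map features to the compressed space, then decode to label scores.
scores = X @ W @ V                     # n x q label scores
Y_hat = (scores > 0.5).astype(float)   # thresholded label predictions

# Relative error of the rank-k reconstruction of the observed label matrix.
recon_err = np.linalg.norm(Y - U @ V) / np.linalg.norm(Y)
print(round(float(recon_err), 3))
```

Because the predictor is trained in the k-dimensional compressed space rather than against all q labels directly, its cost scales with k instead of q, which is the efficiency gain that label compression targets.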
Keywords/Search Tags:Weakly supervised learning, Multi-label learning, Noisy labels