
Research On Several Key Issues Of Multi-label Learning For Limited Supervised Information

Posted on: 2022-12-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Y Guan
Full Text: PDF
GTID: 1488306758479124
Subject: Computer application technology
Abstract/Summary
Multi-Label Learning (MLL) studies the learning problem in which a sample is associated with multiple labels simultaneously, and is widely used in practical application scenarios such as image annotation, information retrieval, recommender systems, and bioinformatics. Although many effective MLL algorithms have been proposed in recent years, they often rely on complete and accurate supervision information, while accurately annotated data are difficult and expensive to collect in practice. In the MLL task, the growing dimensionality of both the feature space and the label space degrades classification performance and further exacerbates the difficulty of accurate annotation. How to solve the problem of multi-label learning with limited supervised information has therefore gradually become a key bottleneck of MLL.

To alleviate this problem, more and more researchers have begun to study MLL tasks under limited supervision. Depending on how the limited supervision information is provided, these tasks fall roughly into three types: Multi-Label learning with Missing Labels (MLML), Partial Multi-Label learning (PML), and Semi-supervised Multi-Label learning (SML). Each of these tasks addresses a special case of limited supervision, but the supervision information in practical scenarios is often more complex, and several of the above deficiencies are very likely to occur at the same time. Although current MLL methods have made progress on some of these tasks, two problems still require further research. First, in the face of increasingly complex data features, effectively extracting the relationship between features and labels is the key to avoiding the "curse of dimensionality" and improving classification performance. Second, the labeling situation in practical application scenarios is more complicated, and limited supervision information of different settings usually appears together; the MLL problem with mixed limited supervision information is therefore closer to the real situation, and solving it has important research value. This dissertation conducts in-depth research on these two issues and proposes several new MLL methods. The main research contents include the following aspects:

1. For the high-dimensional feature challenge faced by multi-label data, and inspired by existing MLL methods based on label-specific features, a method named LETTER is proposed that constructs label-specific features at both the sample level and the feature level. Existing methods for constructing label-specific features consider only the distribution information at the sample level and ignore the distribution information at the original feature level, which limits the ability of the reconstructed features to discriminate labels. Similar to the sample distribution, the original feature distributions of the positive and negative sample sets of each label also differ markedly. Based on this observation, a feature reconstruction method is proposed that considers both the sample distribution and the original feature distribution when constructing label-specific features. To verify the effectiveness of LETTER, experiments are conducted on 14 widely used multi-label datasets from various fields, and the experimental results confirm the robustness of the proposed algorithm.

2. For the Incorrect Multi-label Learning (IML) task, in which inaccurate labels coexist with missing labels and partial labels, a new two-stage IML method based on label propagation, C²LP-IML, is proposed. Existing IML methods usually require a subset of accurately labeled samples or other additional supervision information. Identifying noise in the labeling information while filling in missing labels is the core problem of this task. Moreover, although several multi-label learning algorithms that solve the MLML problem and the PML problem independently have been proposed in the past two years, they still leave room for improvement on the IML task, where the supervision information is even less reliable. The basic idea of C²LP-IML is that the neighborhood space can greatly help correct labels: a sample's true labels should appear with high frequency among its neighbor samples, while mislabeled noisy labels behave in exactly the opposite way. C²LP-IML therefore uses iterative label propagation to extract trusted labels from the candidate label set and the non-candidate label set, respectively, for subsequent model learning, and then applies maximum a posteriori inference to rank labels pairwise and produce a multi-label prediction model. To verify the effectiveness of C²LP-IML, extensive experiments are performed on 15 synthetic datasets derived from 5 widely used benchmark multi-label datasets, and the experimental results confirm the robustness of the proposed algorithm.

3. For the Semi-supervised Partial Multi-label Learning (SPML) task, in which partial labeling and semi-supervision coexist, an SPML method named LION, based on low-rank assumptions and manifold constraints, is proposed. In the SPML scenario, the true label information is completely unknown; accurately propagating the supervision information while filtering redundant labels is the core problem of this task. In recent years, algorithms that separately solve the partial-labeling problem and the semi-supervised problem have been proposed, but because of their own task settings, most of them cannot achieve good classification performance on the SPML task. On the one hand, LION filters noise in candidate labels by capturing local label correlations under the low-rank assumption; on the other hand, it exploits manifold regularization to capture the neighborhood structure of samples and thereby diffuse supervision information to unlabeled samples. To verify the effectiveness of LION, extensive experiments are conducted on 48 synthetic datasets derived from 4 widely used benchmark multi-label datasets. The results show that LION achieves the best classification performance in most cases and remains robust in the face of redundant supervision information and few labeled samples.
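The label-specific feature idea behind LETTER can be illustrated with a minimal sketch. The abstract only states that both the sample-level and the feature-level distributions of each label's positive and negative sets are used; the code below is a hypothetical simplification of that idea (one prototype per set instead of clustering, and a mean-difference feature weight), not the actual LETTER algorithm:

```python
import numpy as np

def label_specific_features(X, Y):
    """For each label, weight the original dimensions by how differently
    they are distributed between the label's positive and negative samples
    (feature level), then map every sample to its distances from the
    positive/negative prototypes (sample level).  A hypothetical
    simplification of the label-specific-feature idea, not LETTER itself."""
    n, d = X.shape
    feats = []
    for j in range(Y.shape[1]):
        pos, neg = X[Y[:, j] == 1], X[Y[:, j] == 0]
        # feature level: dimensions whose means differ get larger weight
        w = np.abs(pos.mean(axis=0) - neg.mean(axis=0))
        w = w / (w.sum() + 1e-12)
        Xw = X * np.sqrt(w)
        protos = np.vstack([(pos * np.sqrt(w)).mean(axis=0),
                            (neg * np.sqrt(w)).mean(axis=0)])
        # sample level: distance of every sample to both prototypes
        D = np.linalg.norm(Xw[:, None, :] - protos[None, :, :], axis=2)
        feats.append(D)  # (n, 2): new feature pair for label j
    return np.concatenate(feats, axis=1)  # (n, 2 * num_labels)
```

A per-label classifier would then be trained on its own reconstructed feature block rather than on the shared original features.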
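The core intuition of C²LP-IML — that a sample's true labels should be frequent among its neighbors while noisy labels are not — can be sketched as iterative label propagation over a kNN graph. The function name, parameters, and thresholds below are illustrative assumptions; the published method additionally uses maximum a posteriori inference for pairwise label ranking, which is omitted here:

```python
import numpy as np

def extract_trusted_labels(X, C, k=5, iters=10, alpha=0.8, thresh=0.5):
    """Iterative label propagation over a kNN graph.  Candidate labels
    (C == 1) that keep high neighborhood support are kept as trusted;
    non-candidate labels (C == 0) that gain support look like missing
    labels.  Illustrative sketch of the neighborhood-frequency idea."""
    n = X.shape[0]
    # pairwise distances, then k nearest neighbors (excluding self)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)
    idx = np.argsort(D, axis=1)[:, :k]
    F = C.astype(float)
    for _ in range(iters):
        # blend neighbors' average confidence with the initial candidates
        F = alpha * F[idx].mean(axis=1) + (1 - alpha) * C
    trusted = (F >= thresh) & (C == 1)    # de-noised candidate labels
    recovered = (F >= thresh) & (C == 0)  # likely missing labels
    return trusted, recovered
```

A noisy candidate label on an otherwise cleanly labeled cluster receives little support from its neighbors, so its propagated confidence decays below the threshold and it is filtered out.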
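LION's two ingredients — a low-rank assumption on the label matrix and manifold regularization over the sample graph — can be combined in a toy proximal-gradient update. The objective, step sizes, and function names below are my own simplified stand-in under the stated assumptions, not LION's published optimization:

```python
import numpy as np

def knn_laplacian(X, k=5):
    """Unnormalized graph Laplacian L = D - W of a symmetrized kNN graph,
    used for the manifold-smoothness term tr(F^T L F)."""
    n = X.shape[0]
    Dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(Dist, np.inf)
    idx = np.argsort(Dist, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    W = np.maximum(W, W.T)  # symmetrize
    return np.diag(W.sum(axis=1)) - W

def svd_shrink(M, tau):
    """Singular-value soft-thresholding: the proximal operator of the
    nuclear norm, the usual convex surrogate for a low-rank constraint."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def lion_step(F, C, L, lam=0.1, mu=0.05, lr=0.05):
    """One proximal-gradient update on a simplified SPML-style objective:
    ||F - C||_F^2 (fit the candidate labels) + mu * tr(F^T L F) (manifold
    smoothness, diffusing supervision to unlabeled samples), followed by
    nuclear-norm shrinkage (low-rank label correlations)."""
    grad = 2.0 * (F - C) + 2.0 * mu * (L @ F)
    return svd_shrink(F - lr * grad, lam * lr)
```

The smoothness term pulls the confidence rows of neighboring samples together (which propagates supervision to unlabeled ones), while the shrinkage step suppresses label patterns inconsistent with the dominant low-rank correlation structure (which filters redundant candidate labels).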
Keywords/Search Tags:Multi-label learning, Label-specific features, Limited supervised information, Inaccurate multi-label learning, Semi-supervised partial multi-label learning