
Multi-label Learning Based On Label Specific Features And Correlation

Posted on: 2021-01-18
Degree: Master
Type: Thesis
Country: China
Candidate: W Yu
GTID: 2428330614458379
Subject: Computer Science and Technology
Abstract/Summary:
With the advent of the big data era, multi-label learning algorithms have been successfully applied in many fields, such as image categorization, text categorization, music retrieval, bioinformatics, and multimedia content annotation. Each sample in a multi-label data set is annotated with several labels, and the purpose of multi-label classification is to predict the relevant label set for unseen instances. As multi-label data takes more complex forms and the number of labels grows, multi-label classification models become more complex and face more challenges. These challenges include the following three aspects:

1) Each label in a multi-label data set has its own specific features, which enrich the information carried by the label. Mining the associations among samples when constructing these specific features can help improve the expressive power of the corresponding label.
2) Exploring and mining label correlation greatly contributes to improving classification accuracy.
3) Due to the large number of labels, the class imbalance problem in multi-label classification becomes increasingly difficult.

This thesis focuses on these three challenges and proposes two multi-label classification algorithms based on label specific features. The main research contents are as follows:

1. To address the problem of how to effectively construct label specific features, a method of boosting clustering trees for multi-label learning is proposed. First, clustering feature trees are used to store the data in the original feature set. To capture the similarity between samples of similar populations, the intrinsic associations among samples are stored in the tree structures and added to the original feature set. Then, random subsets are used to learn several classification boosting trees for each class label, and the specific features of each label are learned by calculating the residual values of the boosting trees. Comparative experiments on 11 datasets from various application areas, against seven multi-label classification algorithms, show that the proposed method performs well on each evaluation index, demonstrating that it effectively improves the performance of multi-label classification.

2. Building on the Label Specific Features (LIFT) algorithm, this thesis proposes an improved LIFT algorithm that addresses the label correlation and class imbalance problems. The algorithm uses a de-noising auto-encoder to learn robust features in the original feature space before constructing the specific features. Then, each label sparsely shares the specific features of correlated labels. The method combines label correlation with a sampling method that expands the samples of minority classes to deal with class imbalance. The results show that the proposed scheme has better generalization performance on seven imbalanced multi-label data sets. In addition, compared with mainstream label specific feature algorithms on the imbalance-oriented evaluation index, the algorithm achieves a significant improvement, indicating that it is effective at mitigating class imbalance.
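For readers unfamiliar with the LIFT baseline named in the abstract, the following is a minimal sketch of how label specific features are commonly built for a single label: cluster the positive and negative instances of that label separately, then describe each instance by its distances to the cluster centers. It illustrates only the baseline idea, not the improved algorithm proposed in the thesis (no auto-encoder, feature sharing, or sampling). The function names, the `ratio` parameter value, the deterministic k-means initialization, and the toy data are all assumptions made for illustration.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; initial centers are the first k points (assumption)."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            groups[nearest].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the previous center if a cluster received no points
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return centers

def label_specific_centers(X, y, ratio=0.5):
    """Cluster positive and negative instances of one label separately."""
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    if not pos or not neg:
        raise ValueError("this sketch needs both positive and negative instances")
    # Same number of clusters on both sides, tied to the smaller group.
    m = math.ceil(ratio * min(len(pos), len(neg)))
    return kmeans(pos, m) + kmeans(neg, m)

def to_label_specific_features(x, centers):
    """Map an instance to its Euclidean distances from every cluster center."""
    return [math.sqrt(sq_dist(x, c)) for c in centers]

# Toy data for one label: two positive and two negative instances.
X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
y = [1, 1, 0, 0]
centers = label_specific_centers(X, y, ratio=0.5)
features = to_label_specific_features([0.0, 0.5], centers)
```

In a full pipeline this mapping would be repeated for every label, and a binary classifier would be trained per label on the transformed features.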
Keywords/Search Tags:Multi-label classification, label specific features, label correlation, class imbalance