
Research On Multi-label Classification Algorithm Based On Label Correlation Analysis

Posted on: 2024-09-03
Degree: Master
Type: Thesis
Country: China
Candidate: W B Zhao
GTID: 2568307109487594
Subject: Computer technology
Abstract/Summary:
Multi-label classification (MLC) studies the association between a feature sample and multiple semantic labels, and the problem of assigning the relevant labels to each sample. MLC has received increasing attention in many practical applications, such as text classification, image annotation, information retrieval, recommendation systems, and gene function prediction. Because of the complexity of the problem, however, MLC faces several open issues: how to use label correlation effectively, class imbalance, and feature redundancy. Although many multi-label classification algorithms with strong classification performance have been proposed in recent years, they make only initial use of label correlation and do not fully exploit label correlation information. Furthermore, each of these algorithms addresses only one of the problems that multi-label classification faces. To address these shortcomings, this thesis proposes two new multi-label classification algorithms. The research content is as follows:

1) The maximum spanning tree, a spanning tree whose total edge weight is maximal, has achieved good results in many engineering applications. Inspired by this, the thesis proposes a multi-label classifier chains algorithm based on the maximum spanning tree and a directed acyclic graph (max STCC). The algorithm builds a maximum spanning tree over the labels from the pairwise correlations between labels, so that label correlation information is considered and used as fully as possible. It then defines the mutual decision difficulty between labels through conditional entropy and uses it as a measure of the mutual dependency between labels. Based on this measure, the algorithm chooses, for each edge, the dependency direction with the smaller decision difficulty, which transforms the maximum spanning tree into a directed acyclic graph. Finally, the algorithm applies topological sorting to the directed acyclic graph to obtain a label sequence, and trains and predicts with a classifier chain algorithm that follows this optimized sequence. The proposed max STCC algorithm is compared experimentally with other relevant algorithms on seven publicly available datasets. The results show that max STCC achieves excellent classification performance, which supports its contribution to exploring and using label correlation information.

2) In the real world, labels are not always correlated, and introducing unnecessary label correlation can harm the classifier. Based on this observation, the thesis proposes a multi-label classification algorithm with a pruning and divide-and-conquer strategy (MLCb PDC). The algorithm prunes labels according to their degree of correlation, dividing them into leaf labels and branch labels. To alleviate class imbalance, the feature set is also divided into subsets of feature data corresponding to the leaf labels and the branch labels. Because the correlation between leaf labels is relatively low, label correlation is not considered when processing them. To improve classification accuracy and to handle the attribute noise caused by redundant and irrelevant features, the RFBR algorithm is proposed for classifying the leaf label set. The branch labels are processed with the max STCC-BR algorithm, which uses a stacking structure to exploit the correlation information between them. The predictions for the leaf labels and the branch labels are combined into the final result. MLCb PDC is compared with other relevant algorithms on seven publicly available datasets. The experimental results confirm its effectiveness and show that processing labels differently according to their degree of correlation can improve classification accuracy.
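The label-ordering steps of the first algorithm (correlation, maximum spanning tree, edge orientation by conditional entropy, topological sort) can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not say which correlation measure is used, so mutual information is assumed here, and the orientation rule (the label with the smaller conditional entropy given the other becomes the dependent one) is only one possible reading of "the direction with the smaller decision difficulty". All function names are hypothetical.

```python
import math
from collections import Counter, deque

def entropy(cols):
    """Shannon entropy (bits) of the joint distribution of the given columns."""
    rows = list(zip(*cols))
    n = len(rows)
    return -sum((c / n) * math.log2(c / n) for c in Counter(rows).values())

def cond_entropy(a, b):
    """H(a | b) = H(a, b) - H(b); stands in for 'decision difficulty' here."""
    return entropy([a, b]) - entropy([b])

def mutual_info(a, b):
    """I(a; b), the ASSUMED label correlation measure."""
    return entropy([a]) + entropy([b]) - entropy([a, b])

def max_spanning_tree(weights, q):
    """Prim's algorithm, always adding the heaviest edge that leaves the tree."""
    in_tree, edges = {0}, []
    while len(in_tree) < q:
        u, v = max(
            ((i, j) for i in in_tree for j in range(q) if j not in in_tree),
            key=lambda e: weights[e[0]][e[1]],
        )
        edges.append((u, v))
        in_tree.add(v)
    return edges

def chain_order(Y):
    """Y is a list of label columns (lists of 0/1). Returns a label ordering."""
    q = len(Y)
    w = [[mutual_info(Y[i], Y[j]) for j in range(q)] for i in range(q)]
    children = {i: [] for i in range(q)}
    indegree = [0] * q
    for u, v in max_spanning_tree(w, q):
        # ASSUMED rule: the label that is easier to decide given the other
        # (smaller conditional entropy) becomes the child of that edge.
        if cond_entropy(Y[v], Y[u]) <= cond_entropy(Y[u], Y[v]):
            parent, child = u, v
        else:
            parent, child = v, u
        children[parent].append(child)
        indegree[child] += 1
    # Kahn's topological sort; any orientation of a tree is acyclic.
    queue = deque(i for i in range(q) if indegree[i] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for c in children[node]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    return order
```

The returned ordering would then be passed to a classifier chain, so that each label's classifier is trained after, and can take as input, the labels that precede it in the sequence.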
Keywords/Search Tags:multi-label classification, label correlation, maximum spanning tree, directed acyclic graph, pruning and divide-and-conquer strategy