
A Multi-label Classification Algorithm Based On Label Correlation And Class Imbalance

Posted on: 2020-03-22
Degree: Master
Type: Thesis
Country: China
Candidate: X L Zhong
Full Text: PDF
GTID: 2428330590960696
Subject: Software engineering
Abstract/Summary:
With the development of data mining, multi-label classification has been widely used in text classification, image classification, bioinformatics, information retrieval and video classification. Multi-label classification learns one or more labels for each sample. As data become more complex, the number of labels increases, and the number of possible label sets for an unknown sample grows exponentially; that is, multi-label classification faces the problem of a huge output space. In practice, labels are likely to be related in the semantic space. If the correlation between labels is taken into account during learning, the huge output space can be avoided to a certain extent, which in turn can improve classification performance.

At the same time, the number of samples in different categories of a classification data set tends to vary greatly; that is, multi-label classification is likely to face class imbalance. If class imbalance is neglected during learning, the final predictions may be biased toward the labels with more samples, which degrades classification performance.

This thesis studies multi-label classification and proposes MLCI (Multi-Label Classification Algorithm Based on Label Correlation and Class Imbalance). The main research work is as follows:

(1) To address the huge output space of multi-label classification, the MLCI algorithm considers the correlation between labels, which avoids dealing with a large number of potential label sets and improves classification performance. Specifically, to capture the correlation between labels, the MLCI algorithm constructs a multi-class classification problem for each label by coupling it with two other labels.

(2) To avoid over-emphasizing the links and mutual influence between labels while ignoring the characteristics of a single label, the MLCI algorithm also constructs a binary classification problem for each label. This problem reflects the characteristics of the corresponding label and thereby improves the classification performance of the algorithm.

(3) To address class imbalance, the MLCI algorithm undersamples the data set of each constructed binary classification problem to build a new data set with a balanced sample distribution, and then trains a binary classifier on the new data set. For each constructed multi-class classification problem, data from some of the distinct classes are merged to reduce the class imbalance ratio of the data set, thereby improving the performance of the multi-class classifier.

(4) The effectiveness of the proposed algorithm is verified by extensive experiments on seven multi-label data sets from different fields. The experimental results and their analysis show that, compared with seven other classical classification algorithms, the MLCI algorithm performs better on six commonly used multi-label classification evaluation metrics.
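The two constructions described in items (1) and (3) can be illustrated with a minimal sketch. It is not the thesis implementation: the function names `undersample_binary` and `couple_labels`, the toy data, and the choice of which two labels to couple are all hypothetical, and "coupling" is read here simply as encoding the joint value of three binary labels as one class id. The merging of classes mentioned in item (3) is not shown, because the abstract does not say how it is done.

```python
import random
from collections import Counter


def undersample_binary(features, targets, seed=0):
    """Randomly drop majority-class samples until both classes are the same size.

    `targets` holds 0/1 values for ONE label. Returns new (features, targets) lists.
    """
    rng = random.Random(seed)
    pos = [i for i, t in enumerate(targets) if t == 1]
    neg = [i for i, t in enumerate(targets) if t == 0]
    keep_n = min(len(pos), len(neg))
    # Sample the same number of indices from each class, then restore original order.
    kept = sorted(rng.sample(pos, keep_n) + rng.sample(neg, keep_n))
    return [features[i] for i in kept], [targets[i] for i in kept]


def couple_labels(label_matrix, j, a, b):
    """Encode label j together with labels a and b as one class id in 0..7.

    Assumption: 'coupling' is interpreted as the joint value of three binary labels.
    """
    return [4 * row[j] + 2 * row[a] + row[b] for row in label_matrix]


# Toy data (hypothetical): 6 samples, 3 labels; label 0 has 1 positive out of 6.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6]]
Y = [
    [1, 0, 1],
    [0, 1, 1],
    [0, 0, 1],
    [0, 1, 0],
    [0, 0, 0],
    [0, 1, 1],
]

# Binary problem for label 0, balanced by undersampling the negatives.
label0 = [row[0] for row in Y]
X_bal, y_bal = undersample_binary(X, label0, seed=42)
print(Counter(y_bal))  # one positive and one negative remain

# Multi-class problem for label 0 coupled with labels 1 and 2.
coupled = couple_labels(Y, j=0, a=1, b=2)
print(coupled)  # [5, 3, 1, 2, 0, 3]
```

In this reading, each label would get one balanced binary data set and one coupled multi-class target, and a classifier of any kind could be trained on each. How the two outputs are combined into a final prediction is not described in the abstract.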
Keywords/Search Tags: Multi-Label Classification, Binary Classification Problem, Multi-Class Classification Problem, Label Correlation, Class Imbalance
Related items