Multi-Label Learning Based On Exploiting Label Dependency

Posted on: 2017-03-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B Fu
Full Text: PDF
GTID: 1108330485960335
Subject: Computer Science and Technology
Abstract/Summary:
Multi-label classification is an important research topic in machine learning and data mining. It aims to predict multiple labels, rather than just one, for a given instance at the same time. In many real applications there are dependencies among the labels, and exploiting this underlying knowledge can effectively improve the performance of learning models. How to learn and utilize the dependencies among labels has therefore become one of the key issues in multi-label learning.

This paper first summarizes existing work related to multi-label learning and analyzes the advantages and disadvantages of existing methods. Then, to explore ways of exploiting various types of label dependencies in different practical scenarios, it proposes more effective multi-label learning models and algorithms. The main contributions are:

(1) For each label, the classifier chain model essentially determines the labels it depends on at random, so the label dependencies it learns may not be true. To deal with this issue, this paper proposes a method that uses a tree-structured Bayesian network to represent the dependencies among multiple labels. The method explicitly measures the dependency degree and then constructs a Bayesian network that takes the labels as nodes and the dependency degrees as edge weights. It can therefore learn more reasonable dependencies among multiple labels. Furthermore, ensemble learning is used to construct multiple tree-structured Bayesian networks, in order to fully consider the mutual dependencies between labels.
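The tree construction described in (1) can be illustrated with a minimal sketch. The abstract does not say how the dependency degree is measured or how the tree is built, so the sketch assumes empirical mutual information as the measure and a maximum spanning tree (in the style of Chow-Liu) as the structure. The function names `mutual_information` and `label_tree` are hypothetical.

```python
import numpy as np

def mutual_information(a, b):
    """Empirical mutual information (in nats) between two binary label columns.
    Assumption: the abstract does not name its dependency measure."""
    mi = 0.0
    for va in (0, 1):
        for vb in (0, 1):
            p_ab = np.mean((a == va) & (b == vb))
            p_a = np.mean(a == va)
            p_b = np.mean(b == vb)
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

def label_tree(Y, root=0):
    """Return parent[j] for every label (-1 for the root).
    Builds a maximum spanning tree over pairwise dependency weights
    with Prim's algorithm, then reads it as a tree rooted at `root`."""
    n_labels = Y.shape[1]
    W = np.zeros((n_labels, n_labels))
    for i in range(n_labels):
        for j in range(i + 1, n_labels):
            W[i, j] = W[j, i] = mutual_information(Y[:, i], Y[:, j])
    parent = [-1] * n_labels
    in_tree = {root}
    while len(in_tree) < n_labels:
        # Pick the heaviest edge that connects the tree to a new label.
        i, j = max(
            ((i, j) for i in in_tree for j in range(n_labels) if j not in in_tree),
            key=lambda edge: W[edge],
        )
        parent[j] = i
        in_tree.add(j)
    return parent
```

Given a binary label matrix `Y` of shape (instances, labels), `label_tree(Y)` returns one parent per label. A chain-style learner could then train the classifier for each label with its parent's value as an extra feature. Calling it with different `root` values, or on resampled data, is one possible way to obtain several trees for an ensemble.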
The experimental results verify the effectiveness of this algorithm and indicate that label dependencies learned by explicitly measuring the dependency degree between labels help to improve classification performance.

(2) This paper proposes a new method that uses a graph to represent the dependency degree between labels and models the propagation of label dependencies as a random walk on that graph. The method first constructs a graph over the labels and then uses the RWR (random walk with restart) model to simulate the iterative propagation of label dependencies. For a given test instance, each label is first given an initial predicted value, interpreted as its probability of being a true label of the instance; an iterative method then repeatedly updates the value of each label until the values converge. This iterative updating considers not only direct dependencies between labels but also indirect ones. The experimental results show that the proposed method is significantly better than other algorithms on most of the data sets in terms of a number of evaluation metrics, especially when the data set has more labels. This shows that considering the iterative propagation of dependencies between labels can explore and utilize the potential information effectively.

(3) Based on the above method, this paper further proposes a more advanced multi-label learning model that takes multiple factors into consideration when learning the dependency degrees between labels, and tries to learn the optimal dependency degrees by optimizing a given objective function. Inspired by the principle of multiple kernel learning, the method first measures the dependency degree between labels from different perspectives, obtaining multiple results. A linear regression model then takes these measurements as input to learn the optimal dependency degree between two labels.
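Contributions (2) and (3) both propagate label scores over a weighted label graph. The sketch below shows a standard random walk with restart, taking the edge-weight matrix `W` as given; how (3) learns those weights is not shown. The restart probability, the column normalisation, and the L1 stopping rule are assumptions, and the function name `rwr_propagate` is hypothetical.

```python
import numpy as np

def rwr_propagate(W, p0, restart=0.3, tol=1e-8, max_iter=1000):
    """Propagate initial label scores p0 over a label graph.

    W[i, j] is the dependency weight between labels i and j.
    p0[i] is the initial predicted score of label i for one test instance.
    """
    W = np.asarray(W, dtype=float)
    p0 = np.asarray(p0, dtype=float)
    col_sums = W.sum(axis=0)
    col_sums[col_sums == 0] = 1.0   # avoid division by zero for isolated labels
    P = W / col_sums                # column-normalised transition matrix
    p = p0.copy()
    for _ in range(max_iter):
        # Walk one step along the graph, or restart at the initial scores.
        p_next = (1 - restart) * P @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p
```

On a chain of three labels where only the first has a non-zero initial score, the third label still ends up with a positive score. This is the indirect dependency the text refers to: the score reaches it through the middle label.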
The advantages of this method are twofold. First, it obtains a more comprehensive measure of the dependency degree by including multiple perspectives. Second, it learns the optimal parameters of the linear model, and thus the dependency degree between two labels, by minimizing a given loss function. The experimental results show that, by learning the dependency degrees between labels through optimizing the objective function, the method significantly improves performance compared with the other methods, including the one proposed above.

(4) This paper proposes a multi-label learning algorithm based on matrix factorization to predict a better label ranking. The method maps the original instance space and label space into a low-dimensional space, and can thus reduce the number of classifiers that need to be trained and the amount of computation. For each instance in the training set, the whole label set can be divided into two subsets: the labels that are explicitly given as true labels in the data set, and the others. Most existing methods assume that a label that is not given is not a true label of the instance (that is, each label is treated as either 1 or 0). To avoid the wrong information this assumption may introduce, the proposed method assumes only that, for each instance, the explicitly given labels are more related to the instance than the labels that are not given explicitly. Accordingly, the method uses a loss function similar to AUC (area under the ROC curve). By optimizing this loss function, it can predict a better label ranking in which the explicitly given labels are ranked before the labels that are not given explicitly. The method can therefore fully utilize the label dependencies to predict a more reasonable label ranking.
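The ranking idea in (4) can be sketched as follows. The abstract only states that the loss is "similar to AUC", so several choices here are assumptions: the bilinear score `X @ U @ V.T`, the pairwise logistic surrogate, and plain gradient descent. The names `pairwise_ranking_loss` and `fit` are hypothetical.

```python
import numpy as np

def pairwise_ranking_loss(X, Y, U, V):
    """Mean logistic loss over (given label, not-given label) pairs.
    The loss is small when every given label scores above every
    not-given label of the same instance, as an AUC-style criterion asks."""
    S = X @ U @ V.T                      # scores, shape (instances, labels)
    total, count = 0.0, 0
    for s, y in zip(S, Y):
        diff = s[y == 1][:, None] - s[y == 0][None, :]
        total += np.logaddexp(0.0, -diff).sum()
        count += diff.size
    return total / max(count, 1)

def fit(X, Y, rank=2, lr=0.2, epochs=500, seed=0):
    """Learn low-rank factors U (features x rank) and V (labels x rank)."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((X.shape[1], rank))
    V = 0.1 * rng.standard_normal((Y.shape[1], rank))
    for _ in range(epochs):
        S = X @ U @ V.T
        G = np.zeros_like(S)             # gradient of the loss w.r.t. S
        count = 0
        for i, (s, y) in enumerate(zip(S, Y)):
            pos, neg = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
            diff = s[pos][:, None] - s[neg][None, :]
            g = 1.0 / (1.0 + np.exp(diff))   # sigmoid(-diff), one per pair
            G[i, pos] -= g.sum(axis=1)       # raise scores of given labels
            G[i, neg] += g.sum(axis=0)       # lower scores of the others
            count += diff.size
        G /= max(count, 1)
        # Simultaneous update: both gradients use the factors from before the step.
        U, V = U - lr * X.T @ G @ V, V - lr * G.T @ (X @ U)
    return U, V
```

Because the loss compares labels only within an instance, a label that is not given is never forced to score 0. It only needs to score below the given labels, which matches the weaker assumption stated in (4).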
The experimental results show that this method achieves better performance than other methods.

In exploiting label dependencies from different perspectives, this paper proposes several more effective methods and verifies their effectiveness experimentally. These results lay a good foundation for further study and applications.
Keywords/Search Tags:Data mining, Classification, Multi-label learning, Label dependency, Label ranking