Research On Semi-Supervised Classification Based On Local Learning

Posted on: 2013-02-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Lv
Full Text: PDF
GTID: 1118330374970714
Subject: Applied Mathematics
Abstract/Summary:
Traditional machine learning techniques require a large number of labeled data points for training. In many real-world applications, however, a large labeled set is very difficult to obtain: human labeling is labor-intensive and time-consuming, and it relies on the efforts of a small number of domain experts and knowledge engineers. Unlabeled data points, by contrast, are often relatively easy and cheap to collect. Semi-supervised learning, which needs only a small number of labeled data points, has therefore attracted considerable interest in pattern recognition and machine learning in recent years, and it has drawn a large number of researchers from other fields. It has also been applied successfully in areas such as text classification, pattern recognition, and bioinformatics.

The semi-supervised multi-class and multi-label classification problems are substantial generalizations of the semi-supervised binary classification problem. Because they are much closer to real problems, both have become research hotspots in machine learning. A common approach is to decompose them into a set of semi-supervised binary classification problems, but this decomposition can introduce new difficulties. In the multi-class case, when each decomposed class contains very few data points, the unbalanced data problem arises. In the multi-label case, the correlation between classes is not considered, and when the number of original classes is large, the number of decomposed sub-problems, which increases exponentially, becomes too large to be tractable.

In this thesis, building on graph-based semi-supervised learning algorithms and starting from the point of view of optimization, we use the "overall" method and bring the good properties of local learning to the semi-supervised multi-class and multi-label classification problems. More concretely, the thesis is organized into the following six chapters:

1. Chapter 1 first gives a brief introduction to the development and research significance of machine learning, as well as its theoretical basis, the basic knowledge of statistical learning theory. It then summarizes the development and current state of research on semi-supervised learning and on local learning. Finally, it outlines the motivation and contents of the thesis.

2. Chapter 2 studies a semi-supervised multi-class classification algorithm based on local learning. It is usually difficult to find a decision function that classifies well over the entire input space, but much easier to find a good decision function restricted to a local region of that space. The local learning strategy builds a model from the local neighborhood of each data point; that is, the real-valued class label of each data point should be similar or equal to the output of the local learning model built on the data points in its neighborhood. Local learning has already shown good characteristics in the semi-supervised binary classification problem. First, we analyze and derive the local learning regularizer for semi-supervised binary classification. Second, we propose a novel unit circle class label representation. Finally, we extend local learning from the semi-supervised binary classification problem to the semi-supervised multi-class classification problem, and numerical experiments confirm the effectiveness and efficiency of the resulting algorithm.

3. Chapter 3 studies a semi-supervised multi-class classification algorithm that combines global learning and local learning. First, because class labels in multi-class classification are essentially nominal variables, we present a flexible, learnable, and adjustable class label representation. Second, we introduce a regularization method for semi-supervised multi-class classification that consists of global regularization and local regularization, give the local learning regularizer for the multi-class problem in full, and propose two algorithms:
   (1) a semi-supervised multi-class classification algorithm based on local learning and the adjustable class label representation;
   (2) a semi-supervised multi-class classification algorithm combining global learning and local learning.
   We argue that, rather than applying only a local learning regularizer to learn the class label of each data point, it is more desirable to also apply a global learning regularizer based on a global predictor constructed from the whole data set. Finally, numerical experiments on both standard binary datasets and multi-class datasets demonstrate that the two algorithms are feasible and effective.

4. Chapter 4 studies semi-supervised multi-label classification algorithms based on local learning. First, we find that in semi-supervised multi-label classification the representation of the output yi corresponding to each input xi is essentially consistent with the binary series class label representation used in semi-supervised multi-class classification, so the local learning regularizer can be introduced into semi-supervised multi-label classification. Second, we find that the "overall" method is well suited to handling the correlation between classes in semi-supervised multi-label classification. We therefore construct two undirected weighted graphs, one from the viewpoint of data points and one from the viewpoint of classes. From these we obtain a local learning regularizer based on data points and a global learning regularizer based on class labels, and we propose a semi-supervised multi-label classification algorithm based on local learning. Finally, the real-valued class label matrix is obtained by solving a Sylvester equation, and the experimental results show that the algorithm is feasible.

5. Chapter 5 applies the ideas of the semi-supervised multi-class classification algorithms based on local learning to the problem of power transformer fault diagnosis. A hierarchical model of power transformer fault diagnosis is established to support qualitative fault judgement and fault location. This is a meaningful attempt to apply semi-supervised classification algorithms in a new field of application.

6. Chapter 6 summarizes the work of the thesis and recommends directions for future research.
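
The abstract does not give the objective function or the exact regularizers, so the following is only a minimal sketch of how a real-valued label matrix can come out of a Sylvester equation, not the thesis's algorithm. It assumes a quadratic objective tr(F'L_x F) + mu tr(F L_c F') + lam ||J(F - Y)||^2 + eps ||F||^2, whose stationarity condition is the Sylvester equation (L_x + lam J + eps I) F + F (mu L_c) = lam J Y. A k-nearest-neighbor graph Laplacian stands in for the data-point regularizer and a label co-occurrence Laplacian stands in for the class regularizer; the function names, the parameters lam, mu, eps, and the toy data are all hypothetical.

```python
import numpy as np
from scipy.linalg import solve_sylvester


def knn_laplacian(points, k):
    """Unnormalized Laplacian of a symmetrized k-nearest-neighbor graph (0/1 weights).

    Assumption: this is a stand-in for the data-point regularizer, not the
    local learning regularizer derived in the thesis.
    """
    n = points.shape[0]
    sq_dist = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(sq_dist, np.inf)            # a point is not its own neighbor
    neighbors = np.argsort(sq_dist, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), neighbors.ravel()] = 1.0
    W = np.maximum(W, W.T)                       # make the graph undirected
    return np.diag(W.sum(axis=1)) - W


def cooccurrence_laplacian(Y_labeled):
    """Laplacian of a class graph weighted by label co-occurrence counts.

    Assumption: a hypothetical choice of class graph for illustration.
    """
    C = Y_labeled.T @ Y_labeled
    np.fill_diagonal(C, 0.0)
    return np.diag(C.sum(axis=1)) - C


def solve_label_matrix(L_x, L_c, Y, labeled, lam=1.0, mu=0.1, eps=1e-6):
    """Solve (L_x + lam*J + eps*I) F + F (mu*L_c) = lam*J*Y for F."""
    n = L_x.shape[0]
    J = np.diag(labeled.astype(float))           # selects the labeled rows
    A = L_x + lam * J + eps * np.eye(n)          # positive definite thanks to eps
    B = mu * L_c                                 # positive semidefinite
    Q = lam * J @ Y
    return solve_sylvester(A, B, Q)              # SciPy solves A X + X B = Q


# Toy data: two well-separated clusters, three labels, one labeled point per cluster.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (10, 2)), rng.normal(3.0, 0.3, (10, 2))])
Y = np.zeros((20, 3))
labeled = np.zeros(20, dtype=bool)
labeled[[0, 10]] = True
Y[0] = [1.0, 1.0, 0.0]                           # labels 0 and 1 co-occur
Y[10] = [0.0, 0.0, 1.0]

L_x = knn_laplacian(X, k=3)
L_c = cooccurrence_laplacian(Y[labeled])
F = solve_label_matrix(L_x, L_c, Y, labeled)
print(np.round(F[:3], 3))                        # real-valued scores, to be thresholded
```

The small eps term keeps the left coefficient matrix positive definite, so it shares no eigenvalue with the negated right coefficient matrix and the Sylvester equation has a unique solution. Turning the real-valued rows of F into label sets still needs a thresholding rule, which the abstract does not specify.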
Keywords/Search Tags: semi-supervised learning, multi-class classification, multi-label classification, local learning, global learning, regularizer, class label representation