Research On Graph-based Semi-Supervised Learning Model And Classifier Design

Posted on: 2010-12-27 | Degree: Master | Type: Thesis
Country: China | Candidate: J B Hao
GTID: 2178360302459867 | Subject: Circuits and Systems
Abstract/Summary:
In traditional machine learning there are two commonly used learning paradigms, supervised learning and unsupervised learning, but neither is suited to the situation where only a few labeled examples are available alongside a large amount of unlabeled data. Semi-supervised learning was proposed to address this problem and has attracted considerable research interest. It combines the advantages of both traditional approaches: by using the unlabeled data together with the labeled data, it can build a better classifier. This thesis studies semi-supervised classification algorithms; its main contributions are as follows.

Firstly, the thesis analyzes typical semi-supervised classification algorithms and, by comparing them with supervised ones, finds that a classifier's accuracy is closely related to its model assumption. Only when the model assumption matches the structure of the problem can the unlabeled data help improve accuracy; otherwise, the unlabeled data may be of no use or may even degrade the classifier.

Secondly, experiments with the label propagation (LP) algorithm show that LP is sensitive to the quality of its training set when that set is chosen at random, as is traditionally done. This suggests that LP can be improved by actively selecting a good training set. In active learning, the classifier queries the labels of the unlabeled examples that would improve its performance the most. Building on this idea, an active-learning-based LP algorithm (AL-LP) is proposed. AL-LP actively selects the unlabeled examples whose labeling would reduce the classification risk the most, so that accuracy increases faster. Experimental results on UCI and other data sets are promising: with the same number of labeled examples, AL-LP achieves higher accuracy than LP trained on a randomly selected set. Analysis of the most frequently queried examples shows that AL-LP tends to select data near cluster centers.
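To make the LP and AL-LP ideas concrete, here is a minimal pure-Python sketch under stated assumptions. It assumes a fully connected Gaussian-kernel graph (the width `sigma` is chosen by hand), the iterative propagation rule F <- D^-1 W F with labeled rows clamped to their one-hot labels, and an expected-risk-reduction query rule in which the risk of a labeling is estimated as the sum of (1 - max_c F_ic) over unlabeled points. The abstract does not give AL-LP's exact risk estimate, so this criterion and all function names (`rbf_weights`, `label_propagation`, `estimated_risk`, `query_by_risk_reduction`) are hypothetical illustrations, not the thesis's implementation.

```python
import math


def rbf_weights(points, sigma):
    """Dense Gaussian affinity: w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), w_ii = 0."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                w[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return w


def label_propagation(w, labels, n_classes, iters=200):
    """Iterative LP: F <- D^-1 W F, labeled rows clamped. labels[i] is a class or None."""
    n = len(w)
    deg = [sum(row) for row in w]
    # Labeled rows start (and stay) one-hot; unlabeled rows start uniform.
    f = [[1.0 / n_classes] * n_classes if y is None
         else [1.0 if c == y else 0.0 for c in range(n_classes)] for y in labels]
    for _ in range(iters):
        f = [f[i] if labels[i] is not None or deg[i] == 0.0
             else [sum(w[i][j] * f[j][c] for j in range(n)) / deg[i] for c in range(n_classes)]
             for i in range(n)]
    return f


def estimated_risk(f, labels):
    """Assumed risk estimate: sum over unlabeled points of 1 - max_c F_ic."""
    return sum(1.0 - max(row) for row, y in zip(f, labels) if y is None)


def query_by_risk_reduction(w, labels, n_classes):
    """Pick the unlabeled index whose (F-weighted) hypothetical label most lowers the risk."""
    f = label_propagation(w, labels, n_classes)
    best_k, best_risk = None, float("inf")
    for k, y in enumerate(labels):
        if y is not None:
            continue
        expected = 0.0
        for c in range(n_classes):
            trial = list(labels)
            trial[k] = c  # pretend the oracle answered "class c" for point k
            expected += f[k][c] * estimated_risk(label_propagation(w, trial, n_classes), trial)
        if expected < best_risk:
            best_k, best_risk = k, expected
    return best_k


if __name__ == "__main__":
    # Toy 1-D data: three tight clusters; the third cluster has no labeled point.
    pts = [(0.0,), (0.5,), (10.0,), (10.5,), (20.0,), (20.5,), (21.0,)]
    labels = [0, None, 1, None, None, None, None]
    W = rbf_weights(pts, sigma=1.0)
    F = label_propagation(W, labels, n_classes=2)
    print([row.index(max(row)) for row in F])
    print("query:", query_by_risk_reduction(W, labels, n_classes=2))
```

On this toy data the query lands in the unlabeled third cluster, where the propagated scores are most uncertain. Each query reruns LP once per (candidate, class) pair, so this brute-force loop only suits small data sets; a practical version would use a closed-form or incremental update instead.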
This suggests that selecting data near cluster centers as the training set for LP is worthwhile.

Thirdly, graph-based semi-supervised learning first constructs a graph in which labeled and unlabeled data are represented as vertices and edges encode the similarity between data points. This kind of graph construction, however, faces the difficulty of choosing the similarity function and its parameters, as well as the number of nearest neighbors. To address this, the thesis investigates the locally linear embedding (LLE) algorithm and finds two useful properties. First, LLE does not use a similarity function when it computes the linear reconstruction weights of each point from its neighbors. Second, by detecting the local manifold structure and judging whether a point lies near the classification boundary, the number of nearest neighbors of that point can easily be adjusted, which reduces the connections between data of different classes and lowers the probability of propagating wrong labels. Based on these two points, the thesis proposes an LLE-based graph-construction method and applies it to LP. Experimental results on UCI and other data sets show that the method is easy to use and that LP on this graph performs better than LP on the traditional graph.
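The first property, building the graph from LLE reconstruction weights instead of a similarity kernel, can be sketched as follows. This is a minimal pure-Python illustration under assumptions: the neighbor count `k` is fixed (the margin-aware adjustment of the neighbor count described above is not reproduced), the local Gram matrix is regularized by `reg` times its trace, and negative LLE weights are clipped to zero and each row renormalized so the weights can drive propagation. The abstract does not say how the thesis handles negative weights, and the helpers `solve`, `lle_weights`, and `propagate` are hypothetical names.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def lle_weights(points, k, reg=1e-3):
    """Row-stochastic graph weights from LLE reconstruction coefficients (no kernel).

    For each x_i, minimize ||x_i - sum_j w_ij x_j||^2 over its k nearest neighbours
    subject to sum_j w_ij = 1. Clipping negatives to zero is an assumption of this sketch.
    """
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i, xi in enumerate(points):
        dists = sorted((sum((a - b) ** 2 for a, b in zip(xi, xj)), j)
                       for j, xj in enumerate(points) if j != i)
        nbrs = [j for _, j in dists[:k]]
        diffs = [[a - b for a, b in zip(xi, points[j])] for j in nbrs]
        # Local Gram matrix G_st = (x_i - x_s) . (x_i - x_t), regularized to be invertible.
        gram = [[sum(p * q for p, q in zip(u, v)) for v in diffs] for u in diffs]
        trace = sum(gram[t][t] for t in range(k))
        for t in range(k):
            gram[t][t] += reg * trace if trace > 0 else reg
        coef = solve(gram, [1.0] * k)
        total = sum(coef)
        coef = [max(c / total, 0.0) for c in coef]  # sum-to-one, then clip negatives
        total = sum(coef)
        for j, c in zip(nbrs, coef):
            w[i][j] = c / total
    return w


def propagate(w, labels, n_classes, iters=200):
    """LP on a row-stochastic W: F <- W F with labeled rows clamped; returns argmax labels."""
    n = len(w)
    f = [[1.0 / n_classes] * n_classes if y is None
         else [1.0 if c == y else 0.0 for c in range(n_classes)] for y in labels]
    for _ in range(iters):
        f = [f[i] if labels[i] is not None
             else [sum(w[i][j] * f[j][c] for j in range(n)) for c in range(n_classes)]
             for i in range(n)]
    return [row.index(max(row)) for row in f]


if __name__ == "__main__":
    # Two well-separated square clusters with one labeled corner each.
    pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5),
           (5.0, 5.0), (6.0, 5.0), (5.0, 6.0), (6.0, 6.0), (5.5, 5.5)]
    labels = [0, None, None, None, None, 1, None, None, None, None]
    print(propagate(lle_weights(pts, k=3), labels, n_classes=2))
```

Note that no kernel width appears anywhere; the only graph parameters are `k` and the small regularizer `reg`, which is the practical appeal the abstract attributes to LLE-based construction.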
Keywords/Search Tags: semi-supervised learning, active learning, graph-based method, label propagation algorithm, locally linear embedding