A Semi-supervised Active Learning Algorithm Research

Posted on: 2013-10-17
Degree: Master
Type: Thesis
Country: China
Candidate: Y Yang
Full Text: PDF
GTID: 2248330374999795
Subject: Computer application technology
Abstract/Summary:
Semi-supervised learning is currently an active topic in machine learning and data mining. Using a small amount of prior knowledge (such as labelled data or pairwise constraints) together with the distribution of a large number of unlabelled data points, semi-supervised learning can classify data points correctly. Many studies indicate that prior knowledge can improve clustering performance, but poorly chosen supervisory information may instead degrade it, so the choice of supervisory information has attracted wide attention.

In this dissertation, semi-supervised learning is combined with active learning to improve clustering performance by improving the quality of the supervisory information. On the one hand, labelling the most informative data points may accelerate the clustering process; on the other hand, confirming pairwise constraints with high uncertainty can quickly improve the clustering results. The main work of this dissertation consists of three parts.

First, a new semi-supervised nearest-neighbour learning algorithm is proposed for mixed constraint information. The labelled data and the pairwise constraints guide the learning process in different ways, and good results are achieved. Specifically, the labelled data are used to calculate the distance between an unlabelled data point and the labelled data set, and the pairwise constraints are used to control the assignment of a label to an unlabelled data point.

Second, several active learning strategies based on neighbourhood inconsistency are proposed, covering both the learning of data points and the learning of pairwise constraints. For the learning of data points, two strategies are proposed, a score strategy based on Citation-KNN and a strategy based on bridge points, and they are compared with two other algorithms. For the learning of pairwise constraints, ALEC (Active Learning of pair-wise constraints based on Error Correction) is proposed. All the proposed learning strategies are shown to be effective in experiments on real data sets.

Finally, a preprocessing method is given for large data sets. An exemplar set is obtained by applying a skeleton extraction approach to the whole data set; clustering is conducted on the exemplar set, and then the whole data set is labelled. Preliminary experiments show that compressing the original data set maintains a stable CRI while significantly reducing the time required for clustering.
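
To make the first contribution more concrete, the following is a minimal Python sketch of nearest-neighbour labelling under mixed constraints. It is not the dissertation's algorithm, whose details the abstract does not give. It only illustrates the stated division of roles: labelled data supply distances, and pairwise constraints control which label may be assigned. The function name `constrained_nn_labels`, the use of `None` for unlabelled points, the Euclidean distance, the greedy closest-point-first order, and the fallback when every class is banned are all assumptions made for illustration. At least one labelled point is assumed.

```python
import math

def constrained_nn_labels(points, labels, must_link=(), cannot_link=()):
    """Greedy nearest-neighbour labelling (illustrative sketch only).

    labels uses None for unlabelled points; constraints are index pairs.
    """
    labels = list(labels)  # work on a copy

    def partners(pairs):
        # Map each index to the set of indices it is constrained with.
        table = {}
        for a, b in pairs:
            table.setdefault(a, set()).add(b)
            table.setdefault(b, set()).add(a)
        return table

    ml, cl = partners(must_link), partners(cannot_link)

    def nearest_per_class(i):
        # Distance from point i to the closest labelled member of each class.
        best = {}
        for j, c in enumerate(labels):
            if c is not None:
                d = math.dist(points[i], points[j])
                best[c] = min(d, best.get(c, d))
        return best

    unlabelled = [i for i, c in enumerate(labels) if c is None]
    while unlabelled:
        per_class = {i: nearest_per_class(i) for i in unlabelled}
        # Assumption: label the point closest to the labelled set first.
        i = min(unlabelled, key=lambda k: min(per_class[k].values()))
        ranked = sorted(per_class[i], key=per_class[i].get)  # nearest class first
        forced = {labels[j] for j in ml.get(i, ()) if labels[j] is not None}
        banned = {labels[j] for j in cl.get(i, ()) if labels[j] is not None}
        if forced:
            allowed = [c for c in ranked if c in forced]
        else:
            allowed = [c for c in ranked if c not in banned]
        # Assumption: if the constraints rule out every class, use the nearest.
        labels[i] = (allowed or ranked)[0]
        unlabelled.remove(i)
    return labels
```

In this sketch a point lying between two groups takes the label of the nearer group unless a cannot-link constraint with an already labelled point of that group forbids it, in which case the next nearest class is used.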
Keywords/Search Tags: semi-supervised learning, active learning, learning strategy