Local Learning And Global Preserving Based Semi-supervised Algorithm For Large Scale Classification Problems

Posted on: 2014-02-05; Degree: Master; Type: Thesis
Country: China; Candidate: Y H Li; Full Text: PDF
GTID: 2268330401458720; Subject: Computational Mathematics
Abstract/Summary:
In many classification problems, very few labeled samples are available alongside many unlabeled ones. This makes semi-supervised classification an important research topic in pattern recognition and machine learning. However, with the growth of the Internet in the information age, the scale of data in many problems keeps increasing, and traditional semi-supervised classification algorithms become infeasible because of their high time and space complexity. Research on large scale semi-supervised classification algorithms is therefore both important and urgent in related fields.

This thesis proposes LLGP, an effective large scale semi-supervised classification algorithm based on the ideas of local learning and global preserving. Following the idea of local learning, a clustering feature tree is first constructed from all unlabeled samples. The tree divides the large scale sample set into a series of small subsets, called local subsets. Traditional graph-based semi-supervised methods can then classify the samples of each local subset together with the labeled samples. Large scale problems are thus solved by reducing their scale through local learning.

On the other hand, learning with only the labeled samples and the local samples has a potential problem: the local samples cannot preserve the overall structure of the original samples, i.e. the global structure is destroyed. Under the basic assumptions of semi-supervised learning, local samples that are far from the labeled samples of their true class are misclassified because they are closer to labeled samples of a wrong class, so the final accuracy degrades considerably. This thesis therefore proposes the idea of global preserving: k-means clustering centers are added as frame points to each local problem to keep the global structure, which weakens the effect of the destroyed global structure. In addition, so that k-means can be used on large scale samples, it is applied to the samples compressed by the clustering feature tree rather than to the original large scale samples, which also speeds up the computation of the frame points. The parameter analysis showed the importance and significance of the frame points.

Experimental results on medium-sized datasets showed that the proposed LLGP obtains accuracy comparable to that of traditional algorithms while speeding up the learning process. Experimental results on large datasets showed that LLGP obtains better accuracy than other large scale semi-supervised algorithms in comparable time. When the number of samples grows too large, other algorithms become infeasible while LLGP still performs well.
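
The local learning and global preserving steps described above can be illustrated with a minimal sketch. It is not the thesis implementation and rests on several assumptions: a plain k-means partition stands in for the leaves of the clustering feature tree, the frame points come from k-means run directly on the unlabeled samples rather than on tree-compressed samples, and the graph-based classifier is a generic label spreading step with a Gaussian kernel. All function names and parameters (`kmeans`, `propagate`, `llgp_sketch`, `n_local`, `n_frames`, `sigma`, `alpha`) are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns (centers, assignments)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        for j in range(k):
            members = X[assign == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.argmin(axis=1)


def propagate(X, y, n_classes, sigma=1.0, alpha=0.99):
    """Generic graph-based label spreading; y == -1 marks unlabeled points."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian affinity graph
    np.fill_diagonal(W, 0.0)
    deg = W.sum(axis=1) + 1e-12           # guard against isolated points
    S = W / np.sqrt(np.outer(deg, deg))   # symmetric normalization
    Y = np.zeros((len(X), n_classes))
    labeled = y >= 0
    Y[labeled, y[labeled]] = 1.0
    F = np.linalg.solve(np.eye(len(X)) - alpha * S, Y)
    return F.argmax(axis=1)


def llgp_sketch(X_lab, y_lab, X_unl, n_local=4, n_frames=10, sigma=1.0):
    """Classify X_unl one local subset at a time, adding frame points to each."""
    n_classes = int(y_lab.max()) + 1
    # Assumption: k-means partition as a stand-in for clustering feature tree leaves.
    _, part = kmeans(X_unl, n_local, seed=0)
    # Assumption: frame points are k-means centers of the raw unlabeled samples.
    frames, _ = kmeans(X_unl, n_frames, seed=1)
    pred = np.full(len(X_unl), -1, dtype=int)
    for j in range(n_local):
        idx = np.flatnonzero(part == j)
        if len(idx) == 0:
            continue
        # Local problem = labeled samples + frame points + one local subset.
        X_loc = np.vstack([X_lab, frames, X_unl[idx]])
        y_loc = np.concatenate(
            [y_lab, -np.ones(len(frames) + len(idx), dtype=int)]
        )
        out = propagate(X_loc, y_loc, n_classes, sigma)
        pred[idx] = out[len(X_lab) + len(frames):]
    return pred
```

Each local problem involves only the labeled samples, the frame points, and one local subset, so the linear system solved in `propagate` stays small however many unlabeled samples there are in total. The frame points are unlabeled; they serve only to link distant regions of the data inside each local graph.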
Keywords/Search Tags: Large scale, Semi-supervised, Local learning, Clustering feature tree, Global preserving