
Self-training Algorithm Based On Fast Search Of Natural Neighbors

Posted on: 2022-08-29
Degree: Master
Type: Thesis
Country: China
Candidate: M S Yuan
GTID: 2518306536463754
Subject: Computer Science and Technology
Abstract/Summary:
In real life, unlabeled samples are plentiful, cheap, and easy to obtain, whereas labeled samples, which guide learning, are expensive and time-consuming to acquire. Semi-supervised learning emerged to make full use of the information in both labeled and unlabeled samples, and it has received widespread attention. The self-training algorithm is one type of semi-supervised learning: it trains a classifier on a small number of labeled samples and tries to obtain a high-performance classifier by finding high-confidence points around those labeled samples and adding them to the training set. How to determine these high-confidence points from only a few labeled samples is the key issue in self-training, because it decides whether a high-performance classifier can be obtained.

To address this problem, this thesis introduces the idea of natural neighbors. The natural neighbor algorithm finds the neighbors of data points without human intervention, that is, without a user-specified neighborhood size k. It gradually expands the search range of the neighborhood for every point in a given data set and learns the distribution characteristics of the data set adaptively. This thesis studies the self-training algorithm and the natural neighbor algorithm; its innovations are as follows:

1. In the traditional natural neighbor algorithm, finding the neighborhoods of a data set is time-consuming because the search range must be expanded step by step. To solve this problem, this thesis proposes a fast search of natural neighbors algorithm (FSNN). FSNN works in reverse: it finds the most isolated point and uses it to determine the upper limit of the neighborhood size. This reduces the number of neighbor-search iterations and therefore improves efficiency over the natural neighbor algorithm. Experiments comparing FSNN with the natural neighbor algorithm show that FSNN finds neighbors effectively and searches significantly faster.

2. To find high-confidence points in the self-training algorithm, this thesis combines FSNN with self-training and proposes a self-training algorithm based on fast search of natural neighbors (STAFSNN). It first selects labeled points at random. The natural neighbors of these labeled samples are then added to the training set as high-confidence points: the current classifier labels them, the training set is updated, and the updated set is used to train a new classifier. The algorithm ends when all unlabeled samples have been labeled. The proposed algorithm is compared with other algorithms on UCI data sets. The experimental results show that it is effective, finds high-confidence points accurately, and has certain advantages over the compared algorithms.
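The neighbor search described above can be illustrated with a small Python sketch. It is not the thesis's implementation: the abstract does not give FSNN's exact procedure, so `fast_upper_bound` is a hypothetical reading of "finding the most isolated point". Here a point's isolation is the smallest rank at which it appears in any other point's sorted neighbor list, and the largest isolation is taken as the neighborhood upper limit. The iterative search uses a simplified stopping rule (stop once every point is some other point's neighbor); variants in the literature also stop when the number of points without reverse neighbors stops changing, which is omitted here. Natural neighbors are taken to be mutual λ-nearest neighbors. All function names are illustrative.

```python
import math

def sorted_neighbors(points):
    """order[i] lists every other point index, nearest first (ties broken by index)."""
    n = len(points)
    return [sorted((j for j in range(n) if j != i),
                   key=lambda j: (math.dist(points[i], points[j]), j))
            for i in range(n)]

def natural_neighbor_search(order):
    """Iterative search with a simplified stopping rule: grow r one step at a time
    until every point appears in at least one other point's r-NN list."""
    n = len(order)
    reverse_count = [0] * n  # number of r-NN lists each point appears in
    r = 0
    while r < n - 1:
        r += 1
        for i in range(n):
            reverse_count[order[i][r - 1]] += 1  # r-th nearest neighbor of i
        if all(c > 0 for c in reverse_count):
            break
    return r

def fast_upper_bound(order):
    """Hypothetical reading of FSNN: compute the bound directly from the most
    isolated point instead of iterating. A point's isolation is the smallest rank
    at which it appears in any other point's neighbor list."""
    n = len(order)
    isolation = [n] * n  # sentinel larger than any real rank (ranks run 1..n-1)
    for i in range(n):
        for rank, j in enumerate(order[i], start=1):
            isolation[j] = min(isolation[j], rank)
    most_isolated = max(range(n), key=lambda j: isolation[j])
    return isolation[most_isolated], most_isolated

def natural_neighbors(order, lam):
    """j is a natural neighbor of i when each is in the other's lam-NN list."""
    knn = [set(nbrs[:lam]) for nbrs in order]
    return [{j for j in knn[i] if i in knn[j]} for i in range(len(order))]

points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.5, 0.0)]
order = sorted_neighbors(points)
lam, outlier = fast_upper_bound(order)
print(natural_neighbor_search(order))  # 2
print(lam, outlier)                    # 2 4
print(natural_neighbors(order, lam))   # [{1}, {0, 2}, {1, 3}, {2, 4}, {3}]
```

Under this simplified stopping rule both functions return the same λ, because a point first gains a reverse neighbor exactly at r equal to its isolation. Since the sketch sorts full neighbor lists (O(n² log n)), it only illustrates that the bound can be obtained without iterating; it does not reproduce the speedup the thesis measures.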
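The STAFSNN loop in item 2 can be sketched in the same way. The following assumptions are not taken from the thesis. The base classifier is stood in for by a 1-nearest-neighbor rule, since the abstract does not name one. Natural-neighbor sets are supplied precomputed as a list of sets. Each round's candidates are the unlabeled natural neighbors of every point currently in the training set. When no such neighbor remains, the leftover points are labeled directly by the current classifier. The function names and this fallback rule are hypothetical.

```python
import math

def nearest_neighbor_predict(train, points, idx):
    """Stand-in base classifier: 1-nearest neighbor over the current training set.
    train maps point index -> label."""
    best = min(train, key=lambda j: (math.dist(points[idx], points[j]), j))
    return train[best]

def self_train_with_natural_neighbors(points, initial_labels, nan):
    """points: coordinate tuples; initial_labels: dict index -> label for the
    randomly chosen seeds; nan: natural-neighbor set of each point."""
    train = dict(initial_labels)
    if not train:
        raise ValueError("at least one labeled seed is required")
    unlabeled = set(range(len(points))) - set(train)
    while unlabeled:
        # High-confidence candidates: unlabeled natural neighbors of training points.
        candidates = {j for i in train for j in nan[i] if j in unlabeled}
        if not candidates:
            # Assumed fallback: label whatever is left with the current classifier.
            candidates = set(unlabeled)
        # Label candidates with the classifier built from the current training set,
        # then enlarge the training set (1-NN "retrains" simply by using the new set).
        new_labels = {j: nearest_neighbor_predict(train, points, j) for j in candidates}
        train.update(new_labels)
        unlabeled -= candidates
    return train

points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0), (12.0, 0.0)]
nan = [{1}, {0, 2}, {1}, {4}, {3, 5}, {4}]  # hypothetical natural-neighbor sets
result = self_train_with_natural_neighbors(points, {0: "a", 5: "b"}, nan)
print(sorted(result.items()))
# [(0, 'a'), (1, 'a'), (2, 'a'), (3, 'b'), (4, 'b'), (5, 'b')]
```

In this toy run, labels spread outward from the two seeds along the natural-neighbor chains, one ring per round, which is the intuition behind treating natural neighbors of labeled points as high-confidence candidates.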
Keywords/Search Tags: Machine Learning, Classification, Semi-Supervised Learning, Self-Training, Natural Neighbor