Self-training Algorithm Based On Fast Search Of Natural Neighbors

Posted on:2022-08-29

Degree:Master

Type:Thesis

Country:China

Candidate:M S Yuan

Full Text:PDF

GTID:2518306536463754

Subject:Computer Science and Technology

Abstract/Summary:

In practice, unlabeled samples are plentiful, cheap, and easy to collect, whereas labeled samples, which provide the guiding supervision, are expensive and time-consuming to acquire. Semi-supervised learning emerged to make full use of the information in both labeled and unlabeled samples and has received widespread attention. Self-training is one type of semi-supervised learning: it trains a classifier on a small number of labeled samples and tries to obtain a high-performance classifier by iteratively finding high-confidence unlabeled points and adding them to the training set. Determining these high-confidence points from only a few labeled samples is the key step of self-training, because it decides whether a high-performance classifier can be obtained.

To address this problem, this thesis introduces the idea of natural neighbors. The natural neighbor algorithm finds the neighbors of data points without human intervention, that is, without a manually chosen neighborhood size. It gradually expands the neighborhood search range for every point in the data set and adaptively learns the distribution characteristics of the data. This thesis studies both the self-training algorithm and the natural neighbor algorithm, with the following contributions:

1. In the traditional natural neighbor algorithm, finding the neighborhoods of a data set is time-consuming because the search range must be expanded step by step. To solve this problem, this thesis proposes a fast search of natural neighbors algorithm (FSNN). FSNN determines the upper limit of the neighborhood search range in reverse, by first finding the most isolated point. This reduces the number of iterative neighbor searches and therefore improves efficiency over the natural neighbor algorithm. Experimental comparisons with the natural neighbor algorithm show that FSNN finds neighbors effectively and searches significantly faster.

2. To address the problem of finding high-confidence points in self-training, this thesis combines FSNN with the self-training algorithm and proposes a self-training algorithm based on fast search of natural neighbors (STAFSNN). STAFSNN starts from randomly selected labeled points. The natural neighbors of these labeled samples are taken as high-confidence points; the classifier labels them, they are added to the training set, and the updated training set is used to train a new classifier. The algorithm ends when all unlabeled samples have been labeled. On UCI data sets, the proposed algorithm is compared with other algorithms. The experimental results show that it is effective, finds high-confidence points accurately, and has certain advantages over the compared algorithms.
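The abstract does not spell out the exact FSNN or STAFSNN procedures, so the following is a minimal Python sketch under stated assumptions, not the thesis's implementation. It assumes the strict natural-neighbor stopping rule (stop once every point has at least one reverse neighbor; published variants also stop when the number of points without reverse neighbors stops changing), reads "finding the most isolated point" as locating the point that needs the widest search range before it enters another point's neighbor list, and uses a 1-nearest-neighbor classifier as a stand-in base classifier. All function names (`sorted_neighbors`, `natural_eigenvalue_fast`, `self_train`, and so on) are hypothetical.

```python
import math
from typing import Dict, Hashable, List, Sequence, Set, Tuple

Point = Sequence[float]


def sorted_neighbors(data: Sequence[Point]) -> List[List[int]]:
    """For each point, the indices of all other points, nearest first (ties broken by index)."""
    n = len(data)
    return [sorted((j for j in range(n) if j != i),
                   key=lambda j: (math.dist(data[i], data[j]), j))
            for i in range(n)]


def natural_eigenvalue_iterative(order: List[List[int]]) -> int:
    """Baseline natural-neighbor search: widen the range r = 1, 2, ... until
    every point appears in at least one other point's r-nearest-neighbor list."""
    n = len(order)
    reverse_count = [0] * n
    r = 0
    while r < n - 1 and any(c == 0 for c in reverse_count):
        r += 1
        for i in range(n):
            reverse_count[order[i][r - 1]] += 1  # i's r-th neighbor gains a reverse neighbor
    return r


def natural_eigenvalue_fast(order: List[List[int]]) -> Tuple[int, int]:
    """Assumed FSNN reading: record the first rank at which each point appears in any
    neighbor list; the most isolated point has the largest such rank, and that rank
    is the natural eigenvalue, so no step-by-step range expansion is needed."""
    n = len(order)
    first_rank = [n] * n  # n acts as a "not seen yet" sentinel
    for row in order:
        for rank, j in enumerate(row, start=1):
            first_rank[j] = min(first_rank[j], rank)
    most_isolated = max(range(n), key=lambda j: first_rank[j])
    return first_rank[most_isolated], most_isolated


def natural_neighbors(order: List[List[int]], lam: int) -> List[Set[int]]:
    """NaN(x): points y such that y is among x's lam nearest neighbors and vice versa."""
    knn = [set(row[:lam]) for row in order]
    return [{j for j in knn[i] if i in knn[j]} for i in range(len(order))]


def predict_1nn(data: Sequence[Point], labels: Dict[int, Hashable], idx: int) -> Hashable:
    """Hypothetical base classifier: 1-nearest neighbor over the currently labeled points."""
    nearest = min(labels, key=lambda t: (math.dist(data[idx], data[t]), t))
    return labels[nearest]


def self_train(data: Sequence[Point], seed_labels: Dict[int, Hashable]) -> List[Hashable]:
    """STAFSNN-style loop (assumed): label the unlabeled natural neighbors of labeled
    points, add them to the training set, and repeat until every point is labeled."""
    order = sorted_neighbors(data)
    lam, _ = natural_eigenvalue_fast(order)
    nan = natural_neighbors(order, lam)
    labels = dict(seed_labels)
    while len(labels) < len(data):
        # High-confidence candidates: unlabeled natural neighbors of labeled points.
        candidates = sorted({j for i in labels for j in nan[i] if j not in labels})
        if not candidates:
            # Assumption: if the natural-neighbor graph is exhausted, label the rest directly.
            candidates = [j for j in range(len(data)) if j not in labels]
        # Predict with the current training set, then add the new points to it
        # (for 1-NN, "retraining" just means enlarging the labeled set).
        new_labels = {j: predict_1nn(data, labels, j) for j in candidates}
        labels.update(new_labels)
    return [labels[i] for i in range(len(data))]


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (2, 0), (3, 0), (20, 0), (21, 0), (22, 0), (23, 0)]
    order = sorted_neighbors(pts)
    print(natural_eigenvalue_iterative(order), natural_eigenvalue_fast(order))
    # 2 (2, 3)
    print(self_train(pts, {0: "a", 7: "b"}))
    # ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b']
```

In this reading, the most isolated point fixes the search range directly, so `natural_eigenvalue_fast` returns the same value as the iterative baseline without widening the range one step at a time. The sketch precomputes full neighbor orderings, so it illustrates the idea rather than reproducing the speedup reported in the thesis.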

Keywords/Search Tags:

Machine Learning, Classification, Semi-Supervised Learning, Self-Training, Natural Neighbor

Related items

1	Application Research Of Natural Neighbor Graph Based Semi-supervised Learning For Image Retrieval Technology
2	The Research On Semi-supervised Classification Algorithm Based On Two Different Composition Method
3	Research On Network Anomaly Detection Method Based On Semi-supervised Learning Strategy
4	The Web Pages Classification Method Based On Semi-supervised Support Vector Machine
5	Research And Implementation Of Semi-supervised Machine Learning Algorithms For Classifying The Imbalanced Protocol Flows
6	Research On Semi-supervised Self-training Method
7	Research On The Application Of Semi-supervised Learning In Natural Language Processing
8	Research On Semi-supervised Learning Classification Algorithm
9	Research Of Reliable Semi-supervised Classification
10	Based On The Positive And Unlabeled Samples, Semi-supervised Classification