Research And Implementation Of Semi-Supervised Based Self-Training Classification Model

Posted on:2010-12-29

Degree:Master

Type:Thesis

Country:China

Candidate:T Ding

GTID:2178360302960773

Subject:Computer software and theory

Abstract/Summary:

Semi-supervised learning is a learning approach proposed in recent years. According to its purpose, it can be divided into two categories: semi-supervised classification and semi-supervised clustering. Its main idea is to combine a small labeled training set with a large amount of unlabeled data to improve classification performance. This thesis focuses on semi-supervised classification and studies and analyzes self-training, a classic semi-supervised classification algorithm. Because the initial training set is very small, the classifier trained on it is not as accurate as expected, and we propose an improved model to address this. We introduce a data editing technique based on nearest neighbor rules to identify mislabeled examples during training and classification. The technique is applied in each training iteration to identify and remove noisy data, which purifies the training set and improves classification accuracy. The experimental data sets are selected randomly from the UCI Machine Learning Repository, and the results show that the improved model raises classification accuracy to varying degrees across data sets. From the analysis of the results, we conclude that the average classification performance improves by 6.705%.

Because the generalization ability of the Tri-Training model is weak, we also propose an improved version of it. In the improved model, the different classifiers cooperate with each other, and a voting rule is used to classify the unlabeled data. The improved model is based on the model proposed by Zhou. As with the self-training algorithm, we introduce the data editing technique to purify the training set. The experimental data sets are again taken from the UCI Machine Learning Repository. According to our experimental data, the proposed model performs well in classification and improves classification accuracy.
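
The self-training loop with nearest-neighbor data editing described above can be illustrated with a minimal sketch. It is not the thesis's implementation. The function names (`knn_label`, `edit_pseudo_labeled`, `self_train`), the k-nearest-neighbor base classifier, the use of distance to the training set as a confidence proxy, the batch size, and the choice to protect the originally labeled examples from removal are all assumptions made for illustration.

```python
import math
from collections import Counter


def knn_label(x, points, labels, k, skip=None):
    """Majority label among the k points nearest to x (Euclidean distance)."""
    order = sorted(
        (i for i in range(len(points)) if i != skip),
        key=lambda i: math.dist(x, points[i]),
    )
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def edit_pseudo_labeled(train_x, train_y, n_original, k=3):
    """Drop pseudo-labeled examples whose label disagrees with their neighbors.

    Assumption: the first n_original examples carry trusted labels and are
    always kept; only examples added by self-training can be removed.
    """
    keep = [
        i for i in range(len(train_x))
        if i < n_original
        or knn_label(train_x[i], train_x, train_y, k, skip=i) == train_y[i]
    ]
    return [train_x[i] for i in keep], [train_y[i] for i in keep]


def self_train(labeled_x, labeled_y, unlabeled_x, batch=2, k=3):
    """Self-training: label a few unlabeled points per round, then edit."""
    train_x, train_y = list(labeled_x), list(labeled_y)
    n_original = len(train_x)
    pool = list(unlabeled_x)
    while pool:
        # Assumed confidence proxy: points closest to the training set first.
        pool.sort(key=lambda x: min(math.dist(x, t) for t in train_x))
        chosen, pool = pool[:batch], pool[batch:]
        new_y = [knn_label(x, train_x, train_y, k) for x in chosen]
        train_x += chosen
        train_y += new_y
        # Data editing step: remove pseudo-labels that look like noise.
        # Removed points are discarded here rather than returned to the pool.
        train_x, train_y = edit_pseudo_labeled(train_x, train_y, n_original, k)
    return train_x, train_y


if __name__ == "__main__":
    labeled_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    labeled_y = ["A", "A", "A", "B", "B", "B"]
    unlabeled_x = [(0.5, 0.5), (1, 1), (5.5, 5.5), (6, 6), (0.2, 0.8), (5.2, 5.8)]
    xs, ys = self_train(labeled_x, labeled_y, unlabeled_x)
    print(Counter(ys))
```

In this sketch the editing step runs after every round, so a wrongly pseudo-labeled point is removed before it can influence later rounds. A real experiment would swap in the base classifier and confidence estimate actually used in the study.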

Keywords/Search Tags:

Semi-Supervised Classification, Data Editing, Self-Training, Unlabeled Data

Related items

1	Research On A Semi-supervised Random Forest Classification Algorithm And Its Parallelization
2	Study On Semi-supervised Recommendation Method Based On Co-training
3	Exploitation of unlabeled data and related tasks in semi-supervised learning
4	Research And Implementation Of Classification Model On Big Data In Healthcare Based On Semi-supervised Learning Algorithm
5	Semi-supervised learning with multiple views
6	A Study On Learning From Positive And Unlabeled Examples
7	Based On The Positive And Unlabeled Samples, Semi-supervised Classification
8	Semi-supervised Image Classification Based On Improved Ladder Network
9	Learning with unlabeled data
10	Research On Semi-supervised Learning Classification Algorithm