Research And Implementation Of Semi-Supervised Based Self-Training Classification Model

Posted on: 2010-12-29
Degree: Master
Type: Thesis
Country: China
Candidate: T Ding
Full Text: PDF
GTID: 2178360302960773
Subject: Computer software and theory
Abstract/Summary:
Semi-supervised learning is a learning method proposed in recent years. According to its purpose, it can be divided into two categories: semi-supervised classification and semi-supervised clustering. Its main idea is to combine a small labeled training set with a large set of unlabeled samples in order to improve classification performance.

This thesis focuses on semi-supervised classification and studies in depth the self-training algorithm, a classic algorithm in this area. We propose an improved model motivated by the fact that the initial training set is very small, so the classifier trained on it cannot be as accurate as expected. We introduce a data editing technique based on the nearest neighbor rule to identify mislabeled samples during training and classification. The technique is applied in each training iteration to identify and remove noisy data, which purifies the training set and improves classification accuracy. The experimental data sets are selected randomly from the UCI Machine Learning Repository, and the results show that the improved model raises classification accuracy by a different amount on each data set. From the analysis of the results, we conclude that average classification performance improves by 6.705%.

Because the generalization ability of the Tri-Training model is weak, we also propose an improved version of it. In this model, the different classifiers cooperate with each other, and a voting rule is used to classify the unlabeled data. The improved model is based on the model proposed by Zhou. As with the self-training algorithm, we introduce the data editing technique to purify the training set. The experimental data sets are again taken from the UCI Machine Learning Repository.
According to our experimental data, we conclude that the proposed model performs well in classification and improves classification accuracy.
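
The self-training loop with nearest-neighbor data editing described in the abstract might look roughly like the sketch below. The abstract does not name the base classifier, the confidence measure, or the exact editing rule, so everything here is an assumption made for illustration: a nearest-centroid base classifier, distance to the centroid as the confidence score, a k-nearest-neighbor majority vote as the editing rule, and the hypothetical names `centroids`, `predict`, `knn_label`, `edit`, and `self_train`.

```python
import math
from collections import Counter, defaultdict


def centroids(train):
    """Mean feature vector per class; train is a list of (features, label)."""
    groups = defaultdict(list)
    for x, y in train:
        groups[y].append(x)
    return {y: tuple(sum(col) / len(xs) for col in zip(*xs))
            for y, xs in groups.items()}


def predict(cents, x):
    """Return (label, distance) of the nearest class centroid (assumed base classifier)."""
    label = min(cents, key=lambda y: math.dist(cents[y], x))
    return label, math.dist(cents[label], x)


def knn_label(samples, x, k):
    """Majority label among the k samples nearest to x."""
    nearest = sorted(samples, key=lambda s: math.dist(s[0], x))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]


def edit(train, candidates, k=3):
    """Data editing step: keep a pseudo-labeled candidate only if its label
    agrees with its k nearest neighbors in the training set plus the other candidates."""
    kept = []
    for i, (x, y) in enumerate(candidates):
        others = train + candidates[:i] + candidates[i + 1:]
        if knn_label(others, x, k) == y:
            kept.append((x, y))
    return kept


def self_train(labeled, unlabeled, batch=2, k=3, max_iter=20):
    """Grow the training set with edited pseudo-labels, a few samples per iteration."""
    train, pool = list(labeled), list(unlabeled)
    for _ in range(max_iter):
        if not pool:
            break
        cents = centroids(train)
        # Treat the samples closest to a centroid as the most confident ones.
        ranked = sorted(pool, key=lambda x: predict(cents, x)[1])
        picked, pool = ranked[:batch], ranked[batch:]
        candidates = [(x, predict(cents, x)[0]) for x in picked]
        # Candidates rejected by the editing step are discarded in this sketch.
        train.extend(edit(train, candidates, k))
    return train, centroids(train)


labeled = [((0.0, 0.0), "a"), ((0.2, 0.1), "a"), ((5.0, 5.0), "b"), ((5.1, 4.9), "b")]
unlabeled = [(0.1, 0.2), (4.8, 5.2), (0.3, 0.0), (5.2, 5.1)]
train, cents = self_train(labeled, unlabeled)
```

Whether rejected candidates are discarded or returned to the unlabeled pool is a design choice that the abstract leaves open; the sketch discards them to keep the loop short.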
Keywords/Search Tags:Semi-Supervised Classification, Data Editing, Self-Training, Unlabeled Data