Font Size: a A A

Studies On The Algorithms Of The Semi-supervised Classification And Techniques

Posted on:2012-02-29Degree:MasterType:Thesis
Country:ChinaCandidate:G Q LuFull Text:PDF
GTID:2218330338973225Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
With the development of information technology, data collection and data storage technology make a rapid progress in accumulating huge amounts of data for various organs. However, extracting the useful information and knowledge has become a huge challenge. Data-rich data and analysis tools coupled with strong demand can be described as data rich but information poor. It is difficult to find data patterns in fast-growing mass of data collection, storage and large data storage in large database. Data mining combining traditional data analysis and complex machine learning algorithms can help people analyze huge amounts of data which have the model and the potential of knowledge, and give people a great convenience.Semi-supervised learning is a new and important research branch of the machine learning and data mining. Facing with the data analysis and data mining deepening of the practical problems, many researchers pay much attention to semi-supervised learning technology. Semi-supervised learning is like a capable learner. It can handle and recognize most of the unlabeled data and some data's label has been known, how to build a learning model. It mainly use the model assumptions of the data distribution to build a learning model to deal with the data is not unlabeled. Therefore, how to make full use of labeled data and unlabeled data to build a good model to improve the accuracy and the performance of learning is a challenging problem.This paper studies the semi-supervised classification learning techniques and algorithms which are from the traditional classification, and they are able to adapt to a small number of categories of data label are known and a lot of the data which have not been labeled. We describe the status of semi-supervised learning techniques in detail, and introduced the model of semi-supervised classification technique and model. What is more, we presents new strategies to improve the algorithms and check the effectiveness of our strategy via doing experiments.Base on KNN semi-supervised self-training model treating the labeled data and the unlabeled data in the same way rather than distinguish original labeled data and the afterward trained data. This paper presents an improvement strategy which can better deal with the boundary data. The experiments also show that the improved method is better than the original method on classification accuracy.On the other hand, with the support vector machine technology, we analysis the model of the semi-supervised support vector machine technology (S3VMs), and gives an improved method. Second, we use the particle swarm algorithm to optimal the semi-supervised support vector machine model's parameters and the experiment verify the improvement algorithm have a better ability and performance. At last, studies the semi-supervised coordination training (Co-training) algorithm and the traditional classification methods, and gives a semi-supervised cooperative training (Co-training) of the improved algorithm, the results show that we get better achievement if we use several classification model.Based on the discussion above, the main innovations of this paper can be summarized as follows:First, in order to make full use of the characteristics of labeled data and unlabeled data, we present an improved model of the semi-supervised training (self-training) algorithm.Second, Analysis the branch and bound strategy for semi-supervised support vector machines classification algorithm and integration of particle swarm algorithm to optimize the parameters of the learning method.At last, give the semi-supervised cooperative training (Co-training) model, and find a good method to improve the Co-training learning.In order to verify the effectiveness and validity of the proposed algorithms and strategies, we do many experiments on real datasets. The experiments results show that under the assumption of the model is right, the improved of the algorithm are better than the original algorithm.
Keywords/Search Tags:semi-supervised learning, classification, Co-training
PDF Full Text Request
Related items