
Comparison And Improvement Of Two Methods Based On Semi-Supervised Learning

Posted on: 2011-08-28
Degree: Master
Type: Thesis
Country: China
Candidate: J L Lu
Full Text: PDF
GTID: 2178330332964803
Subject: Computer software and theory
Abstract/Summary:
In recent years, many industrial problems have drawn attention to data analysis and data mining, and the semi-supervised learning (SSL) framework has developed considerably in both theory and application. SSL addresses situations in which part of the label information for the training dataset is missing, and aims to build learning models that still achieve good performance and generalization capability. Co-training and Multi-view methods are basic techniques within the SSL framework; both are widely applied in industrial research and have obtained good results compared with other methods.

Firstly, the thesis introduces the SSL framework, including its development and main methods, and describes the important role of SSL in data mining research. It also introduces the three underlying theories on which the thesis builds: Naive Bayes, expectation maximization (EM), and finite mixture models (FMMs).

Secondly, the thesis explains the application background and key points of the Co-training method, and argues that the cluster assumption and the PAC setting of the partition sets are the main restrictions on using Co-training. It also presents two important applications of Co-training: natural language processing and content-based image retrieval (CBIR).

Thirdly, the thesis gives detailed formulas for the Multi-view EM algorithm, focusing on Multi-view EM based on the Naive Bayes classifier and FMMs. It also proposes an improved algorithm, based on Multi-view EM, that adds weight variables to coordinate the contributions of the different views. The experimental results show that the improvement increases classification performance.

Finally, the thesis sets up an experiment on a dataset from a tobacco product research and development department. The results show that the Co-training method achieves a good classification rate compared with traditional methods, and the comparison between Co-training and Multi-view EM shows that the latter achieves better classification performance. These results can help tobacco corporations design new products more effectively.
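For readers unfamiliar with Co-training, the following is a minimal sketch of the general idea with a Naive Bayes classifier on each view. It is not the thesis's implementation: the names `CategoricalNB`, `co_train`, and `predict`, the `per_round` and `threshold` parameters, and the toy color/shape data are all hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


class CategoricalNB:
    """Tiny Naive Bayes over tuples of discrete features (add-one smoothing)."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.class_count = {c: 0 for c in self.classes}
        self.feat_count = defaultdict(int)  # (class, position, value) -> count
        self.values = defaultdict(set)      # position -> values seen in training
        for x, c in zip(X, y):
            self.class_count[c] += 1
            for j, v in enumerate(x):
                self.feat_count[(c, j, v)] += 1
                self.values[j].add(v)
        return self

    def predict_proba(self, x):
        total = sum(self.class_count.values())
        log_p = {}
        for c in self.classes:
            lp = math.log(self.class_count[c] / total)
            for j, v in enumerate(x):
                num = self.feat_count.get((c, j, v), 0) + 1
                # One extra slot is reserved for values never seen in training.
                den = self.class_count[c] + len(self.values[j]) + 1
                lp += math.log(num / den)
            log_p[c] = lp
        m = max(log_p.values())
        z = sum(math.exp(lp - m) for lp in log_p.values())
        return {c: math.exp(lp - m) / z for c, lp in log_p.items()}


def _fit_views(labeled):
    """Train one classifier per view on the current labeled set."""
    y = [label for _, label in labeled]
    clf1 = CategoricalNB().fit([x[0] for x, _ in labeled], y)
    clf2 = CategoricalNB().fit([x[1] for x, _ in labeled], y)
    return clf1, clf2


def co_train(labeled, unlabeled, rounds=10, per_round=2, threshold=0.6):
    """labeled: [((view1, view2), label)]; unlabeled: [(view1, view2)]."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        picks = {}  # pool index -> pseudo-label
        for view, clf in enumerate(_fit_views(labeled)):
            scored = []
            for i, x in enumerate(pool):
                proba = clf.predict_proba(x[view])
                label = max(proba, key=proba.get)
                scored.append((proba[label], i, label))
            scored.sort(reverse=True)
            # Each view nominates its most confident examples; if both views
            # nominate the same example, the first nomination is kept.
            for conf, i, label in scored[:per_round]:
                if conf >= threshold:
                    picks.setdefault(i, label)
        if not picks:
            break  # neither view is confident about anything left
        for i in sorted(picks, reverse=True):
            labeled.append((pool.pop(i), picks[i]))
    return _fit_views(labeled)


def predict(clf1, clf2, x):
    """Combine both views by multiplying their class probabilities."""
    p1, p2 = clf1.predict_proba(x[0]), clf2.predict_proba(x[1])
    return max(p1, key=lambda c: p1[c] * p2[c])


# Toy data: view 1 is a color feature, view 2 is a shape feature.
labeled = [((("red",), ("round",)), "A"), ((("blue",), ("square",)), "B")]
unlabeled = [
    (("red",), ("oval",)), (("pink",), ("oval",)),
    (("blue",), ("flat",)), (("navy",), ("flat",)),
]
clf1, clf2 = co_train(labeled, unlabeled)
print(predict(clf1, clf2, (("pink",), ("hexagon",))))
print(predict(clf1, clf2, (("navy",), ("hexagon",))))
```

In this toy run, the color view first labels the examples whose colors it already knows, the shape view then learns the new shapes from those pseudo-labels, and the color view in turn learns the previously unseen colors. This exchange of confident labels between views is the mechanism Co-training relies on, and it is also why the two views need to be informative on their own.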
Keywords/Search Tags: Semi-Supervised Learning, Multi-view EM, Co-training, Naive Bayes Classifier, Tobacco Data Classification