
Research On The Problem Of Semi-supervised Model Misspecification Based On Kernel Clustering

Posted on: 2018-02-17 | Degree: Master | Type: Thesis
Country: China | Candidate: J Yang | Full Text: PDF
GTID: 2348330533469223 | Subject: Computer Science and Technology
Abstract/Summary:
Semi-supervised learning is an important research topic in machine learning. A large number of labeled samples can improve the performance of a learner, but collecting them is time-consuming and labor-intensive. Semi-supervised learning instead trains on a small number of labeled samples mixed with a large number of unlabeled samples, and it has therefore received growing research attention. However, when the assumed initial model does not match the underlying data distribution, adding unlabeled data to training reduces the performance of the learner. We call this issue model misspecification. This thesis focuses on this issue, and its main contributions are summarized as follows.

First, a weight coefficient is introduced to weaken the impact of unlabeled data. When the assumed model does not match the data distribution, the predictions for unlabeled samples are less reliable. Adding these samples is equivalent to adding noisy data, which degrades the performance of the learner. In this thesis, we introduce a weight coefficient to reduce the impact of unlabeled data on the model. When model misspecification occurs, there is a large difference between the model trained with the weight coefficient and the model trained without it.

Second, a method is proposed to resolve model misspecification. The weight coefficient can only reduce the influence of unlabeled data to a certain extent and cannot solve the problem at its root, so this thesis presents a method to judge and modify the model. For each generated model, we determine whether the model is correct. If it is not, we modify the model until the generated model matches the data distribution. Model judgment compares the difference between the model with the weight coefficient and the model without it, while model modification adjusts the number of clusters used in the kernel clustering process.

Third, a semi-supervised learning method based on kernel clustering is proposed. Kernel clustering maps samples from the input space to a kernel space and then clusters the unlabeled samples in that kernel space, using the large number of unlabeled samples to explore the regions where the data are densely distributed. According to the cluster assumption in semi-supervised learning, samples in the same cluster have a higher probability of belonging to the same category. The cluster centers obtained by clustering are taken as the generated model, and we then determine whether this model is the final one. This method is applied to multi-class image classification and achieves good results. Experimental results on the PASCAL VOC07 and MIR Flickr data sets show that the semi-supervised learning method based on kernel clustering achieves better performance in terms of average precision (AP) and mean average precision (mAP).
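The kernel clustering step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the kernel or the clustering rule, so this example assumes an RBF kernel and a kernel k-means style assignment; the names `rbf_kernel`, `kernel_kmeans`, `gamma`, `k`, and `max_iter` are hypothetical and are not taken from the thesis. The squared distance from a sample to a cluster mean in kernel space is computed from kernel values alone, and `k` is the number of clusters that the model modification step would adjust.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel value between two points given as equal-length sequences."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def kernel_kmeans(points, k, gamma=1.0, max_iter=50):
    """Cluster points in the RBF feature space; returns one cluster index per point.

    Illustrative only: the kernel and assignment rule are assumptions,
    not the method used in the thesis.
    """
    n = len(points)
    # Kernel (Gram) matrix: K[i][j] is the inner product in kernel space.
    K = [[rbf_kernel(p, q, gamma) for q in points] for p in points]
    # Assumption: deterministic round-robin start, chosen for reproducibility.
    labels = [i % k for i in range(n)]
    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Squared norm of each cluster mean in kernel space.
        within = [
            sum(K[j][l] for j in m for l in m) / (len(m) ** 2) if m else 0.0
            for m in members
        ]
        new_labels = []
        for i in range(n):
            best_c, best_d = labels[i], float("inf")
            for c, m in enumerate(members):
                if not m:
                    continue  # an empty cluster has no mean to compare against
                cross = sum(K[i][j] for j in m) / len(m)
                # ||phi(x_i) - mean_c||^2 expressed with kernel values only.
                d = K[i][i] - 2.0 * cross + within[c]
                if d < best_d:
                    best_c, best_d = c, d
            new_labels.append(best_c)
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels


if __name__ == "__main__":
    unlabeled = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
                 (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    print(kernel_kmeans(unlabeled, k=2))
```

In a judge-and-modify loop of the kind the abstract describes, a routine like this would be rerun with a different `k` whenever the weighted and unweighted models disagree too strongly. How that disagreement is measured is not stated in the abstract, so it is left out of the sketch.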
Keywords/Search Tags: Semi-supervised learning, kernel clustering, model misspecification