Research On The Problem Of Semi-supervised Model Misspecification Based On Kernel Clustering

Posted on:2018-02-17

Degree:Master

Type:Thesis

Country:China

Candidate:J Yang

GTID:2348330533469223

Subject:Computer Science and Technology

Abstract/Summary:

Semi-supervised learning is an important research topic in machine learning. A large number of labeled samples can help improve the performance of a learner, but collecting labeled samples is time-consuming and labor-intensive. Semi-supervised learning trains on a small number of labeled samples together with a large number of unlabeled samples, and it has therefore received increasing research attention. However, when the assumed model does not match the underlying data distribution, adding unlabeled data for training degrades the performance of the learner; we call this issue model misspecification. This thesis focuses on this issue. The main research work is summarized as follows:

1. A weight coefficient is introduced to weaken the impact of unlabeled data. When the assumed model does not match the data distribution, the predictions for unlabeled samples are less reliable, and adding these samples is equivalent to adding noisy data, which degrades the performance of the learner. This thesis therefore introduces a weight coefficient to reduce the influence of unlabeled data on the model. When model misspecification occurs, the model with the weight coefficient and the model without it differ substantially.

2. A method is proposed to address model misspecification. The weight coefficient can only reduce the influence of unlabeled data to a certain extent and cannot solve the problem fundamentally, so this thesis presents a method to judge and modify the model. Each generated model is judged to be correct or not; if it is not, it is modified until the generated model matches the data distribution. Model judgment compares the model with the weight coefficient against the model without it, while model modification adjusts the number of clusters used in the kernel clustering process.

3. A semi-supervised learning method based on kernel clustering is proposed. Kernel clustering maps samples from the input space to a kernel space and clusters the unlabeled samples there, using the large number of unlabeled samples to find the regions where the data are densely distributed. According to the cluster assumption in semi-supervised learning, samples in the same cluster have a higher probability of belonging to the same category. The cluster centers obtained by clustering are taken as the generated model, which is then judged to determine whether it is the final model.

Combining the above, a semi-supervised learning method based on kernel clustering is proposed and applied to multi-class image classification. Experimental results on the PASCAL VOC07 and MIR Flickr data sets show that the method achieves better performance in terms of AP and mAP.
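
The abstract describes the approach only at a high level: kernel clustering, a weight coefficient on unlabeled data, model judgment by comparing the weighted and unweighted models, and model modification by changing the number of clusters. The Python sketch below illustrates one possible reading of that loop; it is not the thesis's implementation. The RBF kernel, kernel k-means with a round-robin start, clustering all samples (labeled and unlabeled) so that each cluster can take the majority label of its labeled members, nearest-weighted-class-mean prediction in kernel space, the disagreement rate used as the "difference", and the parameters `w`, `tol`, `gamma` and `k_values` are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def rbf(x, y, gamma=1.0):
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2) (assumed kernel choice)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def kernel_kmeans(X, k, gamma=1.0, iters=20):
    """Kernel k-means: cluster in kernel space using only kernel evaluations."""
    n = len(X)
    K = [[rbf(X[i], X[j], gamma) for j in range(n)] for i in range(n)]
    assign = [i % k for i in range(n)]  # deterministic round-robin start (assumption)
    for _ in range(iters):
        members = [[j for j in range(n) if assign[j] == c] for c in range(k)]
        # Squared distance to a cluster mean in kernel space:
        # ||phi(x_i) - m_c||^2 = K_ii - (2/|c|) sum_j K_ij + (1/|c|^2) sum_jl K_jl
        within = [sum(K[j][l] for j in m for l in m) / len(m) ** 2 if m else 0.0
                  for m in members]
        new = []
        for i in range(n):
            dists = {c: K[i][i] - 2 * sum(K[i][j] for j in m) / len(m) + within[c]
                     for c, m in enumerate(members) if m}
            new.append(min(dists, key=dists.get))
        if new == assign:  # assignments stopped changing
            break
        assign = new
    return assign, K


def weighted_model(labels, assign, w):
    """Hypothetical model: weighted class prototypes. Labeled samples get weight 1;
    an unlabeled sample inherits the majority label of the labeled samples in its
    cluster (cluster assumption) and gets weight w. w = 1 means no weight coefficient."""
    cluster_label = {}
    for c in set(assign):
        votes = Counter(y for y, a in zip(labels, assign) if a == c and y is not None)
        if votes:
            cluster_label[c] = votes.most_common(1)[0][0]
    model = {}  # class -> list of (sample index, weight)
    for i, (y, a) in enumerate(zip(labels, assign)):
        if y is not None:
            model.setdefault(y, []).append((i, 1.0))
        elif a in cluster_label and w > 0:
            model.setdefault(cluster_label[a], []).append((i, w))
    return model


def predict(K, model, i):
    """Assign sample i to the nearest weighted class mean in kernel space."""
    def dist(pts):
        tot = sum(wt for _, wt in pts)
        cross = sum(wt * K[i][j] for j, wt in pts) / tot
        within = sum(a * b * K[j][l] for j, a in pts for l, b in pts) / tot ** 2
        return K[i][i] - 2 * cross + within
    return min(model, key=lambda y: dist(model[y]))


def select_model(X, labels, w=0.3, tol=0.1, k_values=(2, 3, 4, 5), gamma=1.0):
    """Judge each generated model; if it looks misspecified, change k and retry."""
    unlabeled = [i for i, y in enumerate(labels) if y is None]
    for k in k_values:
        assign, K = kernel_kmeans(X, k, gamma)
        weighted = weighted_model(labels, assign, w)
        unweighted = weighted_model(labels, assign, 1.0)
        # Model judgment: fraction of unlabeled samples on which the two models disagree.
        diff = sum(predict(K, weighted, i) != predict(K, unweighted, i)
                   for i in unlabeled) / max(len(unlabeled), 1)
        if diff <= tol:
            return k, assign
    return None  # no candidate k passed the judgment step
```

For example, with `X = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1)]` and `labels = ["a", None, None, "b", None, None]`, `select_model(X, labels)` returns `(2, [0, 0, 0, 1, 1, 1])`: the weighted and unweighted models agree on every unlabeled sample, so k = 2 is accepted on the first try. Measuring the difference as a prediction-disagreement rate keeps the judgment step independent of how the underlying model is parameterized.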

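The experiments are reported in terms of AP and mAP. As a reference for these metrics, the sketch below computes non-interpolated average precision for one class from ranked scores, and mAP as the mean AP over classes. Which AP variant the thesis used is not stated in the abstract; note that the official PASCAL VOC2007 development kit uses an 11-point interpolated AP, which can give slightly different numbers from this non-interpolated version. The function names are illustrative.

```python
def average_precision(scores, labels):
    """Non-interpolated AP for one class: the mean of precision@r taken at the
    rank r of every positive sample, with samples sorted by descending score."""
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    hits, precisions = 0, []
    for rank, (_, is_pos) in enumerate(ranked, start=1):
        if is_pos:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(score_matrix, label_matrix):
    """mAP: average of per-class APs. score_matrix[c] and label_matrix[c] hold
    the scores and 0/1 ground-truth labels of every image for class c."""
    aps = [average_precision(s, y) for s, y in zip(score_matrix, label_matrix)]
    return sum(aps) / len(aps)
```

For example, a class whose two positive images are ranked first and third has AP = (1/1 + 2/3) / 2 = 5/6.
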
Keywords/Search Tags:

Semi-supervised learning, kernel clustering, model misspecification

Related items

1	Research On Semi-supervised Clustering And Classification Algorithm
2	Research On SVM Method Based On Semi-supervised Clustering Nucleus
3	Semi-supervised Graph Clustering With Spatial-spectral Kernel And Its Application In Hyperspectral Image
4	Semi-supervised Learning On Text Data
5	Semi-Supervised Clustering And Dimensionality Reduction With Their Applications
6	Multiple Kernel Learning Improved By Bi-objective Functions And Its Application To Semi-supervised Learning And Transfer Learning
7	Distributed Clustering And Evolutionary Clustering Algorithm Based On Semi-supervised Learning
8	Study Of Semi-supervised Based On Kernel
9	Semi-supervised Feature Selection Based On Kernel Density Estimation
10	Research And Application Of Manifold Regularization Multiple Kernel Model On Supervised And Semi-supervised Classification