
A Cost-sensitive Semi-supervised Learning Model Based On Uncertainty

Posted on: 2019-07-27 | Degree: Master | Type: Thesis
Country: China | Candidate: H Y Zhu | Full Text: PDF
GTID: 2428330566961586 | Subject: Computer Science and Technology
Abstract/Summary:
With the advent of the big data era, semi-supervised learning has attracted increasing attention from researchers. Semi-supervised learning combines supervised and unsupervised learning: its training set contains a large amount of unlabeled data in addition to labeled data. It is often applied to classification. Traditional classification learning focuses mainly on improving classification accuracy and ignores the loss caused by misclassification. In real applications, however, misclassifying different categories often incurs different degrees of loss, i.e., the problem is cost-sensitive. Therefore, based on the uncertainty of each data sample's output, this thesis proposes a data-sample retraining model within the semi-supervised learning framework for cost-sensitive classification problems, with the aim of reducing the total misclassification cost. The thesis consists of two main parts.

First, we propose a cost-sensitive semi-supervised learning model based on uncertainty. After training a basic cost-sensitive classifier, namely an extreme learning machine (ELM), we exploit certain properties of the relationship between the uncertainty of a sample's output and its misclassification cost: samples that are unlikely to be misclassified are defined as high-reliability data, and samples that are more likely to be misclassified are defined as low-reliability data. We then select samples from the testing set that have low uncertainty and take them, together with their predicted labels, as high-reliability samples. Within the semi-supervised learning framework, we obtain a new classification model by retraining the classifier on a new training set consisting of the original training data plus the high-reliability data. Experimental results show that, compared with the previous classification model, the total misclassification cost is significantly reduced. From another perspective, this method therefore improves the performance of the classifier.

Second, to support the rationale for using low-uncertainty samples as high-reliability data for retraining, we construct an uncertainty-based three-way decision model. After training a basic cost-sensitive classifier (ELM), the three-way decision model is used to decide on the classifier's prediction results: samples with different degrees of uncertainty receive different decisions, and some samples with high uncertainty receive a delayed decision. For the delayed-decision region, we consider that there is currently not enough information to make a decision; these samples can be decided later, once new information becomes available. Experiments show that the classifier using the three-way decision model effectively reduces the total misclassification cost. This indicates that high-uncertainty data significantly affect classification performance and, in turn, verifies the soundness of the retraining model that uses low-uncertainty samples as high-reliability data.
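The retraining scheme and the three-way decision step described above can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation, and several choices are assumptions: uncertainty is measured as the normalized entropy of the ELM outputs after clipping and normalizing them into a probability vector; cost sensitivity comes from a minimum-expected-cost decision rule over a cost matrix `cost[i, j]` (true class i predicted as j); the three-way rule commits to a label when uncertainty is at most a threshold `tau` and otherwise defers (returns -1); and deferred samples contribute no cost. All names (`CostSensitiveELM`, `uncertainty`, `retrain_with_reliable`, `three_way`, `total_cost`) and parameters (`ratio`, `tau`, `reg`) are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class CostSensitiveELM:
    """Basic ELM: random sigmoid hidden layer plus ridge-regularized output
    weights. Cost sensitivity comes from a minimum-expected-cost decision
    rule (an assumption of this sketch, not necessarily the thesis's method)."""

    def __init__(self, n_hidden, cost, reg=1.0, seed=0):
        self.n_hidden = n_hidden
        self.cost = np.asarray(cost, dtype=float)  # cost[i, j]: true i predicted as j
        self.reg = reg
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return sigmoid(X @ self.W + self.b)

    def fit(self, X, y):
        # Draw fresh random hidden weights; only the output weights are solved for.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        T = np.eye(self.cost.shape[0])[y]  # one-hot targets
        A = H.T @ H + np.eye(self.n_hidden) / self.reg
        self.beta = np.linalg.solve(A, H.T @ T)
        return self

    def proba(self, X):
        # Assumed mapping: clip ELM outputs to be positive, then normalize rows.
        out = np.clip(self._hidden(X) @ self.beta, 1e-6, None)
        return out / out.sum(axis=1, keepdims=True)

    def predict(self, X):
        # Expected cost of predicting j: sum_i P(i | x) * cost[i, j]
        return (self.proba(X) @ self.cost).argmin(axis=1)


def uncertainty(p):
    """Normalized Shannon entropy of each probability row, in [0, 1]."""
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1) / np.log(p.shape[1])


def retrain_with_reliable(model, X_train, y_train, X_test, ratio=0.3):
    """Add the lowest-uncertainty test samples, with their predicted labels,
    to the training set and refit the model on the enlarged set."""
    k = int(ratio * len(X_test))
    reliable = np.argsort(uncertainty(model.proba(X_test)))[:k]
    pseudo = model.predict(X_test[reliable])
    X_new = np.vstack([X_train, X_test[reliable]])
    y_new = np.concatenate([y_train, pseudo])
    return model.fit(X_new, y_new), k


def three_way(model, X, tau=0.5):
    """Commit to the minimum-cost label when uncertainty <= tau; otherwise
    return -1 (delayed decision). With two classes this gives positive,
    negative, and boundary (deferred) regions."""
    u = uncertainty(model.proba(X))
    return np.where(u <= tau, model.predict(X), -1)


def total_cost(y_true, y_pred, cost):
    """Sum of misclassification costs; deferred (-1) samples are skipped here."""
    cost = np.asarray(cost, dtype=float)
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    done = y_pred >= 0
    return cost[y_true[done], y_pred[done]].sum()


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(-1, 1, (100, 2)), rng.normal(1, 1, (100, 2))])
    y = np.repeat([0, 1], 100)
    idx = rng.permutation(200)
    tr, te = idx[:60], idx[60:]
    cost = [[0, 1], [5, 0]]  # hypothetical: missing a class-1 sample costs 5x more
    model = CostSensitiveELM(n_hidden=20, cost=cost).fit(X[tr], y[tr])
    print("cost before retraining:", total_cost(y[te], model.predict(X[te]), cost))
    model, n_added = retrain_with_reliable(model, X[tr], y[tr], X[te])
    print("cost after retraining: ", total_cost(y[te], model.predict(X[te]), cost))
    print("delayed decisions:", int((three_way(model, X[te]) == -1).sum()))
```

Selecting by lowest uncertainty, rather than by a fixed confidence cutoff, keeps the number of pseudo-labeled samples predictable through `ratio`; whether the total cost actually drops on a given dataset depends on the data, the cost matrix, and the uncertainty measure, so the printed numbers should be read as a demonstration of the pipeline rather than as evidence for the thesis's results.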
Keywords/Search Tags: Cost Sensitive, Uncertainty, Semi-supervised Learning, Extreme Learning Machine (ELM), Three-way Decision