Research On Normalized Label Propagation Algorithm Dealing With Label Imbalance

Posted on: 2014-01-13 | Degree: Master | Type: Thesis
Country: China | Candidate: M Xue | Full Text: PDF
GTID: 2248330395499161 | Subject: Software engineering
Abstract/Summary:
In recent years, research on classification problems has grown explosively across science and technology. Semi-supervised learning has flourished because it circumvents limitations of both unsupervised and supervised learning. Graph-based semi-supervised learning, which is built on graph theory, is of great importance to the international machine learning and data mining community because of its reasonably abstract model, intuitive form of expression, and strong comprehensibility.

Most semi-supervised classification methods assume that the data set and the labeled data are balanced across classes. In practical applications, however, imbalance is common and degrades the performance of semi-supervised classification by reducing its accuracy. How to solve the imbalance problem of labeled data in classification is the main topic of this thesis.

This thesis first analyzes the status quo of graph-based semi-supervised learning algorithms, elaborates on the importance of the cluster assumption and the smoothness assumption to graph-based semi-supervised learning, and analyzes the classic algorithms in combination with graph structure. Next, it describes the current state of research on imbalance, focusing on the imbalance of labeled data, and lists algorithms that can handle the imbalance problem, such as Label Diagnosis through Self Tuning (LDST) and several other methods. On this basis, by comparing several classical algorithms, it proposes an improvement to Label Propagation (LP), a classical method in the field of semi-supervised learning.

The new algorithm, called Normalized Label Propagation (NLP), is based on Naive Bayes theory and balances the information carried by each labeled data point during label passing. Contrast experiments were carried out separately on the IRIS, WINE, and GENE (genetic) data sets to test its ability to handle label imbalance. These experiments show that NLP has great adaptability and classification accuracy.
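
To make the idea of label propagation under label imbalance concrete, the following is a minimal sketch, not the thesis's NLP algorithm, whose exact formulation the abstract does not give. It implements classical clamped label propagation on an RBF affinity graph, with an optional step that rescales the seed labels so every class starts with the same total mass. That rescaling, the function names, and the parameters `sigma`, `n_iter`, and `normalize` are all illustrative assumptions.

```python
import numpy as np


def rbf_affinity(X, sigma=1.0):
    """Dense RBF (Gaussian) affinity matrix with a zero diagonal."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W


def propagate_labels(X, y, n_classes, sigma=1.0, n_iter=200, normalize=False):
    """Clamped label propagation; y holds class ids, -1 marks unlabeled points.

    With normalize=True, the one-hot seeds of each class are divided by the
    number of labeled points in that class (an assumed balancing scheme,
    not necessarily the one used in the thesis).
    """
    W = rbf_affinity(X, sigma)
    P = W / W.sum(axis=1, keepdims=True)  # row-stochastic transition matrix

    labeled = y >= 0
    Y0 = np.zeros((len(y), n_classes))
    Y0[labeled, y[labeled]] = 1.0

    if normalize:
        # Give every class the same total seed mass regardless of label count.
        counts = Y0.sum(axis=0)
        Y0 = Y0 / np.maximum(counts, 1.0)

    F = Y0.copy()
    for _ in range(n_iter):
        F = P @ F                  # spread label scores along graph edges
        F[labeled] = Y0[labeled]   # clamp labeled points to their seeds
    return F.argmax(axis=1)
```

Calling `propagate_labels(X, y, n_classes, normalize=True)` returns one predicted class id per row of `X`. On well-separated clusters both settings should agree; the rescaling is only expected to matter when one class has many more labeled points than another and the clusters overlap.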
Keywords/Search Tags: Graph-based Semi-supervised Learning, Label Imbalance, Normalized Label Propagation, Naive Bayes Theory