
Research On Biased-Based Active Imbalance Learning Algorithm

Posted on: 2020-05-12
Degree: Master
Type: Thesis
Country: China
Candidate: Z Ling
Full Text: PDF
GTID: 2428330596493888
Subject: Computer Science and Technology
Abstract/Summary:
Traditional machine learning techniques usually achieve the desired classification performance on relatively balanced datasets. In the real world, however, data distributions are usually imbalanced, and traditional classification algorithms tend to exhibit bias on such datasets. As a result, their classification performance degrades, and in severe cases the classification model fails completely. To address the imbalance problem, most existing oversampling methods synthesize virtual instances to achieve a relative balance between the majority class and the minority class, but they often suffer from drawbacks such as noise amplification, distribution deviation, and overfitting. This thesis therefore approaches the problem from the perspective of active learning and aims to use real, valuable unlabeled instances for imbalance learning. The research content is divided into the following aspects:

1. To address the limitations of existing oversampling algorithms, and of the active learning algorithms that have been applied to imbalanced datasets, a novel biased-based active sampling learning algorithm is proposed for the first time. The algorithm combines two important sampling factors in the sampling process: minority confidence and instance informativeness.

2. The minority confidence problem is formalized as a semi-supervised learning problem. We propose a sparse neighborhood graph to replace the traditional k-nearest neighbor graph, which solves the under-propagation and over-propagation problems caused by an improper choice of k in traditional semi-supervised learning. It also improves the sampling accuracy of minority instances and reduces the labeling cost.

3. For the second sub-problem, evaluating instance informativeness, we draw inspiration from the MWMOTE algorithm. We first propose an auxiliary decision boundary construction strategy for imbalanced datasets, and then estimate each instance's informativeness from its nearest distance to the decision boundary. Compared with existing active learning and oversampling algorithms, the auxiliary decision boundary overcomes the limitations of their boundary definitions and effectively improves the accuracy of informativeness estimation in imbalance learning.

4. Finally, the experimental results show that the proposed algorithm not only labels minority-class instances most efficiently during active sampling, but also yields the highest classification performance with the training dataset obtained after sampling. In addition, under extreme imbalance the algorithm still achieves better classification results.
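
As a rough illustration of how the two sampling factors described above might be combined, the following Python sketch ranks unlabeled instances by the product of a minority-confidence score and an informativeness score. It is not the algorithm of the thesis. An ordinary k-nearest-neighbor graph stands in for the sparse neighborhood graph, the perpendicular bisector of the two labeled class centroids stands in for the auxiliary decision boundary, and the product rule, the 1/(1+d) score, the function names, and the toy data are all assumptions made for illustration.

```python
import math


def knn_graph(points, k):
    """For each point, return the indices of its k nearest neighbours."""
    neighbours = []
    for i, p in enumerate(points):
        order = sorted((j for j in range(len(points)) if j != i),
                       key=lambda j: math.dist(p, points[j]))
        neighbours.append(order[:k])
    return neighbours


def minority_confidence(points, labels, k=3, iters=50):
    """Label propagation on a plain k-NN graph (assumed stand-in).

    labels[i] is 1 (minority), 0 (majority) or None (unlabeled).
    Labeled points stay clamped; each unlabeled point repeatedly takes
    the mean score of its neighbours.
    """
    nbrs = knn_graph(points, k)
    f = [0.5 if y is None else float(y) for y in labels]
    for _ in range(iters):
        f = [f[i] if labels[i] is not None
             else sum(f[j] for j in nbrs[i]) / len(nbrs[i])
             for i in range(len(points))]
    return f


def informativeness(points, labels):
    """Score closeness to an assumed linear boundary: the perpendicular
    bisector of the labeled majority and minority centroids."""
    def centroid(cls):
        pts = [p for p, y in zip(points, labels) if y == cls]
        return [sum(c) / len(pts) for c in zip(*pts)]

    c0, c1 = centroid(0), centroid(1)
    w = [a - b for a, b in zip(c1, c0)]           # boundary normal
    mid = [(a + b) / 2 for a, b in zip(c1, c0)]   # point on the boundary
    norm = math.hypot(*w)
    scores = []
    for p in points:
        d = abs(sum(wi * (pi - mi) for wi, pi, mi in zip(w, p, mid))) / norm
        scores.append(1.0 / (1.0 + d))            # closer means higher score
    return scores


def rank_queries(points, labels, k=3):
    """Order unlabeled indices by confidence * informativeness, best first."""
    conf = minority_confidence(points, labels, k)
    info = informativeness(points, labels)
    pool = [i for i, y in enumerate(labels) if y is None]
    return sorted(pool, key=lambda i: conf[i] * info[i], reverse=True)


# Toy data: four labeled majority points, one labeled minority point,
# and three unlabeled candidates (indices 5, 6, 7).
points = [(0, 0), (1, 0), (0, 1), (1, 1), (5, 5), (4, 4), (0.5, 0.5), (3, 3)]
labels = [0, 0, 0, 0, 1, None, None, None]

ranking = rank_queries(points, labels)
print(ranking)  # [7, 5, 6]
```

On this toy data, point 7 is ranked first: it ties with point 5 on minority confidence but lies much closer to the stand-in boundary, while point 6 sits inside the majority cluster and receives zero confidence.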
Keywords/Search Tags: Imbalance Learning, Active Learning, Sparse Neighborhood, Label Propagation, Auxiliary Decision Boundary