Research On Biased-Based Active Imbalance Learning Algorithm

Posted on:2020-05-12

Degree:Master

Type:Thesis

Country:China

Candidate:Z Ling

Full Text:PDF

GTID:2428330596493888

Subject:Computer Science and Technology

Abstract/Summary:

Traditional machine learning techniques can usually achieve the desired classification performance on a relatively balanced dataset. In the real world, however, the distribution of data is usually imbalanced, and traditional classification algorithms often exhibit bias on such datasets. As a result, they cannot achieve good classification performance, and in severe cases the classification model fails completely. To address the imbalance problem, most existing oversampling methods synthesize virtual instances to achieve a relative balance between the majority class and the minority class. However, they usually suffer from drawbacks such as noise amplification, distribution deviation, and overfitting. For this reason, this paper takes the perspective of active learning and aims to use real, valuable unlabeled instances for imbalance learning. The research content of this paper is divided into the following aspects:

1. To address the limitations of existing oversampling algorithms, and of active learning algorithms that have been applied to imbalanced datasets, a novel biased-based active sampling learning algorithm is proposed for the first time. The algorithm combines two important sampling factors in the sampling process: minority confidence and the informativeness of instances.

2. The minority confidence problem is formalized as a semi-supervised learning problem. We propose a sparse neighborhood graph to replace the traditional k-nearest neighbor graph, which solves the under-propagation and over-propagation problems caused by an improper choice of k in the traditional semi-supervised learning process. At the same time, it improves the sampling accuracy for minority instances and reduces the labeling cost.

3. The second sub-problem is evaluating the informativeness of instances. Inspired by the MWMOTE algorithm, we first propose an auxiliary decision boundary construction strategy for imbalanced datasets, and then estimate each instance's informativeness based on its nearest distance to the decision boundary. Compared with existing active learning and oversampling algorithms, the auxiliary decision boundary overcomes the limitations of their boundary definitions and effectively improves the accuracy of informativeness estimation in imbalance learning.

4. Finally, the experimental results show that the proposed algorithm not only has the highest efficiency in labeling minority-class instances during active sampling, but also achieves the highest classification performance on the training dataset after sampling. In addition, the algorithm still achieves good classification results under extreme imbalance.
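
The two sampling factors described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the abstract does not specify how the sparse neighborhood graph or the auxiliary decision boundary is built, nor how the two factors are combined. The sketch therefore makes the following assumptions, all of them hypothetical stand-ins: a dense Gaussian-affinity graph with standard label spreading replaces the sparse neighborhood graph, the perpendicular bisector of the two labeled class centroids replaces the auxiliary decision boundary, and the two factors are combined by a simple product. The function names and the parameters `sigma`, `alpha`, and `n_iter` are likewise illustrative.

```python
import numpy as np

def propagate_minority_confidence(X, y, sigma=1.0, alpha=0.9, n_iter=50):
    """Estimate per-instance minority confidence by label spreading.

    y uses 1 = minority, 0 = majority, -1 = unlabeled.
    ASSUMPTION: a dense Gaussian-affinity graph is used here as a
    stand-in for the thesis's sparse neighborhood graph.
    """
    # Pairwise squared distances and Gaussian affinities (no self-loops).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalisation S = D^{-1/2} W D^{-1/2}.
    deg = W.sum(axis=1)
    S = W / np.sqrt(np.outer(deg, deg))
    # One-hot seed labels; unlabeled rows stay all-zero.
    Y0 = np.zeros((len(X), 2))
    Y0[y == 0, 0] = 1.0
    Y0[y == 1, 1] = 1.0
    F = Y0.copy()
    for _ in range(n_iter):
        F = alpha * (S @ F) + (1.0 - alpha) * Y0
    row = F.sum(axis=1, keepdims=True)
    # Share of propagated mass assigned to the minority class.
    return F[:, 1] / np.where(row > 0, row, 1.0).ravel()

def centroid_boundary(X, y):
    """ASSUMPTION: a linear boundary w.x + b = 0 bisecting the labeled
    class centroids, standing in for the auxiliary decision boundary."""
    mu_min = X[y == 1].mean(axis=0)
    mu_maj = X[y == 0].mean(axis=0)
    w = mu_min - mu_maj
    b = -w @ (mu_min + mu_maj) / 2.0
    return w, b

def boundary_informativeness(X, w, b):
    """Informativeness decays with distance to the boundary."""
    dist = np.abs(X @ w + b) / np.linalg.norm(w)
    return np.exp(-dist)

def select_queries(X, y, w, b, n_queries=1):
    """Rank unlabeled instances by confidence * informativeness
    (the product is an assumed combination rule)."""
    score = propagate_minority_confidence(X, y) * boundary_informativeness(X, w, b)
    score[y != -1] = -np.inf  # never re-query labeled instances
    return np.argsort(-score)[:n_queries]
```

In an active learning loop, the indices returned by `select_queries` would be sent to an annotator, their labels written back into `y`, and the scores recomputed. The product rule favors instances that are both likely to be minority and close to the boundary; a weighted sum would be an equally plausible choice under these assumptions.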

Keywords/Search Tags:

Imbalance Learning, Active Learning, Sparse Neighborhood, Label Propagation, Auxiliary Decision Boundary

Related items

1	Research On Normalized Label Propagation Algorithm Dealing With Label Imbalance
2	Contributions To Several Issues Of Multi-Label Learning
3	Appearance Modeling For Visual Object Tracking
4	Multi-label Learning Based On Neighborhood Models
5	Imbalanced Multi-label Learning Algorithm Based On Density Label Space
6	Multi-label Learning Based On Label Weight And Weighted Kernel Extreme Learning Machine
7	Research On The Utilization Techniques Of Partial Label Data
8	Research On Multi-label Active Learning Under Weak Labeled Condition
9	Research On Partially Labeled Problem Based On Active Learning And Semi-supervised Mechanism
10	Safe And High Efficient Label Propagation Algorithm