
Imbalanced Learning And Its Application Based On Manifold Embedded Over-sampling

Posted on: 2019-03-03 | Degree: Master | Type: Thesis
Country: China | Candidate: L K Yang
GTID: 2428330566463485 | Subject: Control Science and Engineering
Abstract/Summary:
Imbalanced data sets are very common in real-world applications and in simulations. Class imbalance means that the numbers of samples in different categories differ greatly; for instance, a data set may contain 1000 majority-class samples and only 10 minority-class samples. Such imbalance makes classification difficult, and it is especially common in industrial applications, so the class-imbalanced classification problem deserves attention. Over-sampling algorithms are a popular remedy: they generate new minority-class samples to build a balanced data set and then train the classifier on that balanced set. However, most previous over-sampling methods do not consider the non-linear distribution of the observed data. As a result, their linear interpolation may produce new minority samples that do not conform to the structure of the data in the original observation space. In this thesis, manifold learning methods are employed to explore the intrinsic structure of the observed data, and over-sampling algorithms are then applied on this structure to build a balanced data set, improving the quality of the minority samples generated for non-linear data sets.
1. To improve classifier performance in industrial fault diagnosis, a manifold embedded over-sampling framework is proposed. We first show that this framework ensures that the newly generated minority samples conform to the structure of the original data set. Six manifold learning methods and four over-sampling methods are then tested on the TE Process data set, the Barcelona Water System data set, and the Xing Long Zhuang coal mine belt system data set. Experimental results show that the manifold embedded algorithm improves both the quality of the generated data and the classification performance.
2. A semi-supervised over-sampling framework is designed to diagnose rock burst disasters in coal mines. First, principal component analysis (PCA), linear discriminant analysis (LDA), and other manifold learning methods are used to find the intrinsic structure of the imbalanced micro-seismic data set. This step not only extracts features but also compresses the data. Next, over-sampling methods are applied in the feature space to build a balanced data set, and semi-supervised learning methods are used to assign more credible labels to the newly generated minority samples. Finally, six different classifiers are evaluated.
3. To improve the performance of over-sampling methods on non-linear imbalanced classification problems, a locally linear interpolation method inspired by locally linear embedding (LLE) is proposed. The method has two main steps: over-sampling based on the whole data set, and labeling the newly generated samples. First, following the idea of LLE, each central sample is reconstructed from its k nearest neighbors, which yields a weight matrix containing the structural information of the data. New samples are then generated from this weight matrix; in other words, the method uses the structural information of the whole data set to generate samples. Finally, the newly generated samples are labeled. Because the labels of the original data are not used during over-sampling, the method can handle multi-class imbalanced classification problems, and because it takes the structural information of the data set into account, it can improve the quality of the samples generated for non-linear data sets. In addition, a kernel method can be used to further improve its performance. Experiments on four UCI data sets demonstrate the effectiveness of the method.
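The manifold embedded over-sampling idea in items 1 and 2 is to embed the data first and then over-sample in the embedded feature space. It can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the thesis's implementation. PCA, computed with NumPy's SVD, stands in for the six manifold learners. SMOTE-style linear interpolation between minority nearest neighbors stands in for the four over-samplers. Only the binary case is handled, and the function names (`pca_embed`, `smote`, `embedded_oversample`) are hypothetical.

```python
import numpy as np


def pca_embed(X, n_components):
    """Project X onto its leading principal components.

    PCA is used here only as a simple, linear stand-in for a manifold learner.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    return Xc @ components.T


def smote(X_min, n_new, k=5, rng=None):
    """SMOTE-style interpolation between minority samples and their k nearest minority neighbours."""
    rng = np.random.default_rng(rng)
    # Pairwise squared Euclidean distances among minority samples.
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # a sample is never its own neighbour
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(d2, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)              # seed sample for each new point
    partner = neighbours[base, rng.integers(0, k, size=n_new)]  # one of its k neighbours
    gap = rng.random((n_new, 1))                                # position along the segment, in [0, 1)
    return X_min[base] + gap * (X_min[partner] - X_min[base])


def embedded_oversample(X, y, minority_label, n_components=2, k=5, rng=None):
    """Embed all samples first, then balance a binary data set inside the embedding."""
    Z = pca_embed(X, n_components)
    Z_min = Z[y == minority_label]
    n_new = max(int((y != minority_label).sum()) - len(Z_min), 0)
    Z_new = smote(Z_min, n_new, k=k, rng=rng)
    Z_bal = np.vstack([Z, Z_new])
    y_bal = np.concatenate([y, np.full(n_new, minority_label)])
    return Z_bal, y_bal


# Usage: 100 majority and 10 minority samples in 5 dimensions (synthetic toy data).
data_rng = np.random.default_rng(0)
X = np.vstack([data_rng.normal(0.0, 1.0, (100, 5)), data_rng.normal(3.0, 1.0, (10, 5))])
y = np.array([0] * 100 + [1] * 10)
Z_bal, y_bal = embedded_oversample(X, y, minority_label=1, n_components=2, rng=0)
print(Z_bal.shape, np.bincount(y_bal))  # (200, 2) [100 100]
```

A non-linear manifold learner would be swapped in at `pca_embed`. The classifier is then trained on `Z_bal`, so new minority points are interpolated in the learned coordinates rather than in the raw observation space.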
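Item 2 uses semi-supervised learning to keep only newly generated minority samples whose labels are credible. That step could look roughly like the following sketch. The abstract does not name the semi-supervised method. This sketch therefore assumes label spreading (normalized-graph label propagation) on a dense Gaussian affinity graph, with the synthetic samples treated as unlabeled. It also assumes a hypothetical confidence threshold. `label_spreading` and `credible_synthetic` are illustrative names, not the thesis's code.

```python
import numpy as np


def label_spreading(X, y, alpha=0.9, sigma=1.0):
    """Label spreading on a dense Gaussian affinity graph.

    y holds class values for labelled rows and -1 for unlabelled rows.
    Returns the class values and a row-normalised label-distribution matrix F.
    """
    classes = np.unique(y[y >= 0])
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    deg = W.sum(axis=1)                   # assumes no sample is isolated (deg > 0)
    S = W / np.sqrt(np.outer(deg, deg))   # symmetric normalisation D^-1/2 W D^-1/2
    Y0 = (y[:, None] == classes[None, :]).astype(float)  # unlabelled rows are all zeros
    # Closed-form fixed point F = (1 - alpha) (I - alpha S)^-1 Y0; the constant factor
    # cancels after row normalisation, so it is omitted.
    F = np.linalg.solve(np.eye(len(X)) - alpha * S, Y0)
    return classes, F / F.sum(axis=1, keepdims=True)


def credible_synthetic(X_real, y_real, X_syn, target, threshold=0.8, alpha=0.9, sigma=1.0):
    """Keep synthetic samples that spreading assigns to `target` with confidence >= threshold."""
    X_all = np.vstack([X_real, X_syn])
    y_all = np.concatenate([y_real, np.full(len(X_syn), -1)])
    classes, F = label_spreading(X_all, y_all, alpha=alpha, sigma=sigma)
    col = int(np.flatnonzero(classes == target)[0])
    keep = F[len(X_real):, col] >= threshold
    return X_syn[keep], keep


# Usage: majority around (0, 0), minority around (5, 5), two candidate synthetic minority points.
rng = np.random.default_rng(0)
X_real = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(5.0, 0.3, (5, 2))])
y_real = np.array([0] * 20 + [1] * 5)
X_syn = np.array([[5.0, 5.0], [0.0, 0.0]])  # the second candidate falls inside the majority cluster
kept, mask = credible_synthetic(X_real, y_real, X_syn, target=1)
print(mask)  # [ True False]
```

The threshold trades quantity for label credibility: raising it discards more borderline synthetic samples. Other semi-supervised learners, such as self-training, could fill the same role.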
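The two steps of item 3 (generate from the whole data set using LLE weights, then label) can be illustrated with the following sketch. The reconstruction weights follow standard LLE: solve the regularized local Gram system, then normalize the weights to sum to one. The abstract does not say exactly how new samples are generated from the weight matrix or how they are labeled, so both rules here are hypothetical. For generation, a new sample is an affine combination of a central sample's neighbors, with weights blended between the LLE weights and a one-hot weight on a random neighbor. For labeling, each new sample takes the k-nearest-neighbor majority vote over the original data. Function names are illustrative, and the kernel variant is not shown.

```python
import numpy as np


def knn_indices(X, k):
    """Indices of the k nearest neighbours of every row of X (excluding the row itself)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)
    return np.argsort(d2, axis=1)[:, :k]


def lle_weights(X, i, neighbours, reg=1e-3):
    """Standard LLE reconstruction weights of X[i] from X[neighbours]; they sum to one."""
    G = X[neighbours] - X[i]  # neighbours centred on the central sample, shape (k, d)
    C = G @ G.T               # local Gram matrix, shape (k, k)
    # Tikhonov regularisation keeps C invertible (e.g. when k > d or neighbours are collinear).
    C = C + np.eye(len(neighbours)) * reg * max(np.trace(C), 1e-12)
    w = np.linalg.solve(C, np.ones(len(neighbours)))
    return w / w.sum()


def lle_oversample(X, n_new, k=5, reg=1e-3, rng=None):
    """Hypothetical generation rule: blend the LLE weights of a central sample with a
    one-hot weight on a random neighbour, then apply the blended weights to the neighbours."""
    rng = np.random.default_rng(rng)
    nbrs = knn_indices(X, k)
    out = np.empty((n_new, X.shape[1]))
    for t in range(n_new):
        i = rng.integers(len(X))              # central sample; labels are ignored here
        w = lle_weights(X, i, nbrs[i], reg)
        lam = rng.random()
        w_mix = (1.0 - lam) * w
        w_mix[rng.integers(k)] += lam         # blended weights still sum to one
        out[t] = w_mix @ X[nbrs[i]]           # affine combination of the neighbours
    return out


def label_by_vote(X, y, X_new, k=5):
    """Label each new sample by majority vote of its k nearest original samples."""
    d2 = ((X_new[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    idx = np.argsort(d2, axis=1)[:, :k]
    labels = []
    for votes in y[idx]:
        values, counts = np.unique(votes, return_counts=True)
        labels.append(values[np.argmax(counts)])
    return np.array(labels)


# Usage: points on the line x2 = x1; generated samples should stay on that line.
X_line = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]])
print(lle_weights(X_line, 2, np.array([1, 3])))  # approximately [0.5 0.5]
X_gen = lle_oversample(X_line, n_new=10, k=2, rng=0)
print(np.allclose(X_gen[:, 0], X_gen[:, 1]))     # True
```

Because the blended weights always sum to one, every generated sample lies in the affine hull of its neighborhood. On the toy line above, new samples therefore never leave the line, which is the kind of structure preservation item 3 aims for. Since labels enter only in `label_by_vote`, the same procedure applies unchanged to multi-class data.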
Keywords/Search Tags: imbalanced data sets, over-sampling, manifold learning, fault diagnosis