
Dual Autoencoders Features For Imbalance Classification Problems

Posted on: 2019-06-14
Degree: Master
Type: Thesis
Country: China
Candidate: G J Zeng
Full Text: PDF
GTID: 2428330566486597
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid growth of artificial intelligence research and applications, machine learning is widely used in many areas of daily life and production to improve quality of life and productivity. In practice, the distribution of samples across classes is often skewed, which gives rise to the imbalanced classification problem. Because of this imbalance, a classifier that minimizes the average loss over all classes tends to assign samples to the majority class. Such skewed classification reduces the practical value of the classifier.

Current methods for alleviating imbalanced classification mainly include resampling, ensemble learning, and cost-sensitive approaches. Resampling-based methods are simple and independent of the classifier, but they introduce considerable randomness and can easily discard important sample information or cause overlap in the sample space. Combining resampling with ensemble learning reduces the disturbance caused by resampling and improves classification performance on imbalanced datasets. Cost-sensitive methods are simple and intuitive, but a suitable cost loss function is difficult to define. In fact, classifiers can already achieve good results on imbalanced data whose class boundaries are clear; for imbalanced data with overlapping sample spaces and indistinct features, feature learning can provide a better decision boundary and thus improve classification. Resampling to rebalance the data is therefore not strictly necessary, and imbalanced classification can instead be alleviated from the perspective of feature extraction.

This dissertation addresses imbalanced classification from the feature perspective. We propose the dual autoencoders features method, which generates two sets of features with two independent stacked autoencoders that use different activation functions. The two feature sets encode and capture global, stable, local, and detailed characteristics of the original data, and combining these complementary feature sets yields a more expressive representation. The dual autoencoders map samples from the original feature space into a new feature space in which the classifier can more easily learn a reasonable classification boundary.

Four experiments compare the dual autoencoders features with the features produced by a single stacked autoencoder using a single activation function, with resampling-based and ensemble-based learning algorithms, and with other feature conversion methods, in terms of classification boundary and classification performance on both artificial datasets and 14 UCI datasets. These experiments verify that the dual autoencoders features yield a better classification boundary and better classification performance on imbalanced data.
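The following is a minimal sketch in Python (PyTorch) of the dual-autoencoder feature idea described above: train two independent stacked autoencoders with different activation functions and concatenate their codes into one feature vector for a standard classifier. The layer sizes, the choice of sigmoid and ReLU as the two activations, the training schedule, and the placeholder data are illustrative assumptions, not the exact configuration used in the thesis.

import torch
import torch.nn as nn


class StackedAE(nn.Module):
    """A small stacked autoencoder whose activation function is configurable."""

    def __init__(self, d_in, d_hidden, d_code, act_cls):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(d_in, d_hidden), act_cls(),
            nn.Linear(d_hidden, d_code), act_cls(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(d_code, d_hidden), act_cls(),
            nn.Linear(d_hidden, d_in),
        )

    def forward(self, x):
        code = self.encoder(x)
        return self.decoder(code), code


def train_ae(ae, x, epochs=200, lr=1e-3):
    """Train one autoencoder to reconstruct x with mean-squared error."""
    opt = torch.optim.Adam(ae.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        recon, _ = ae(x)
        loss_fn(recon, x).backward()
        opt.step()
    return ae


# x stands in for the original (imbalanced) data: shape (n_samples, d_in).
d_in, d_hidden, d_code = 30, 20, 10
x = torch.randn(200, d_in)  # placeholder data (assumption)

# Two independent stacked autoencoders with different activation functions.
ae_a = train_ae(StackedAE(d_in, d_hidden, d_code, nn.Sigmoid), x)
ae_b = train_ae(StackedAE(d_in, d_hidden, d_code, nn.ReLU), x)

with torch.no_grad():
    # Each encoder maps the samples into its own feature space; concatenating
    # the two codes gives the combined representation that is then passed to
    # an ordinary classifier (e.g. SVM or logistic regression).
    dual_features = torch.cat([ae_a.encoder(x), ae_b.encoder(x)], dim=1)

In this sketch the imbalance is never resampled away; only the representation changes, so any off-the-shelf classifier can be trained on dual_features afterwards.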
Keywords/Search Tags: Imbalanced data, pattern recognition, autoencoder, feature learning