
Dual Autoencoders Features For Imbalance Classification Problems

Posted on: 2019-06-14
Degree: Master
Type: Thesis
Country: China
Candidate: G J Zeng
Full Text: PDF
GTID: 2428330566486597
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid growth of artificial intelligence research and applications, machine learning is widely used in many areas of daily life and production to improve quality of life and productivity. In practice, the distribution of samples across classes is often skewed, which gives rise to the imbalanced classification problem. Because of this imbalance, a classifier that minimizes the average loss over all classes tends to assign samples to the majority class. Such skewed classification reduces the practical value of the classifier.

Current methods for alleviating imbalanced classification mainly include resampling, ensemble learning, and cost-sensitive approaches. Resampling-based methods are simple and independent of the classifier, but they introduce considerable randomness and can easily discard important sample information or cause overlap in the sample space. Combining resampling with ensemble learning reduces the disturbance caused by resampling and improves classification performance on imbalanced datasets. Cost-sensitive methods are simple and intuitive, but a suitable cost loss function is difficult to define. In fact, classifiers can already achieve good results on imbalanced data whose class boundaries are clear; for imbalanced data with overlapping sample spaces and indistinct features, feature learning can provide a better decision boundary and thus improve classification. Resampling to rebalance the data is therefore not strictly necessary, and imbalanced classification can instead be alleviated from the perspective of feature extraction.

This dissertation addresses imbalanced classification from the feature perspective. We propose the dual autoencoders features method, which generates two sets of features with two independent stacked autoencoders that use different activation functions. The two feature sets encode and capture global, stable, local, and detailed characteristics of the original data, and combining these complementary feature sets yields a more expressive representation. The dual autoencoders map samples from the original feature space into a new feature space in which the classifier can more easily learn a reasonable classification boundary.

Four experiments compare the dual autoencoders features with the features produced by a single stacked autoencoder using a single activation function, with resampling-based and ensemble-based learning algorithms, and with other feature conversion methods, in terms of classification boundary and classification performance on both artificial datasets and 14 UCI datasets. These experiments verify that the dual autoencoders features yield a better classification boundary and better classification performance on imbalanced data.
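The following is a minimal sketch in Python (PyTorch) of the dual-autoencoder feature idea described above: train two independent stacked autoencoders with different activation functions and concatenate their codes into one feature vector for a standard classifier. The layer sizes, the choice of sigmoid and ReLU as the two activations, the training schedule, and the placeholder data are illustrative assumptions, not the exact configuration used in the thesis.

import torch
import torch.nn as nn


class StackedAE(nn.Module):
    """A small stacked autoencoder whose activation function is configurable."""

    def __init__(self, d_in, d_hidden, d_code, act_cls):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(d_in, d_hidden), act_cls(),
            nn.Linear(d_hidden, d_code), act_cls(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(d_code, d_hidden), act_cls(),
            nn.Linear(d_hidden, d_in),
        )

    def forward(self, x):
        code = self.encoder(x)
        return self.decoder(code), code


def train_ae(ae, x, epochs=200, lr=1e-3):
    """Train one autoencoder to reconstruct x with mean-squared error."""
    opt = torch.optim.Adam(ae.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        recon, _ = ae(x)
        loss_fn(recon, x).backward()
        opt.step()
    return ae


# x stands in for the original (imbalanced) data: shape (n_samples, d_in).
d_in, d_hidden, d_code = 30, 20, 10
x = torch.randn(200, d_in)  # placeholder data (assumption)

# Two independent stacked autoencoders with different activation functions.
ae_a = train_ae(StackedAE(d_in, d_hidden, d_code, nn.Sigmoid), x)
ae_b = train_ae(StackedAE(d_in, d_hidden, d_code, nn.ReLU), x)

with torch.no_grad():
    # Each encoder maps the samples into its own feature space; concatenating
    # the two codes gives the combined representation that is then passed to
    # an ordinary classifier (e.g. SVM or logistic regression).
    dual_features = torch.cat([ae_a.encoder(x), ae_b.encoder(x)], dim=1)

In this sketch the imbalance is never resampled away; only the representation changes, so any off-the-shelf classifier can be trained on dual_features afterwards.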
Keywords/Search Tags: Imbalanced data, pattern recognition, autoencoder, feature learning