Classification On Imbalanced Datasets

Posted on: 2020-06-20

Degree: Master

Type: Thesis

Country: China

Candidate: Q Y Shi

Full Text: PDF

GTID: 2428330623456214

Subject: Computer Science and Technology

Abstract/Summary:

With the explosive growth of data, massive datasets are becoming increasingly imbalanced and increasingly high-dimensional. Both trends seriously reduce classification accuracy. Conventional classification algorithms perform well when the dataset is balanced, but their performance degrades severely when the data are imbalanced. Improving classification performance on imbalanced data is therefore an urgent problem. This thesis addresses two problems: the classification bias caused by class imbalance, and the lower accuracy and greater difficulty of classification caused by high-dimensional data. The main research contents are as follows.

1. An imbalanced-data classification model combined with a variational auto-encoder (VAE) is proposed. The VAE applies multiple non-linear feature transformations through a neural network and takes the characteristics of the minority class into account, so the model learns a sample distribution closer to that of the real data. The generator (decoder) of the VAE then produces samples that better match the characteristics of the original data, and these samples are used to balance the training set. The model overcomes two limitations of traditional over-sampling: synthetic samples that are hard to bring close to the real data, and over-fitting of the classifier.

2. A high-dimensional imbalanced-data classification model based on an improved denoising auto-encoder is proposed. At the noise-adding step of the original denoising auto-encoder, the model introduces a new noise function, designed according to the degree of imbalance, that corrupts majority-class and minority-class samples differently. Corrupting the minority samples through the noise layer makes them receive more attention during training. This addresses the ineffective feature extraction caused by the imbalance between positive and negative samples and reduces the classification error caused by high dimensionality.

3. An imbalanced sentiment classification model is constructed. The model first preprocesses the acquired data by word segmentation, stop-word removal, and training text vectors. The imbalanced text vectors are then processed with VAE over-sampling and with the improved denoising auto-encoder. Experimental results show that both algorithms effectively mitigate the performance degradation that imbalanced data causes in the classifier.
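The generation step of contribution 1 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis's model. It assumes binary labels and an already-trained VAE decoder, represented by the hypothetical callable `decode`. It also assumes the VAE was trained so that decoding draws from the standard normal prior N(0, I) yields minority-like samples. The names `vae_oversample` and `toy_decoder` are hypothetical. A real decoder would be a trained neural network rather than the fixed affine map used here.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def vae_oversample(
    X: Sequence[Vector],
    y: Sequence[int],
    decode: Callable[[Vector], Vector],
    latent_dim: int,
    minority_label: int = 1,
    seed: int = 0,
) -> Tuple[List[Vector], List[int]]:
    """Balance a binary dataset by appending decoded minority-class samples.

    `decode` is a hypothetical stand-in for the generator (decoder) of a VAE
    assumed to be trained already; it maps a latent vector of length
    `latent_dim` to a feature vector.
    """
    rng = random.Random(seed)
    n_min = sum(1 for label in y if label == minority_label)
    n_maj = len(y) - n_min
    X_out = [list(x) for x in X]  # copy so the caller's data is not modified
    y_out = list(y)
    for _ in range(max(0, n_maj - n_min)):
        z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]  # z ~ N(0, I), the VAE prior
        X_out.append(decode(z))
        y_out.append(minority_label)
    return X_out, y_out


def toy_decoder(z: Vector) -> Vector:
    # Hypothetical stand-in for a trained decoder: a fixed affine map R^2 -> R^3.
    W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]
    b = [1.0, 2.0, 0.5]
    return [sum(w * zi for w, zi in zip(row, z)) + bi for row, bi in zip(W, b)]


# Usage: 6 majority samples (label 0) and 2 minority samples (label 1).
X = [[0.0, 0.0, 0.0] for _ in range(6)] + [[1.0, 2.0, 0.5] for _ in range(2)]
y = [0] * 6 + [1] * 2
X_bal, y_bal = vae_oversample(X, y, toy_decoder, latent_dim=2)
print(y_bal.count(0), y_bal.count(1))  # 6 6
```

Sampling from the prior rather than interpolating between existing minority points (as SMOTE-style over-sampling does) is what lets the synthetic samples follow the learned distribution instead of straight lines between neighbours.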
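The abstract does not give the form of the new noise function in contribution 2. The sketch below therefore only illustrates the general idea of class-dependent corruption at the noise-adding step of a denoising auto-encoder. It assumes masking noise, where each feature is zeroed independently with some probability. It also assumes a hypothetical rule in which the minority-class corruption rate equals the majority rate scaled by the imbalance ratio IR = N_maj / N_min, capped at `max_rate` and never below the majority rate. The names `corruption_rates` and `class_aware_mask_noise` are illustrative and do not come from the thesis.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def corruption_rates(
    y: Sequence[int], base_rate: float, minority_label: int = 1, max_rate: float = 0.5
) -> Tuple[float, float]:
    """Return (majority_rate, minority_rate) under a hypothetical class-dependent rule."""
    n_min = sum(1 for label in y if label == minority_label)
    n_maj = len(y) - n_min
    ratio = n_maj / n_min if n_min else 1.0  # imbalance ratio IR = N_maj / N_min
    # Assumed rule: scale the minority rate by IR, cap it at max_rate,
    # and never let it fall below the majority rate.
    return base_rate, max(base_rate, min(max_rate, base_rate * ratio))


def class_aware_mask_noise(
    X: Sequence[Vector],
    y: Sequence[int],
    base_rate: float = 0.1,
    minority_label: int = 1,
    max_rate: float = 0.5,
    seed: int = 0,
) -> List[Vector]:
    """Corrupt each sample with masking noise whose rate depends on its class."""
    maj_rate, min_rate = corruption_rates(y, base_rate, minority_label, max_rate)
    rng = random.Random(seed)
    noisy = []
    for x, label in zip(X, y):
        rate = min_rate if label == minority_label else maj_rate
        # Zero each feature independently with probability `rate`; X itself is not modified.
        noisy.append([0.0 if rng.random() < rate else v for v in x])
    return noisy


# Usage: 8 majority and 2 minority samples, so IR = 4.
X = [[1.0] * 20 for _ in range(10)]
y = [0] * 8 + [1] * 2
print(corruption_rates(y, base_rate=0.1))  # (0.1, 0.4)
noisy = class_aware_mask_noise(X, y, base_rate=0.1)
```

In a denoising auto-encoder the corrupted vectors are the encoder input and the clean vectors remain the reconstruction target. Under this assumed rule, heavier corruption of minority samples tends to raise their reconstruction error and so gives them a larger share of the training signal.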

Keywords/Search Tags:

Imbalanced dataset, Classification, Variational autoencoders, Oversampling, Denoising auto-encoder

Related items

1	Research On Imbalanced Dataset Classification Based On Oversampling Technique
2	Research On Imbalanced Classification Algorithm Based On Generative Model
3	Research On Classification Algorithm For Imbalanced Data
4	Research And Application Of Representation Learning Based On Variational Auto-encoder
5	Application Research Of Used-car Recommendation Based On Classification Method On Imbalanced Data Sets
6	Research On Imbalanced Dataset Classification Algorithm Based On Sampling
7	Research On Generation And Classification Methods Of Unbalanced Samples In Abnormal Traffic Detection
8	Non-parallel Voice Conversion Using ACGAN And Variational Autoencoders Conditioned By Sentence Embedding
9	Deep Auto-encoder Framework For SAR Images Change Detection
10	Research On Oversampling Algorithm Of Unbalanced Data Set