
A Symmetric Flipping Algorithm Research For Imbalanced Datasets Based On GMM-EM

Posted on: 2021-05-04  Degree: Master  Type: Thesis
Country: China  Candidate: L J Wang
GTID: 2428330602489010  Subject: Probability theory and mathematical statistics
Abstract/Summary:
The classification of imbalanced data is an important research direction in machine learning and data mining. Imbalanced data are widespread in real life, and much useful information is carried by a small number of samples, which makes the classification of imbalanced data all the more important. Because minority class samples make up only a small proportion of an imbalanced dataset, traditional classifiers perform poorly on such data and tend to misclassify minority class samples. Existing improved algorithms raise classification performance on imbalanced data to varying degrees, but they have certain disadvantages. On the one hand, they ignore the statistical characteristics of the imbalanced data, so randomly generated samples easily overlap with the majority class; on the other hand, they do not consider the direction in which new data are generated, so the new samples are of poor quality. To address these problems, this thesis takes into account both the statistical characteristics of the imbalanced dataset and the generating direction of the data, and pays more attention to the distribution of the minority class when choosing the original data from which new samples are generated. A symmetric flipping (inverting) algorithm for balancing the data is proposed, and experiments are carried out to verify its effectiveness. First, the density functions of the minority class and the majority class are estimated using a GMM fitted with the EM algorithm, which gives the mean and variance of each class. Second, the mean of the minority class is taken as the center of symmetry. The flipping boundary where the majority class invades the minority class is determined according to the "3σ rule" of statistics, and the flipping area is determined from the inverting region obtained after the inverting process. Once the center and radius of the inverting region are obtained, the minority class samples are transformed symmetrically, and points that duplicate original minority class data are removed. If the two classes are still imbalanced at this point, further minority class samples are generated using the probability density enhancing method. The new algorithm considers the generating direction of the new samples and avoids overlap between majority class and minority class data, so it balances the data at a statistical level and improves the quality of the minority class data. Finally, the new algorithm is evaluated against ADASYN and SMOTE-related methods, each combined with a decision tree classifier, on imbalanced datasets chosen from the UCI and KEEL repositories. The experimental results show that the algorithm is feasible and effective.
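The symmetric transformation described in the abstract can be illustrated with a minimal one-dimensional sketch. This is not the thesis's implementation, which the abstract does not give in detail. It rests on several assumptions: each class is summarized by a single Gaussian, so the sample mean and standard deviation stand in for the GMM-EM estimates; the "3σ rule" is read as "a point within three standard deviations of the majority mean lies in majority territory"; and the function name `reflect_minority` is hypothetical.

```python
import statistics


def reflect_minority(minority, majority):
    """Mirror 1-D minority samples through the minority mean.

    Hypothetical helper for illustration only. Sample statistics are used
    here in place of the GMM-EM estimates mentioned in the abstract.
    """
    mu_min = statistics.fmean(minority)   # center of symmetry
    mu_maj = statistics.fmean(majority)
    sd_maj = statistics.pstdev(majority)

    # Assumed reading of the 3-sigma rule: values within three standard
    # deviations of the majority mean count as majority territory.
    def in_majority_region(x):
        return abs(x - mu_maj) <= 3 * sd_maj

    seen = set(minority)
    new_points = []
    for x in minority:
        mirrored = 2 * mu_min - x         # symmetric image of x about mu_min
        if mirrored in seen:              # skip repeats of existing points
            continue
        if in_majority_region(mirrored):  # skip points overlapping the majority
            continue
        seen.add(mirrored)
        new_points.append(mirrored)
    return new_points


majority = [0.0, 1.0, 2.0, 1.0, 0.5, 1.5, 1.0, 0.8, 1.2, 1.0]
minority = [3.0, 3.5, 8.5]
print(reflect_minority(minority, majority))
```

In this toy run the image of 8.5 falls inside the assumed majority region and is discarded, while the images of 3.0 and 3.5 are kept. A fuller version would fit a multi-component mixture to multi-dimensional data and add density-based sampling if the classes remained imbalanced, as the abstract outlines.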
Keywords/Search Tags:Imbalanced Datasets, Data Classification, GMM-EM Algorithm, Symmetric Inverting