
Research On Imbalanced Data Classification Algorithm Based On Zeroth-order Optimization

Posted on: 2021-04-20    Degree: Master    Type: Thesis
Country: China    Candidate: L H Zhang    Full Text: PDF
GTID: 2428330620470569    Subject: Software engineering
Abstract/Summary:
Data classification is an important task in knowledge discovery and data mining. However, most classification algorithms assume a balanced data distribution and are therefore poorly suited to imbalanced data. In addition, most optimization models are solved by stochastic gradient descent, which replaces the full gradient with the gradient of a single sample; although this reduces computation, the deviation between the single-sample gradient and the full gradient inevitably affects the convergence of the algorithm. Designing new optimization models and efficient solvers for imbalanced data classification therefore remains a challenge in machine learning.

When the traditional SVM is applied to imbalanced data, classification performance degrades. To address this, the traditional SVM optimization model is extended with a margin mean term combined with a cost-sensitive weighting strategy, which effectively alleviates the influence of the imbalanced distribution on the classification hyperplane (a possible form of this objective is sketched after the abstract). To solve the resulting model, a zeroth-order optimization algorithm with variance reduction is proposed: zeroth-order estimates approximate the gradient, so complex models whose gradients are difficult or impossible to derive can still be optimized, and the variance reduction strategy accelerates convergence (see the sketch below).

Sampling is a common approach to imbalanced data classification, but classical random under-sampling easily discards samples that carry important information. To alleviate this shortcoming, a new sampling method is proposed that does not sample the data directly at random. First, the original distribution of the majority class is characterized by computing the distance between each majority-class sample and the hyperplane; then weighted sampling is carried out according to this distribution (a sketch follows the abstract). The method thus takes into account the original distribution of the data and the role of samples at different locations. Experiments on imbalanced datasets show the effectiveness of the proposed sampling algorithm.

To further address imbalanced data classification, a zeroth-order optimization algorithm based on cost-sensitive AdaBoost is proposed. A new weighting function that considers both the distribution of the imbalanced data and the error rate is designed for the adaptive re-weighting of samples in AdaBoost; the improved SVM optimization model serves as the base classifier, and only base classifiers whose accuracy and geometric mean both exceed 0.5 are retained (see the selection check sketched below), which guarantees their ability to handle imbalanced data. Comparative experiments on imbalanced datasets show that the zeroth-order optimization algorithm based on cost-sensitive AdaBoost outperforms the other comparison methods.
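To make the first contribution more concrete, one plausible form of an SVM objective augmented with a margin mean term and cost-sensitive weights is given below. The abstract does not state the exact model, so the trade-off parameters \lambda and C, the class-dependent costs c_{y_i}, and the definition of the margin mean \bar{\gamma} are illustrative assumptions only:

\min_{w,\,b,\,\xi}\;\; \frac{1}{2}\|w\|^{2} \;-\; \lambda\,\bar{\gamma} \;+\; C\sum_{i=1}^{n} c_{y_i}\,\xi_i
\quad\text{s.t.}\quad y_i\,(w^{\top}x_i + b) \;\ge\; 1-\xi_i,\qquad \xi_i \ge 0,
\qquad \bar{\gamma} \;=\; \frac{1}{n}\sum_{i=1}^{n} y_i\,(w^{\top}x_i + b),

where c_{y_i} assigns a larger misclassification cost to minority-class samples than to majority-class samples.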
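The core solver combines a derivative-free gradient estimate with variance reduction. Below is a minimal Python sketch under stated assumptions: f_i(x, i) is a per-sample loss, and all names (zo_grad, zo_svrg, mu, lr, epochs, inner) are hypothetical rather than taken from the thesis.

import numpy as np

def zo_grad(f, x, mu=1e-4, u=None, rng=None):
    # Two-point zeroth-order estimate of the gradient of f at x along a
    # random Gaussian direction u; no analytic gradient is required.
    rng = rng or np.random.default_rng()
    if u is None:
        u = rng.standard_normal(x.shape)
    return (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u

def zo_svrg(f_i, n, x0, lr=0.01, epochs=10, inner=100, seed=0):
    # SVRG-style variance reduction on top of zeroth-order estimates: a full
    # estimate at a snapshot point corrects each single-sample estimate.
    rng = np.random.default_rng(seed)
    x, x_snap = x0.copy(), x0.copy()
    for _ in range(epochs):
        g_snap = np.mean([zo_grad(lambda z, j=j: f_i(z, j), x_snap, rng=rng)
                          for j in range(n)], axis=0)
        for _ in range(inner):
            i = int(rng.integers(n))
            fi = lambda z: f_i(z, i)
            u = rng.standard_normal(x.shape)
            # Using the same direction u at x and at the snapshot keeps the two
            # estimates correlated, which is what reduces the variance.
            v = zo_grad(fi, x, u=u) - zo_grad(fi, x_snap, u=u) + g_snap
            x = x - lr * v
        x_snap = x.copy()
    return x

Because only function evaluations are used, the same routine can optimize objectives, such as the cost-sensitive margin-mean model above, whose gradients are awkward or impossible to derive in closed form.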
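The distance-based under-sampling step could look roughly like the following sketch. Whether nearer or farther majority-class samples should receive larger weights is not specified in the abstract, so using the raw distance itself as the weight, and the function and parameter names, are assumptions for illustration.

import numpy as np

def distance_weighted_undersample(X_maj, w, b, n_keep, rng=None):
    # Weight each majority-class sample by its distance to the separating
    # hyperplane w.x + b = 0, then draw a weighted (not uniform) subsample.
    rng = rng or np.random.default_rng()
    dist = np.abs(X_maj @ w + b) / np.linalg.norm(w)
    prob = dist / dist.sum()          # sampling distribution over majority samples
    idx = rng.choice(len(X_maj), size=n_keep, replace=False, p=prob)
    return X_maj[idx]

In contrast to uniform random under-sampling, samples are kept with probabilities that reflect their position relative to the hyperplane, so the retained subset preserves more of the original distribution of the majority class.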
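The base-classifier selection rule from the last contribution (keep a base classifier only if both its accuracy and its geometric mean exceed 0.5) might be checked as follows; the helper name and the 0/1 binary-label convention (1 = minority class) are assumptions.

import numpy as np
from sklearn.metrics import confusion_matrix

def keep_base_classifier(y_true, y_pred):
    # Accept a base classifier only if both overall accuracy and the
    # geometric mean of the two class-wise recalls exceed 0.5.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    acc = (tp + tn) / (tp + tn + fp + fn)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0   # recall on the minority class
    tnr = tn / (tn + fp) if (tn + fp) else 0.0   # recall on the majority class
    return acc > 0.5 and np.sqrt(tpr * tnr) > 0.5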
Keywords/Search Tags: Imbalanced Datasets, Zeroth-order Optimization, Stochastic Gradient Descent, Support Vector Machine, Under-sampling, AdaBoost