
Research On Imbalanced Data Classification Algorithm Based On Zeroth-order Optimization

Posted on: 2021-04-20    Degree: Master    Type: Thesis
Country: China    Candidate: L H Zhang    Full Text: PDF
GTID: 2428330620470569    Subject: Software engineering
Abstract/Summary:
Data classification is an important task in knowledge discovery and data mining. However, most classification algorithms assume a balanced data distribution and are therefore poorly suited to imbalanced data. In addition, most optimization models are solved by stochastic gradient descent, which replaces the full gradient with the gradient of a single sample; although this reduces computation, the deviation between the single-sample gradient and the full gradient inevitably affects the convergence of the algorithm. Designing new optimization models and efficient solvers for imbalanced data classification therefore remains a challenge in machine learning.

When the traditional SVM is applied to imbalanced data, classification performance degrades. To address this, the traditional SVM optimization model is extended with a margin mean term combined with a cost-sensitive weighting strategy, which effectively alleviates the influence of the imbalanced distribution on the classification hyperplane (a possible form of this objective is sketched after the abstract). To solve the resulting model, a zeroth-order optimization algorithm with variance reduction is proposed: zeroth-order estimates approximate the gradient, so complex models whose gradients are difficult or impossible to derive can still be optimized, and the variance reduction strategy accelerates convergence (see the sketch below).

Sampling is a common approach to imbalanced data classification, but classical random under-sampling easily discards samples that carry important information. To alleviate this shortcoming, a new sampling method is proposed that does not sample the data directly at random. First, the original distribution of the majority class is characterized by computing the distance between each majority-class sample and the hyperplane; then weighted sampling is carried out according to this distribution (a sketch follows the abstract). The method thus takes into account the original distribution of the data and the role of samples at different locations. Experiments on imbalanced datasets show the effectiveness of the proposed sampling algorithm.

To further address imbalanced data classification, a zeroth-order optimization algorithm based on cost-sensitive AdaBoost is proposed. A new weighting function that considers both the distribution of the imbalanced data and the error rate is designed for the adaptive re-weighting of samples in AdaBoost; the improved SVM optimization model serves as the base classifier, and only base classifiers whose accuracy and geometric mean both exceed 0.5 are retained (see the selection check sketched below), which guarantees their ability to handle imbalanced data. Comparative experiments on imbalanced datasets show that the zeroth-order optimization algorithm based on cost-sensitive AdaBoost outperforms the other comparison methods.
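To make the first contribution more concrete, one plausible form of an SVM objective augmented with a margin mean term and cost-sensitive weights is given below. The abstract does not state the exact model, so the trade-off parameters \lambda and C, the class-dependent costs c_{y_i}, and the definition of the margin mean \bar{\gamma} are illustrative assumptions only:

\min_{w,\,b,\,\xi}\;\; \frac{1}{2}\|w\|^{2} \;-\; \lambda\,\bar{\gamma} \;+\; C\sum_{i=1}^{n} c_{y_i}\,\xi_i
\quad\text{s.t.}\quad y_i\,(w^{\top}x_i + b) \;\ge\; 1-\xi_i,\qquad \xi_i \ge 0,
\qquad \bar{\gamma} \;=\; \frac{1}{n}\sum_{i=1}^{n} y_i\,(w^{\top}x_i + b),

where c_{y_i} assigns a larger misclassification cost to minority-class samples than to majority-class samples.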
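The core solver combines a derivative-free gradient estimate with variance reduction. Below is a minimal Python sketch under stated assumptions: f_i(x, i) is a per-sample loss, and all names (zo_grad, zo_svrg, mu, lr, epochs, inner) are hypothetical rather than taken from the thesis.

import numpy as np

def zo_grad(f, x, mu=1e-4, u=None, rng=None):
    # Two-point zeroth-order estimate of the gradient of f at x along a
    # random Gaussian direction u; no analytic gradient is required.
    rng = rng or np.random.default_rng()
    if u is None:
        u = rng.standard_normal(x.shape)
    return (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u

def zo_svrg(f_i, n, x0, lr=0.01, epochs=10, inner=100, seed=0):
    # SVRG-style variance reduction on top of zeroth-order estimates: a full
    # estimate at a snapshot point corrects each single-sample estimate.
    rng = np.random.default_rng(seed)
    x, x_snap = x0.copy(), x0.copy()
    for _ in range(epochs):
        g_snap = np.mean([zo_grad(lambda z, j=j: f_i(z, j), x_snap, rng=rng)
                          for j in range(n)], axis=0)
        for _ in range(inner):
            i = int(rng.integers(n))
            fi = lambda z: f_i(z, i)
            u = rng.standard_normal(x.shape)
            # Using the same direction u at x and at the snapshot keeps the two
            # estimates correlated, which is what reduces the variance.
            v = zo_grad(fi, x, u=u) - zo_grad(fi, x_snap, u=u) + g_snap
            x = x - lr * v
        x_snap = x.copy()
    return x

Because only function evaluations are used, the same routine can optimize objectives, such as the cost-sensitive margin-mean model above, whose gradients are awkward or impossible to derive in closed form.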
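The distance-based under-sampling step could look roughly like the following sketch. Whether nearer or farther majority-class samples should receive larger weights is not specified in the abstract, so using the raw distance itself as the weight, and the function and parameter names, are assumptions for illustration.

import numpy as np

def distance_weighted_undersample(X_maj, w, b, n_keep, rng=None):
    # Weight each majority-class sample by its distance to the separating
    # hyperplane w.x + b = 0, then draw a weighted (not uniform) subsample.
    rng = rng or np.random.default_rng()
    dist = np.abs(X_maj @ w + b) / np.linalg.norm(w)
    prob = dist / dist.sum()          # sampling distribution over majority samples
    idx = rng.choice(len(X_maj), size=n_keep, replace=False, p=prob)
    return X_maj[idx]

In contrast to uniform random under-sampling, samples are kept with probabilities that reflect their position relative to the hyperplane, so the retained subset preserves more of the original distribution of the majority class.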
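The base-classifier selection rule from the last contribution (keep a base classifier only if both its accuracy and its geometric mean exceed 0.5) might be checked as follows; the helper name and the 0/1 binary-label convention (1 = minority class) are assumptions.

import numpy as np
from sklearn.metrics import confusion_matrix

def keep_base_classifier(y_true, y_pred):
    # Accept a base classifier only if both overall accuracy and the
    # geometric mean of the two class-wise recalls exceed 0.5.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    acc = (tp + tn) / (tp + tn + fp + fn)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0   # recall on the minority class
    tnr = tn / (tn + fp) if (tn + fp) else 0.0   # recall on the majority class
    return acc > 0.5 and np.sqrt(tpr * tnr) > 0.5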
Keywords/Search Tags: Imbalanced Datasets, Zeroth-order Optimization, Stochastic Gradient Descent, Support Vector Machine, Under-sampling, AdaBoost