Research On Machine Learning Techniques Based On Sampling Algorithms

Posted on:2020-08-02

Degree:Master

Type:Thesis

Country:China

Candidate:H P Zhang

Full Text:PDF

GTID:2428330578964134

Subject:Computer Science and Technology

Abstract/Summary:

Sampling theory underlies many disciplines. Sampling methods not only yield approximate solutions when the conditions for exact inference are not met, but also speed up computation. In the era of big data, sampling methods are used more widely than ever. Data imbalance is a classic problem in machine learning, and data-level solutions include oversampling and undersampling. Data-driven models encounter imbalanced data even more often when applied to problems in real-time scenarios. The main research work of this thesis is as follows.

Firstly, this thesis studies how data imbalance affects the evaluation of classification models in machine learning, and experiments with visualization demonstrate that data imbalance has a negative effect on model learning. For the two data-level approaches to imbalanced learning, oversampling and undersampling, the validity of sampling is demonstrated through theoretical analysis and comparative experiments respectively, which lays a theoretical and experimental foundation for the subsequent work.

Secondly, prior research has introduced evolutionary ideas into sampling algorithms and proposed related algorithms that incorporate an adaptive Lévy distribution. This thesis improves the evolutionary sampling algorithm based on the Lévy distribution. Setting the parameter α of this distribution to 1.0, 1.3, 1.7, and 2.0, which corresponds to four transition probability distributions, increases the diversity of the generated candidate samples. Theoretical analysis and experimental results show that the proposed algorithm is superior, in convergence rate and accuracy, to evolutionary sampling algorithms based on the Gaussian distribution, the Cauchy distribution, and the symmetric exponential distribution, as well as to other adaptive evolutionary sampling algorithms.

Thirdly, for oversampling on imbalanced data sets, a thorough analysis of the Lévy-distribution-based sampling method shows that the function generating the sampling rate does not have to be a Lévy distribution. Data sampling methods based on the Gaussian distribution and on a piecewise distribution are therefore proposed. The density of new samples synthesized from borderline samples is the largest, the density of new samples synthesized from samples closer to the majority class is the second largest, and the density of new samples synthesized from samples closer to the minority class is the smallest. This approach can thus strengthen the decision boundary and reduce the generation of noise. Experiments on multiple datasets show that the proposed approach effectively improves classification results on imbalanced datasets.
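
The role of the four α values can be illustrated with a minimal sketch. The abstract does not say how the thesis combines the four transition distributions, so the code below only assumes the simplest reading: one candidate is proposed per α by adding a symmetric α-stable (Lévy) step to the current sample. The names `ALPHAS`, `symmetric_stable`, and `propose_candidates`, the `scale` parameter, and the use of the Chambers-Mallows-Stuck transform are illustrative choices, not taken from the thesis. With α = 1.0 the step is Cauchy distributed and with α = 2.0 it is Gaussian (variance 2 at unit scale), so the intermediate values 1.3 and 1.7 give tails between those two extremes.

```python
import math
import random

# Alpha values quoted in the abstract; everything else here is an assumption.
ALPHAS = (1.0, 1.3, 1.7, 2.0)


def symmetric_stable(alpha, rng):
    """Draw one value from a symmetric alpha-stable law with unit scale.

    Uses the Chambers-Mallows-Stuck transform with zero skewness.
    Intended for 1 <= alpha <= 2, the range used in the abstract.
    """
    u = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)
    while w == 0.0:  # guard against an (extremely unlikely) exact zero
        w = rng.expovariate(1.0)
    if alpha == 1.0:
        return math.tan(u)  # Cauchy case
    return (
        math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
        * (math.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha)
    )


def propose_candidates(current, scale, rng):
    """Return one candidate per alpha: current + scale * stable step."""
    return [current + scale * symmetric_stable(a, rng) for a in ALPHAS]


rng = random.Random(0)
candidates = propose_candidates(0.0, 0.5, rng)
for alpha, cand in zip(ALPHAS, candidates):
    print(f"alpha={alpha}: candidate={cand:.4f}")
```

Smaller α values produce occasional long jumps that help the sampler leave a local mode, while α = 2.0 produces short Gaussian moves for local refinement; proposing from several α values at once is one plausible way to obtain the increased candidate diversity the abstract describes.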

Keywords/Search Tags:

Sampling algorithm, Lévy distribution, Imbalanced learning

Related items

1	Comprehensive Oversampling And Undersampling Study Of Imbalanced Data Sets
2	Research On Imbalanced Data Classification Algorithms Based On Weight Analysis Of Loss Function
3	Research On Imbalanced Dataset Classification Based On Oversampling Technique
4	Research On Classification Method For Imbalanced Datasets
5	Imbalanced Data Classification Algorithm Based On Unsupervised Intelligent Under Sampling Method
6	Research On Imbalanced Data Classification Learning Algorithm Based On Mixed Sampling Technique And Adaboost Principle
7	Imbalanced Learning And Its Application Based On Manifold Embedded Over-sampling
8	Research On Hybrid Sampling Of Imbalanced Data Based On Data Distribution
9	Research On Imbalanced Dataset Classification Algorithm Based On Sampling
10	Research On Oversampling Algorithms For Imbalanced Learning