Research On Adaboost Improved Algorithm For Unbalanced Data

Posted on:2022-05-13

Degree:Master

Type:Thesis

Country:China

Candidate:J R Yan

Full Text:PDF

GTID:2518306509465364

Subject:Software engineering

Abstract/Summary:

Classification is an important branch of data mining. Common classification models usually assume that the classes in a data set contain roughly the same number of samples and that misclassification costs are equal across classes. However, training a traditional classifier on an unbalanced data set leads to low prediction accuracy for the minority class, so the classification of unbalanced data has long been an active research topic in machine learning. This thesis studies classification methods for unbalanced data, introduces an undersampling method based on sample weights, a method for calculating the local density of samples, and a method for calculating the misclassification cost of samples, and proposes three improved AdaBoost algorithms for unbalanced data. The main work is as follows:

(1) USCBoost (Undersampling and Cost-sensitive Boosting), an undersampling and cost-sensitive classification algorithm for unbalanced data, is proposed. The algorithm undersamples the majority-class samples and introduces a cost matrix into the weight update formula so that the weights of misclassified minority-class samples increase faster. Experimental results show that, compared with other algorithms, USCBoost significantly improves the F1-measure and G-mean values, and that the proposed algorithm is feasible for classifying unbalanced data.

(2) An AdaBoost algorithm based on sample density is proposed. The local density of each sample is calculated from its k-nearest neighbors. The local densities of the two classes are normalized separately to give each sample a weight, which is then used as the initial weight in the AdaBoost algorithm. Experimental verification shows that the algorithm is better at identifying the minority class.

(3) An AdaCost algorithm based on isolation forest is proposed. The algorithm uses an isolation forest to obtain an anomaly score for each sample and then calculates the misclassification cost of each sample from its anomaly score. The misclassification costs of the two classes are calculated separately and then normalized so that the costs within each class sum to 1, which effectively distinguishes in-class samples from inter-class samples and reduces the impact of noisy data.

(4) An imbalanced data classification system based on ensemble learning is designed and implemented. The system integrates multiple ensemble classification algorithms and base classifier algorithms for imbalanced data and comprises modules for data set description, parameter setting, classification algorithm selection, and results. It makes it easier for users to choose an appropriate classification algorithm and improves the efficiency of parameter tuning when modeling unbalanced data.
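The abstract reports results in terms of F1-measure and G-mean rather than plain accuracy. The sketch below shows the standard definitions of both metrics for a binary problem, which makes clear why they are preferred on unbalanced data. The function names and the convention that label 1 is the minority (positive) class are assumptions made for illustration and do not come from the thesis.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def f1_and_gmean(y_true, y_pred, positive=1):
    """Return (F1-measure, G-mean) with the minority class as 'positive'."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0        # minority-class recall
    specificity = tn / (tn + fp) if tn + fp else 0.0   # majority-class recall
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    gmean = math.sqrt(recall * specificity)
    return f1, gmean


# A classifier that always predicts the majority class reaches 80% accuracy
# on this 2-vs-8 data set, yet both metrics are zero.
y_true = [1, 1] + [0] * 8
y_pred = [0] * 10
print(f1_and_gmean(y_true, y_pred))
```

Because G-mean multiplies the recall of both classes, a model that ignores the minority class scores zero regardless of its overall accuracy.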
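Item (1) of the abstract states that a cost matrix enters the weight update so that misclassified minority-class samples gain weight faster, but it does not give the formula. As a rough illustration only, the sketch below uses one common cost-sensitive variant in which each sample's cost multiplies its weight outside the exponential term. This is not necessarily the USCBoost update; the function name, the per-sample `costs` list, and the labels in {-1, +1} are all assumptions.

```python
import math


def cost_sensitive_update(weights, y_true, y_pred, costs):
    """One boosting round of a cost-scaled weight update (illustrative).

    weights: current sample weights, summing to 1
    y_true, y_pred: labels in {-1, +1}
    costs: per-sample misclassification cost (hypothetical; e.g. taken
           from a cost matrix row for the sample's true class)
    Returns (alpha, new_weights).
    """
    rows = list(zip(weights, y_true, y_pred, costs))
    correct = sum(c * w for w, t, p, c in rows if t == p)
    wrong = sum(c * w for w, t, p, c in rows if t != p)
    if correct == 0 or wrong == 0:
        raise ValueError("weak learner is perfect or always wrong")

    # Learner weight computed from cost-scaled weighted sums.
    alpha = 0.5 * math.log(correct / wrong)

    # Misclassified samples are scaled up, correct ones scaled down,
    # and every sample is additionally scaled by its cost.
    raw = [c * w * math.exp(alpha if t != p else -alpha)
           for w, t, p, c in rows]
    z = sum(raw)  # normalisation constant
    return alpha, [v / z for v in raw]


# Two minority samples (+1, cost 2) and two majority samples (-1, cost 1);
# the weak learner misclassifies the first minority sample.
alpha, new_w = cost_sensitive_update(
    weights=[0.25, 0.25, 0.25, 0.25],
    y_true=[1, 1, -1, -1],
    y_pred=[-1, 1, -1, -1],
    costs=[2.0, 2.0, 1.0, 1.0],
)
print(alpha, new_w)
```

In this toy round the misclassified minority sample ends up with half of the total weight, while each correctly classified majority sample drops to one eighth, which is the qualitative behaviour the abstract describes.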
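Item (2) of the abstract describes computing a k-nearest-neighbor local density per sample, normalizing it separately within each class, and using the result as the initial AdaBoost weights. The sketch below is one possible reading of that description. The density definition (inverse of the mean distance to the k nearest neighbors within the same class), the choice to give denser samples larger weights, and the function names are assumptions; the thesis may define these differently.

```python
import math


def knn_density(points, k):
    """Local density of each point: inverse mean distance to its k nearest
    neighbours among `points` (an assumed definition)."""
    densities = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q)
                       for j, q in enumerate(points) if j != i)
        if not dists:                 # a class with a single sample
            densities.append(1.0)
            continue
        nearest = dists[:k]
        mean_d = sum(nearest) / len(nearest)
        densities.append(1.0 / (mean_d + 1e-12))
    return densities


def density_initial_weights(X, y, k=3):
    """Initial sample weights from per-class normalised local density.

    Each class receives the same total weight, so the minority class is
    not swamped by the majority class at the first boosting round.
    """
    weights = [0.0] * len(X)
    classes = sorted(set(y))
    for label in classes:
        idx = [i for i, t in enumerate(y) if t == label]
        dens = knn_density([X[i] for i in idx], k)
        total = sum(dens)
        for i, d in zip(idx, dens):
            weights[i] = d / total / len(classes)
    return weights


# Four majority samples (one far from the rest) and two minority samples.
X = [(0, 0), (0, 1), (1, 0), (10, 10), (5, 5), (5, 6)]
y = [0, 0, 0, 0, 1, 1]
print(density_initial_weights(X, y, k=2))
```

Under these assumptions the isolated majority sample at (10, 10) starts with a much smaller weight than the clustered ones, and each class contributes half of the total initial weight.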

Keywords/Search Tags:

Unbalanced Data, Classification, Ensemble Learning, AdaBoost, Sample Density, Isolation Forest

Related items

1	The Application Of Ensemble Classification On Unbalanced Data In Bank Marketing
2	Categories Of Unbalanced Data Integration Classification Research
3	The SVM Algorithm And Its Application Based Data Preprocessing In The Kernel Space For Unbalanced Data
4	Classification And Application Of Ensemble Learning In Unbalanced Data
5	Research On SVM Classification Of Unbalanced Data And Its Application In Identify Poor Students In Colleges And Universities
6	Selection And Classification Of Unbalanced Data Based On Semi - Supervised And Integrated Learning
7	Research On Classification Algorithms For Unbalanced Data
8	Research On Oversampling Ensemble Learning Algorithm For Unbalanced Classification
9	Research On Improved Naive Bayes Classification Model For Imbalanced E-commerce Review Text
10	Research On English Text Classification Algorithm Based On Ensemble Learning