
Research On A Classification Algorithm For Unbalanced Data Sets Based On AdaBoost-SVM

Posted on: 2023-04-06
Degree: Master
Type: Thesis
Country: China
Candidate: L Y Huang
GTID: 2558307163495704
Subject: Mathematics
Abstract/Summary:
The classification of unbalanced data sets is one of the most common and important problems in classification. When the classes differ greatly in sample count, traditional models predict the minority classes inaccurately. Improving the classification accuracy of minority classes in unbalanced data sets is therefore the focus of this research. This thesis mainly addresses the binary classification problem, but the proposed method also performs well on multi-class problems. Experimental results show that the proposed method effectively improves classification performance and is superior to the original ensemble algorithm. The main work of this research is as follows:
(1) An AdaBoost algorithm framework was built, with the relatively robust SVM algorithm integrated as the base classifier, to improve base-classifier performance on unbalanced data sets.
(2) The loss function used to update the weights of the weak classifiers and the training samples was modified so that the loss cost of misclassified samples is reduced. This overcomes a problem in AdaBoost's iterations: because misclassified samples are assigned higher weights in each iteration, weights expand in later iterations and the data distribution of the training set becomes distorted. Experimental results show that this modification enhances the robustness of the algorithm on unbalanced data sets.
(3) An improved algorithm based on AdaBoost-SVM was implemented and applied to synthetic data sets and UCI data sets. The UCI data sets include both binary and multi-class data sets. In the experiments, the algorithm is evaluated by accuracy, the confusion matrix, and G-means. The results show that the improved algorithm achieves higher values on these evaluation indices than the SVM and AdaBoost-SVM algorithms, indicating better classification results on unbalanced data sets.
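The two ideas the abstract leans on, the G-means metric and the AdaBoost sample-weight update, can be illustrated with a minimal sketch. The abstract does not give the thesis's modified loss function, so the code below shows only the standard discrete AdaBoost update, not the author's method. It assumes binary labels in {-1, +1} and takes G-mean to be the square root of sensitivity times specificity, which is the usual binary definition. All function names are hypothetical.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fn, fp, tn) for a binary problem."""
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity (assumed definition)."""
    tp, fn, fp, tn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def adaboost_round(weights, y_true, y_pred):
    """One round of the standard discrete AdaBoost update.

    Assumes labels in {-1, +1} and weights that sum to 1.
    This is the textbook update, not the thesis's modified loss.
    """
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
    alpha = 0.5 * math.log((1 - err) / err)
    # Misclassified samples (t * p == -1) are scaled up by exp(alpha).
    scaled = [w * math.exp(-alpha * t * p)
              for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(scaled)
    return alpha, [w / total for w in scaled]


# Toy unbalanced set: 8 majority (-1) samples, 2 minority (+1) samples,
# and a weak classifier that always predicts the majority class.
y_true = [-1] * 8 + [1] * 2
y_pred = [-1] * 10

acc = accuracy(y_true, y_pred)   # 0.8, which looks acceptable
gm = g_mean(y_true, y_pred)      # 0.0, which exposes the missed minority

weights = [1 / len(y_true)] * len(y_true)
alpha, new_weights = adaboost_round(weights, y_true, y_pred)
minority_share = sum(new_weights[8:])  # weight now held by 2 of 10 samples
```

In this toy case the two misclassified minority samples hold half of the total weight after a single round. That concentration is the kind of weight growth on misclassified samples that the abstract says the modified loss function is meant to limit.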
Keywords/Search Tags:AdaBoost-SVM algorithm, Unbalanced data set, Loss function, Classification