Research Of Fuzzy Classification Ensemble Method For Imbalanced Data

Posted on:2023-12-11

Degree:Master

Type:Thesis

Country:China

Candidate:Z Zhang

GTID:2568306794955099

Subject:Software engineering

Abstract/Summary:

The Takagi-Sugeno-Kang (TSK) fuzzy model is one of the most influential and widely used fuzzy models. It has been applied successfully in many fields because of its high interpretability and strong approximation ability. In practice, however, the performance of a fuzzy model depends on the quantity and quality of the available data, and the model needs sufficient training to generalize well. Ensemble learning offers an effective way to build models: by combining multiple learners to complete a learning task, it can usually achieve better generalization performance than any individual learner.

Imbalanced data refers to data in which the numbers of training samples differ greatly between categories; the category with more samples is called the majority class and the category with fewer samples is called the minority class. Class imbalance is widespread in medical, economic, industrial and other classification tasks, such as medical disease diagnosis, fraudulent transaction detection, faulty parts detection and natural monitoring. When the data is imbalanced, training accuracy drops and generalization ability is poor. Because the minority classes have few samples, the model leans towards the majority classes. If traditional classification methods are applied without first balancing the data, the majority class is classified well, but identification of the minority class suffers and classification quality falls owing to the lack of minority-class data. Even when the accuracy on the majority class is ideal, the accuracy on minority-class samples cannot be guaranteed. Imbalanced data is common in reality and misclassifying it is costly, so improving the accuracy on minority classes is very important.

Sampling methods are therefore needed to address the imbalance. This thesis studies the classification of imbalanced data and uses ensemble learning to improve classification accuracy and to obtain better generalization performance than a single classifier. In terms of sampling, the Synthetic Minority Oversampling Technique (SMOTE) is a classic oversampling algorithm. It generates new minority-class samples by interpolation in order to change the imbalance ratio, but it also has disadvantages: samples are generated blindly and randomly, the generated minority-class samples tend to overlap with the surrounding majority-class samples, and the minority-class samples are unevenly distributed. Improved variants of SMOTE have been proposed to address these problems. For example, methods that combine SMOTE with undersampling first oversample with SMOTE and then clean the data by deleting samples that overlap between classes. Other methods divide the samples into multiple clusters using fuzzy C-means (FCM) clustering or other clustering algorithms.

In terms of ensemble learning, the choice of base classifier and of ensemble method is also a key factor affecting classification. Building on the sampling algorithms above, this thesis chooses the TSK fuzzy model as the base classifier and uses the AdaBoost algorithm, so that each base model can be fully trained; the results of the individual models are then weighted to produce the final output. Experiments on the selection of the sampling algorithm and the optimization of the main parameters are carried out to improve the generalization performance of imbalanced data classification. Results on multiple imbalanced datasets from UCI are compared, with G-mean and F-measure used as the metrics for evaluating each model. The experimental results show that the algorithm proposed in this thesis improves performance.
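
Two quantitative ingredients named in the abstract, SMOTE-style interpolation and the G-mean and F-measure metrics, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the thesis: the function names, the parameters `k`, `n_new` and `seed`, and the toy data are all hypothetical, the neighbour search is brute-force Euclidean, and the TSK base classifier and AdaBoost weighting are not shown.

```python
import math
import random


def smote_interpolate(minority, k=3, n_new=4, seed=0):
    """Create n_new synthetic minority points (hypothetical helper).

    Each new point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Brute-force k nearest minority neighbours, excluding the base itself.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def g_mean_and_f_measure(y_true, y_pred, positive=1):
    """Return (G-mean, F-measure), treating `positive` as the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0       # minority-class accuracy
    specificity = tn / (tn + fp) if tn + fp else 0.0  # majority-class accuracy
    precision = tp / (tp + fp) if tp + fp else 0.0
    g_mean = math.sqrt(recall * specificity)
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return g_mean, f_measure


# Toy data (made up for illustration only).
minority_points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_interpolate(minority_points, k=2, n_new=3)
print(new_points)
print(g_mean_and_f_measure([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1]))
```

Because G-mean multiplies the per-class accuracies, a classifier that ignores the minority class scores zero no matter how well it does on the majority class, which is why it is a common choice for imbalanced problems.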

Keywords/Search Tags:

Imbalanced Data, Oversampling, Ensemble Learning, AdaBoost, TSK Fuzzy Model

Related items

1	Research Of Imbalanced Data Classification Method Based On Oversampling And Ensemble Learning
2	Research Of Imbalanced Data Ensemble Classification Algorithm Based On Oversampling
3	Research On Ensemble Classifying Algorithm Of Imbalanced Data Set Based On Oversampling
4	Two-class Imbalanced Data Classification Based On Diverse Data Generation And Ensemble Learning
5	Research On Predictive Maintenance Model For Imbalanced Industrial Data
6	Comprehensive Oversampling And Undersampling Study Of Imbalanced Data Sets
7	The Research Of Imbalanced Data Based On Oversampling Technique
8	Research On Credit Evaluation Based On Improved Oversampling Method And Adaptive Ensemble Model
9	Research On Methods For Imbalanced Data Classification And Applications