Research On Imbalanced Data Classification Methods Based On Ensemble Learning

Posted on: 2022-11-22
Degree: Master
Type: Thesis
Country: China
Candidate: K C Hou
Full Text: PDF
GTID: 2518306611995909
Subject: Automation Technology
Abstract/Summary:
Artificial intelligence based on machine learning is attracting wide attention and increasingly influences people's lives. As an important tool, machine learning has been successfully applied to material performance prediction, network intrusion analysis, medical detection and other fields. In these applications, one often encounters datasets in which some classes contain far more samples than others. This is usually called the data imbalance problem; in such problems the data distribution is sparse and imbalanced. At present, many models perform poorly on imbalanced data. For example, when a standard support vector machine (SVM) is applied to imbalanced data, the majority classes receive more attention than the minority ones, so the decision boundary is biased toward the majority classes and the classification results are not ideal. In practical problems, however, the minority classes are often the more important ones, so they must be distinguished accurately and the classifier must be evaluated accordingly.

Considering the difficulty and complexity of imbalanced data classification, this thesis proposes a new imbalanced data classification algorithm based on SVM and the ensemble learning framework. First, in constructing the base classifier, the SVM is replaced with an SVM using a Gaussian kernel function, and the standard ensemble algorithm is replaced with a cost-sensitive ensemble algorithm. Second, by modifying the misclassification costs of the majority-class and minority-class data, the weights of the two classes are balanced, so that the decision boundary is no longer biased toward the majority class and good classification performance is achieved.

Finally, in the experimental analysis, 11 datasets from the UCI database are used to evaluate the proposed algorithm in terms of accuracy, recall and G-mean, and its performance is compared with that of existing classification algorithms. The experimental results show that the proposed algorithm achieves good classification performance. It is hoped that this work will contribute positively to the further study of imbalanced data classification.
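
To illustrate why accuracy alone can mislead on imbalanced data, and how misclassification costs might be balanced, the following is a minimal sketch rather than the thesis's own implementation. It assumes that G-mean is the geometric mean of the per-class recalls (a common definition; the abstract does not state one) and that class costs are set inversely proportional to class frequency (a widely used heuristic, not necessarily the scheme proposed in the thesis). The function names and the toy labels are hypothetical.

```python
from collections import Counter


def balanced_class_costs(labels):
    """Cost per class, inversely proportional to class frequency (assumed heuristic)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def recall(y_true, y_pred, cls):
    """Fraction of samples whose true class is `cls` that were predicted as `cls`."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    total = sum(1 for t in y_true if t == cls)
    return hits / total if total else 0.0


def accuracy(y_true, y_pred):
    """Fraction of all samples that were predicted correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def g_mean(y_true, y_pred):
    """Geometric mean of the per-class recalls (assumed definition of G-mean)."""
    classes = sorted(set(y_true))
    product = 1.0
    for cls in classes:
        product *= recall(y_true, y_pred, cls)
    return product ** (1.0 / len(classes))


# Hypothetical toy labels: 90 majority samples (0) and 10 minority samples (1).
y_true = [0] * 90 + [1] * 10

# A classifier that ignores the minority class entirely.
always_majority = [0] * 100

# A classifier that gets 80 of 90 majority and 8 of 10 minority samples right.
more_balanced = [0] * 80 + [1] * 10 + [1] * 8 + [0] * 2

costs = balanced_class_costs(y_true)
print("class costs:", costs)
print("always-majority accuracy:", accuracy(y_true, always_majority))
print("always-majority G-mean:", g_mean(y_true, always_majority))
print("more-balanced accuracy:", accuracy(y_true, more_balanced))
print("more-balanced G-mean:", g_mean(y_true, more_balanced))
```

In this toy case the always-majority classifier has the higher accuracy but a G-mean of zero, which is why G-mean and recall are reported alongside accuracy. Costs of this kind could, for instance, be supplied as the `class_weight` argument of scikit-learn's `SVC(kernel="rbf")` to obtain a cost-sensitive Gaussian-kernel base classifier; whether the thesis does so is not stated in the abstract.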
Keywords/Search Tags: Imbalanced data, Support vector machines, Ensemble learning, Cost-sensitive, Weight balance