Research On Modeling Of Imbalanced Data Granulation Learning Machine

Posted on:2021-03-14

Degree:Master

Type:Thesis

Country:China

Candidate:Q Dai

Full Text:PDF

GTID:2428330614955358

Subject:Mathematics

Abstract/Summary:

Imbalanced data processing is an important research area in data mining. Because the class sizes in imbalanced data differ severely, and traditional classification algorithms focus on overall accuracy, the recognition accuracy for minority-class samples is low. When the imbalance is large, traditional models struggle to improve the classification accuracy of minority-class samples, and they generally suffer from long computing time and high computational cost. Therefore, given the structural characteristics of imbalanced data, it is of great practical significance to study algorithms that improve the recognition accuracy of minority-class samples and reduce the data size. In this thesis, data granulation is taken as an effective method for reducing data dimension, and different granulation methods are combined with a learning machine as the classification tool to reduce the data dimension and improve the recognition accuracy of the minority class. Several new granulation learning machine modeling methods are proposed. The main contributions of the thesis are as follows:

1. After the data are granulated, traditional algorithms still need to build and train models on all granular layers, which takes a long time. The study explores a method based on the granular computing learning machine model to obtain the optimal granular layer, improving the learning efficiency of the algorithm.

2. The Tomek-Link method removes only a small number of boundary samples and therefore cannot effectively balance the data structure. Building on Tomek-Link elimination of boundary points, an improved Tomek-Link-based granulation algorithm is proposed, and a model is built on it to overcome the problem that the Tomek-Link algorithm rejects too few samples.

3. To address the strong subjectivity of the model integration strategy in the ensemble learning framework, a bagging weighted ensemble classification model is proposed that makes the integration strategy more objective and improves the classification performance of the model.

4. When the unstable cut-point algorithm alone is used to granulate the data set, it easily deletes minority-class samples that are useful for classification learning, which reduces classification accuracy. A weighted ensemble classification model based on membrane ensemble learning is proposed. Compared with other ensemble classification models, it achieves higher recognition accuracy on minority-class samples.

To address the low classification accuracy on imbalanced data and the difficulty of identifying minority-class samples, the thesis studies granulation-based ensemble classification learning from two aspects: the granulation method and the integration strategy. Drawing on granular computing theory, four granulation learning machine modeling algorithms are proposed. Experimental results on public imbalanced datasets confirm the feasibility and effectiveness of the classification models and provide a new research direction for granulation learning machine modeling of imbalanced data.

Figures: 25; Tables: 14; References: 66.
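
For readers unfamiliar with the Tomek-Link method mentioned in the abstract, the following is a minimal sketch of the standard definition only: two samples form a Tomek link when they are each other's nearest neighbor and carry different class labels, and the majority-class member of each link is then removed. It is not the thesis's improved granulation algorithm, which the abstract does not specify. The function names, the toy data, and the choice of Euclidean distance are all assumptions made for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def nearest_neighbor(points, i):
    """Return the index of the point closest to points[i], excluding i itself."""
    return min((j for j in range(len(points)) if j != i),
               key=lambda j: dist(points[i], points[j]))


def tomek_links(points, labels):
    """Return pairs (i, j), i < j, that are mutual nearest neighbors with different labels."""
    nn = [nearest_neighbor(points, i) for i in range(len(points))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and labels[i] != labels[j]]


def remove_majority_in_links(points, labels, majority_label):
    """Drop the majority-class member of every Tomek link."""
    drop = {k for pair in tomek_links(points, labels) for k in pair
            if labels[k] == majority_label}
    kept = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in kept], [labels[k] for k in kept]


# Toy data, made up for illustration: label 0 is the majority class, label 1 the minority.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (5.0, 5.0), (5.0, 5.2)]
labels = [0, 0, 0, 1, 0, 0]

links = tomek_links(points, labels)
clean_points, clean_labels = remove_majority_in_links(points, labels, majority_label=0)
print(links)         # index pairs forming Tomek links
print(clean_labels)  # labels remaining after removal
```

On this toy set only one majority sample sits in a Tomek link, so only one sample is removed. That is in line with the abstract's remark that plain Tomek-Link removal eliminates few samples and does little to rebalance the classes.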

Keywords/Search Tags:

imbalanced data, granular computing, granulation, learning machine, ensemble learning

Related items

1	Hybrid Ensemble Learning For Imbalanced Data
2	Research On Ensemble Learning Algorithm For Imbalanced Data
3	Classification Knowledge Discovery Algorithms Based On Granular Computing And Its Applications
4	Granulation Mechanism And Data Modeling For Complex Data
5	A Research On Imbalanced Learning Based On Semi-supervised SVM
6	Research On Ensemble Method Of Structured Support Vector Machine For Imbalanced Data
7	Comprehensive Oversampling And Undersampling Study Of Imbalanced Data Sets
8	Research On Methods For Classifying Imbalanced Data
9	Researches On Granular Support Vector Machine Learning Approach Based On Multi-dimension Association Rules
10	Two-class Imbalanced Big Data Classification Based On Data Reduction And Ensemble Learning