
Research On Modeling Of Imbalanced Data Granulation Learning Machine

Posted on: 2021-03-14
Degree: Master
Type: Thesis
Country: China
Candidate: Q Dai
Full Text: PDF
GTID: 2428330614955358
Subject: Mathematics
Abstract/Summary:
Imbalanced data processing is an important research area in data mining. Because the class sizes in imbalanced data differ severely, and traditional classification algorithms focus on overall accuracy, the recognition accuracy for minority-class samples is low. When the degree of imbalance is large, traditional models struggle to improve the classification accuracy of minority-class samples, and they generally suffer from long computing time and high computational cost. Therefore, given the structural characteristics of imbalanced data, it is of great practical significance to study algorithms that improve the recognition accuracy of minority-class samples and reduce the data size.

Data granulation is taken as an effective method to reduce data dimension, and different granulation methods are combined with a learning machine as the classification tool to reduce the data dimension and improve the recognition accuracy of the minority class. Several new granulation learning machine modeling methods are proposed. The main contributions of the thesis are as follows:

1. After the data are granulated, the traditional algorithm still has to build and train a model on every granular layer, so the computation time is long. The study explores a method based on the granular computing learning machine model that obtains the optimal granular layer and improves the learning efficiency of the algorithm.
2. When the Tomek-Link method is used to eliminate boundary samples, the number of samples removed is small, so the data structure cannot be balanced effectively. Building on Tomek-Link elimination of boundary points, an improved Tomek-Link-based granulation algorithm is proposed and a model is built to overcome the problem that the Tomek-Link algorithm rejects too few samples.
3. To address the strong subjectivity of the model integration strategy in the ensemble learning framework, the research proposes a bagging weighted ensemble classification model that makes the integration strategy more objective and improves the classification performance of the model.
4. When the unstable cut-point algorithm alone is used to granulate the data set, minority-class samples that are useful for classification learning are easily deleted, which reduces classification accuracy. A weighted ensemble classification model based on membrane ensemble learning is proposed. Compared with other ensemble classification models, it achieves higher recognition accuracy on minority-class samples.

Because classification accuracy on imbalanced data is low, minority-class samples are difficult to identify. In this research, granulation-based ensemble classification learning is studied from two aspects: the granulation method and the integration strategy. Combined with the theory of granular computing, four different granulation learning machine modeling algorithms are proposed. The experimental results on open imbalanced datasets confirm the feasibility and effectiveness of the classification models, and provide a new research idea for granulation learning machine modeling of imbalanced data.

Figure 25; Table 14; Reference 66...
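The abstract names the Tomek-Link method but does not describe it. As a rough illustration only, the sketch below follows the standard definition of a Tomek link (two samples of different classes that are each other's nearest neighbour) and removes the majority-class member of each link. The function names, the Euclidean distance, and the toy data are assumptions made for this example; this is not the improved granulation algorithm proposed in the thesis.

```python
from math import dist  # Euclidean distance, Python 3.8+


def nearest_neighbor(points, i):
    """Index of the point closest to points[i], excluding i itself."""
    return min((j for j in range(len(points)) if j != i),
               key=lambda j: dist(points[i], points[j]))


def tomek_links(points, labels):
    """Pairs (i, j), i < j, that are mutual nearest neighbours with different labels."""
    nn = [nearest_neighbor(points, i) for i in range(len(points))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and labels[i] != labels[j]]


def remove_majority_in_links(points, labels, majority_label):
    """Drop the majority-class sample of every Tomek link (hypothetical helper)."""
    drop = {k for pair in tomek_links(points, labels)
            for k in pair if labels[k] == majority_label}
    keep = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in keep], [labels[k] for k in keep]


# Toy data (made up for illustration): label 0 is the majority class.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.2, 1.0),
          (3.0, 3.0), (3.2, 3.0), (5.0, 5.0)]
labels = [0, 1, 0, 0, 0, 0, 1]

links = tomek_links(points, labels)
kept_points, kept_labels = remove_majority_in_links(points, labels, majority_label=0)
print(links)
print(kept_labels)
```

On this toy set only one boundary pair qualifies as a link, so a single majority sample is removed. This mirrors the limitation noted in contribution 2: plain Tomek-Link cleaning tends to reject few samples and leaves the class ratio largely unchanged.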
Keywords/Search Tags:imbalanced data, granular computing, granulation, learning machine, ensemble learning