
Research On Imbalanced Data Oversampling Classification Based On Constructive Covering Algorithm

Posted on: 2020-04-22
Degree: Master
Type: Thesis
Country: China
Candidate: Z B Wu
GTID: 2428330575965359
Subject: Engineering
Abstract/Summary:
Along with the rapid development of information technology, extracting valuable information from the large amounts of data generated and stored in various fields has become a great challenge. In many fields, the data of real interest is scarce, which gives rise to imbalanced data sets: data sets in which the number of samples in one class is far greater than the number in another. Imbalanced data classification is an important branch of data mining. Traditional machine learning classifiers take overall classification accuracy as their learning objective, and because that accuracy is dominated by the majority class, they perform poorly on imbalanced data sets. Imbalanced data classification therefore calls for new classification methods and discriminant criteria, and research on imbalanced data classification algorithms has gradually become a hot topic.

At present, methods for imbalanced data classification can be roughly grouped under two perspectives: data resampling at the data set level, and the design of better classification algorithms at the algorithm level. The main idea of data resampling is to change the class proportions of the original data through some mechanism so that the resulting, balanced distribution can be used by traditional machine learning classifiers; the classification results of such methods are therefore closely tied to the data distribution. Common strategies include oversampling and undersampling. At the algorithm level, imbalanced data sets are handled by designing classification algorithms suited to them. In addition, combinations of the two levels, known as hybrid methods, have received some attention; their main idea is to resample an imbalanced data set at the data set level before processing it at the algorithm level.

In this paper, the classification of imbalanced data sets is studied at the data set level. First, the difficulties and main open problems of imbalanced data classification are introduced. Then the existing classical imbalanced data classification algorithms are briefly reviewed, and their main ideas, advantages, and disadvantages are analyzed. Building on these ideas and difficulties, this paper studies two aspects:

1. This paper first proposes two imbalanced data oversampling methods based on the Constructive Covering Algorithm (CCA) neural network: CCA-SMOTE1, which is based on the number of samples within each cover, and CCA-SMOTE2, which is based on cover density. The main idea of CCA-SMOTE1 is to cover the imbalanced data set with CCA and then, for the covers formed by minority-class samples, use the number of samples in each cover to mine minority samples. CCA-SMOTE2 differs from CCA-SMOTE1 in that it uses cover density to select minority samples. The two methods thus mine easily misclassified minority samples from two different perspectives. This paper focuses on how to use the CCA neural network to mine minority samples that lie near the classification hyperplane and are easily misclassified; CCA-SMOTE1 and CCA-SMOTE2 provide two CCA-based strategies for selecting the minority samples on which SMOTE oversampling is performed.

2. Both strategies above make a binary decision about which minority samples are key samples for oversampling, so they cannot effectively eliminate the interference of noise samples. To address this problem, this paper proposes a CCA-based three-way decision ensemble oversampling model (CTDE), which combines CCA with three-way decision to mine the easily misclassified minority samples used for SMOTE oversampling. Because CCA chooses cover centers randomly, its results carry some uncertainty; CTDE uses ensemble learning to counter this randomness, and the final class is determined by voting.
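The cover-based seed selection behind CCA-SMOTE1 can be sketched with the Python standard library. This is a minimal sketch under stated assumptions, not the thesis's implementation: all function names are hypothetical, the radius rule in `build_covers` follows one common description of CCA (midpoint between the nearest other-class sample and the farthest closer same-class sample), and the `max_members` threshold in `select_seeds_by_count` is an illustrative stand-in for the CCA-SMOTE1 criterion, whose exact rule the abstract does not give. `smote_from_seeds` applies standard SMOTE interpolation only to the selected seeds.

```python
import math
import random


def build_covers(X, y, rng):
    """Build spherical covers in the spirit of the Constructive Covering Algorithm.

    Assumed radius rule: d1 = distance from the centre to the nearest sample of
    another class, d2 = distance to the farthest uncovered same-class sample
    closer than d1, radius = (d1 + d2) / 2. Returns (centre, radius, label, members).
    """
    uncovered = set(range(len(X)))
    covers = []
    while uncovered:
        c = rng.choice(sorted(uncovered))  # random centre: the source of CCA's randomness
        label = y[c]
        d1 = min((math.dist(X[c], X[j]) for j in range(len(X)) if y[j] != label),
                 default=math.inf)
        d2 = max((math.dist(X[c], X[j]) for j in uncovered
                  if y[j] == label and math.dist(X[c], X[j]) < d1), default=0.0)
        radius = d2 if math.isinf(d1) else (d1 + d2) / 2
        members = sorted(j for j in uncovered
                         if y[j] == label and math.dist(X[c], X[j]) <= radius)
        uncovered -= set(members)
        covers.append((X[c], radius, label, members))
    return covers


def select_seeds_by_count(covers, minority, max_members):
    """CCA-SMOTE1-style selection (assumed criterion): minority samples in small
    covers are treated as lying near the class boundary."""
    return [i for _, _, label, members in covers
            if label == minority and len(members) <= max_members
            for i in members]


def smote_from_seeds(X, y, seeds, minority, n_new, k, rng):
    """Standard SMOTE interpolation, restricted to the selected seed samples."""
    minority_idx = [i for i in range(len(X)) if y[i] == minority]
    synthetic = []
    if not seeds:
        return synthetic
    for _ in range(n_new):
        i = rng.choice(seeds)
        # k nearest minority neighbours of the seed (excluding the seed itself)
        neighbours = sorted((j for j in minority_idx if j != i),
                            key=lambda j: math.dist(X[i], X[j]))[:k]
        if not neighbours:
            continue
        j = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(X[i], X[j])))
    return synthetic


# Usage on a toy data set; class 1 is the minority.
rng = random.Random(0)
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (1.1, 0.9), (0.9, 1.1),
     (0.5, 0.5), (0.6, 0.4)]
y = [0, 0, 0, 0, 0, 0, 1, 1]
covers = build_covers(X, y, rng)
seeds = select_seeds_by_count(covers, minority=1, max_members=2)
new_samples = smote_from_seeds(X, y, seeds, minority=1, n_new=4, k=1, rng=rng)
```

A CCA-SMOTE2-style variant would replace the count test in `select_seeds_by_count` with a cover-density test; how the thesis defines cover density is not stated in the abstract, so it is left out here.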
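The two ingredients CTDE adds on top of cover-based seed selection, a three-way split of minority samples and majority voting across randomly seeded runs, can be sketched as follows. This is an illustrative sketch only: the function names, the thresholds `alpha` and `beta`, the meaning assigned to each region, and the per-sample scores are all assumptions, since the abstract does not state the evaluation function the thesis uses.

```python
from collections import Counter


def three_way_partition(scores, alpha, beta):
    """Split minority samples into three regions by a score in [0, 1].

    Hypothetical rule: score >= alpha -> POS (safe, not oversampled),
    score <= beta -> NEG (treated as noise and excluded),
    otherwise -> BND (boundary samples used as SMOTE seeds).
    """
    if not 0.0 <= beta < alpha <= 1.0:
        raise ValueError("need 0 <= beta < alpha <= 1")
    regions = {"POS": [], "BND": [], "NEG": []}
    for idx, score in scores.items():
        if score >= alpha:
            regions["POS"].append(idx)
        elif score <= beta:
            regions["NEG"].append(idx)
        else:
            regions["BND"].append(idx)
    return regions


def majority_vote(predictions):
    """Combine per-model label lists (e.g. one per randomly seeded CCA run).
    On a tie, the label that appears first for that sample wins."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*predictions)]


# Example: scores are made up for illustration (e.g. derived from each sample's cover).
regions = three_way_partition({0: 0.9, 1: 0.5, 2: 0.1, 3: 0.6}, alpha=0.8, beta=0.2)
print(regions)  # {'POS': [0], 'BND': [1, 3], 'NEG': [2]}
final = majority_vote([[1, 0, 0], [1, 1, 0], [0, 1, 0]])
print(final)  # [1, 1, 0]
```

Compared with a binary keep-or-oversample decision, the NEG region gives the model a third option: discarding a suspected noise sample instead of using it as an oversampling seed. Using an odd number of ensemble members avoids voting ties in the binary case.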
Keywords: Imbalanced Data Classification, Constructive Covering Algorithm, Ensemble Learning, Three-Way Decision, SMOTE