
Research And Application Of Imbalanced Data Classification Algorithms Based On Ensemble Learning

Posted on: 2015-04-08
Degree: Master
Type: Thesis
Country: China
Candidate: B Zhou
Full Text: PDF
GTID: 2298330467986699
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, imbalanced data classification has become an important research issue in data mining. An imbalanced data set is one in which the classes differ greatly in their number of samples: the class with fewer samples is called the minority class, and the class with more samples is called the majority class. Traditional classification algorithms often perform well on balanced data, but on imbalanced data the rate of correct classification for minority class samples is often lower than that for majority class samples. Many practical applications, such as fraud detection, fault detection, text detection, and spam filtering, are more concerned with the classification accuracy of the minority class. Therefore, studying how to improve classification performance on imbalanced data and the generalization ability of the classifier has important value and practical significance.

To improve the performance of imbalanced data classification, many improvements have been made on the basis of traditional classification algorithms. These improvements are concentrated at two levels: the data level and the algorithm level. Data-level improvements resample the data set, for example by random oversampling, the SMOTE algorithm, or the one-side sampling algorithm. Changing the distribution of the data set makes the classes roughly balanced, after which the data are classified by traditional classification algorithms.
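As an illustration of data-level resampling, the following is a minimal sketch of the basic SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. It is not the improved SMOTE algorithm proposed in the thesis; the function name `smote`, its parameters, and the toy data are assumptions made for this example.

```python
import random

def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples interpolated between minority samples.

    Hypothetical helper for illustration; basic SMOTE only.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to the base sample.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: sq_dist(base, minority[j]))
        # Pick one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # Interpolate at a random position along base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Toy minority class (assumed data, two features per sample).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_new=6, k=2)
```

Because every synthetic point is a convex combination of two existing minority samples, the new points stay inside the region already occupied by the minority class, which is the main difference from random oversampling, where existing samples are simply duplicated.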
At the algorithm level, the distribution of the original data is left unchanged and the internal structure of the algorithm is modified so that it can adapt to imbalanced data; examples include cost-sensitive classification algorithms, improved SVMs, and ensemble learning algorithms.

Among these improved algorithms, ensemble learning exhibits good classification performance and strong generalization ability. Its classification performance can be improved further by improving the individual classifiers and by coordinating the diversity among the base classifiers. Based on these considerations, this thesis carries out the following work. First, at the data level, it analyzes and summarizes the one-side sampling and SMOTE algorithms and, to address the problems of the existing algorithms, proposes an improved SMOTE algorithm. Second, at the algorithm level, it studies the advantages of ensemble learning for classification problems and the factors that affect its classification performance, and proposes a new ensemble framework, named 2D-SEFrame, for imbalanced data classification. Third, it studies the common strategies for multi-class classification and extends 2D-SEFrame to multi-class imbalanced data classification, yielding an algorithm named MC2D-SEFrame. Fourth, it applies MC2D-SEFrame to an ECG classification problem on real data. Experimental results show that the proposed algorithm achieves good classification results.
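The abstract does not describe the internals of 2D-SEFrame or MC2D-SEFrame, so the sketch below does not reproduce them. It only illustrates the general idea of an ensemble for imbalanced data: each base classifier is trained on a different class-balanced random subset (which provides diversity), and the ensemble predicts by majority vote. The nearest-centroid base learner, all function names, and the toy data are assumptions chosen to keep the example self-contained.

```python
import random
from collections import Counter

def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def train_nearest_centroid(X, y):
    """Base learner (assumed for illustration): one centroid per class."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    return {label: centroid(rows) for label, rows in by_class.items()}

def predict_one(model, row):
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    # Return the label whose centroid is closest to the row.
    return min(model, key=lambda label: sq_dist(row, model[label]))

def train_balanced_ensemble(X, y, n_members=5, seed=0):
    """Train each member on a random subset with equal class sizes."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    smallest = min(len(rows) for rows in by_class.values())
    members = []
    for _ in range(n_members):
        Xb, yb = [], []
        for label, rows in by_class.items():
            # Undersample every class down to the smallest class size.
            for row in rng.sample(rows, smallest):
                Xb.append(row)
                yb.append(label)
        members.append(train_nearest_centroid(Xb, yb))
    return members

def predict_ensemble(members, row):
    votes = Counter(predict_one(m, row) for m in members)
    return votes.most_common(1)[0][0]

# Toy imbalanced data (assumed): 20 majority samples, 3 minority samples.
majority = [[i * 0.1, (i % 5) * 0.1] for i in range(20)]
minority = [[5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
X = majority + minority
y = ["majority"] * len(majority) + ["minority"] * len(minority)
members = train_balanced_ensemble(X, y, n_members=5)
```

Calling `predict_ensemble(members, [5.0, 5.0])` then returns the label chosen by most members. Because the training code groups samples by label rather than assuming two classes, the same sketch also runs on data with more than two classes, although it makes no claim to match the multi-class strategy used in MC2D-SEFrame.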
Keywords/Search Tags: Data Mining, Imbalanced Data, SMOTE, Ensemble Learning, 2D-SEFrame, MC2D-SEFrame