
The Research Of Multiple Classifiers And The Application To The Imbalanced Data

Posted on: 2016-08-22
Degree: Master
Type: Thesis
Country: China
Candidate: S S Li
GTID: 2308330464458431
Subject: Computer application technology
Abstract/Summary:
Classification is an important task in data mining. Its main challenge is to build an effective classifier from the training set and to use that classifier to accurately predict the class of unknown instances. A single classifier usually learns from the training set only once, so it may extract only a few classification rules, and those rules may be of low quality, which limits its accuracy. In addition, much real-world data is imbalanced. When the data is imbalanced, a single classifier cannot extract enough classification information from the rare class, so rare-class instances are easily misclassified. Multiple classifiers can learn from the training set several times. This increases the number of classification rules and improves their quality, so multiple classifiers can achieve higher accuracy than a single classifier. They are also an effective way to handle imbalanced data classification. This thesis presents three new multiple-classifier models. They change both how the models learn from the training set and how the individual classifiers are integrated, and they achieve high classification accuracy. One of these models is designed for the characteristics of imbalanced data and effectively improves classification accuracy on such data.
The main research work is as follows:
First, we propose a new method based on the combination of multiple instance-covered classifiers. Using an improved decision tree algorithm as the base classifier, the method learns from the training set multiple times and generates a large number of classification rules. Each training instance can therefore be covered by many rules, which increases classification accuracy.
Second, we propose a new rule-based classification method based on multiple rule inductions.
Unlike traditional rule-based classification, this method builds a large candidate set and can generate a large number of classification rules in each round. Once an instance is covered by at least two rules, it is deleted, and the process continues with the remaining instances.
Finally, we propose a new imbalanced data classification method based on the combination of multiple classifiers. In imbalanced data it is hard to extract enough classification rules for the rare class, so rare-class instances are easily misclassified. The new method generates multiple small balanced training sets, which lets it learn from the rare class repeatedly, and it integrates the multiple classification results using evidence theory (Dempster-Shafer theory). Extensive experiments evaluated with F-measure, G-mean and AUC show that this method effectively improves classification accuracy on imbalanced data.
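The rule-induction method based on multiple rule inductions (large candidate set, many rules per round, delete an instance once at least two rules cover it) can be illustrated with a minimal sketch on categorical data. It is not the thesis's implementation: everything beyond those three ideas is an assumption. Here the candidate set is assumed to be every conjunction of up to two attribute-value tests, rules are ranked by confidence on the instances not yet deleted, a rule is assumed to "cover" an instance only if it also predicts that instance's class, and prediction is a confidence-weighted vote. The names `induce_rules`, `classify`, `per_round` and `cover_limit` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import List, Tuple

Instance = Tuple[Tuple[str, ...], str]   # (attribute values, class label)
Condition = Tuple[Tuple[int, str], ...]  # conjunction of (attribute index, value) tests
Rule = Tuple[Condition, str, float]      # (conditions, predicted class, confidence)


def matches(cond: Condition, x: Tuple[str, ...]) -> bool:
    """True if instance x satisfies every attribute-value test in cond."""
    return all(x[i] == v for i, v in cond)


def candidate_conditions(data: List[Instance], max_len: int = 2) -> List[Condition]:
    """Large candidate set (assumed form): every conjunction of up to
    max_len attribute-value tests that occurs in some training instance."""
    cands = set()
    for x, _ in data:
        for k in range(1, max_len + 1):
            cands.update(combinations(enumerate(x), k))
    return sorted(cands)


def induce_rules(data: List[Instance], min_conf: float = 0.8, min_support: int = 2,
                 per_round: int = 3, cover_limit: int = 2, max_len: int = 2) -> List[Rule]:
    """Each round adds up to per_round new rules; an instance is deleted once
    cover_limit rules predicting its class cover it; rounds repeat."""
    candidates = candidate_conditions(data, max_len)
    rules: List[Rule] = []
    used = set()
    covered = [0] * len(data)        # number of learned rules covering each instance
    alive = set(range(len(data)))    # indices of instances not yet deleted
    while alive:
        scored = []
        for cond in candidates:
            if cond in used:
                continue
            labels = [data[i][1] for i in alive if matches(cond, data[i][0])]
            if len(labels) < min_support:
                continue
            label, count = Counter(labels).most_common(1)[0]
            conf = count / len(labels)   # confidence on the remaining instances
            if conf >= min_conf:
                scored.append((conf, count, cond, label))
        if not scored:
            break                        # no new rule qualifies, so stop
        scored.sort(key=lambda t: (-t[0], -t[1], t[2]))
        for conf, _, cond, label in scored[:per_round]:
            used.add(cond)
            rules.append((cond, label, conf))
            for i in list(alive):
                x, y = data[i]
                if y == label and matches(cond, x):
                    covered[i] += 1
                    if covered[i] >= cover_limit:
                        alive.discard(i)  # covered often enough: delete it
    return rules


def classify(rules: List[Rule], x: Tuple[str, ...], default: str) -> str:
    """Confidence-weighted vote of all rules that match x."""
    votes: Counter = Counter()
    for cond, label, conf in rules:
        if matches(cond, x):
            votes[label] += conf
    return votes.most_common(1)[0][0] if votes else default


if __name__ == "__main__":
    data = [(("sunny", "hot"), "no"), (("sunny", "mild"), "no"),
            (("rainy", "mild"), "yes"), (("rainy", "cool"), "yes"),
            (("overcast", "hot"), "yes"), (("overcast", "cool"), "yes")]
    rules = induce_rules(data)
    print(len(rules), classify(rules, ("rainy", "hot"), default="no"))  # 4 yes
```

Every round either adds at least one previously unused candidate or stops, so the loop always terminates; instances that no qualifying rule reaches simply stay in the training set.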
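The imbalanced-data method (several small balanced training sets, results fused with evidence theory) can be sketched as below. The abstract does not say how the balanced sets are formed or how classifier outputs become evidence, so both are assumptions: the majority class is split into disjoint chunks about the size of the rare class, each chunk is paired with every rare-class instance, and each base classifier's rare-class probability is mapped to a mass function over {pos}, {neg} and the whole frame Θ with a fixed `uncertainty` reserved for Θ. The outputs are then fused with Dempster's rule of combination. Training the base classifiers on each subset is not shown, and the names `balanced_subsets`, `to_mass`, `dempster_combine` and `fuse` are hypothetical.

```python
import random
from typing import Dict, FrozenSet, List, Sequence

POS, NEG = "pos", "neg"
THETA = frozenset({POS, NEG})            # frame of discernment
Mass = Dict[FrozenSet[str], float]


def balanced_subsets(minority: Sequence, majority: Sequence, seed: int = 0) -> List[list]:
    """Split the shuffled majority class into chunks about the size of the
    minority class (the last chunk may be smaller) and pair each chunk with
    every minority instance; each subset would train one base classifier."""
    rng = random.Random(seed)
    maj = list(majority)
    rng.shuffle(maj)
    size = max(1, len(minority))
    chunks = [maj[i:i + size] for i in range(0, len(maj), size)]
    return [list(minority) + chunk for chunk in chunks]


def to_mass(p_pos: float, uncertainty: float = 0.1) -> Mass:
    """Hypothetical mapping from a classifier's rare-class probability to a
    mass function; `uncertainty` is assigned to the whole frame."""
    belief = 1.0 - uncertainty
    return {frozenset({POS}): belief * p_pos,
            frozenset({NEG}): belief * (1.0 - p_pos),
            THETA: uncertainty}


def dempster_combine(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule: multiply masses of intersecting focal sets and
    renormalise by 1 - K, where K is the mass on empty intersections."""
    combined: Mass = {}
    conflict = 0.0
    for a, wa in m1.items():
        for b, wb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + wa * wb
            else:
                conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


def fuse(probabilities: Sequence[float]) -> str:
    """Combine all base-classifier outputs and pick the class with the
    larger combined mass (ties go to the rare class)."""
    m = to_mass(probabilities[0])
    for p in probabilities[1:]:
        m = dempster_combine(m, to_mass(p))
    return POS if m.get(frozenset({POS}), 0.0) >= m.get(frozenset({NEG}), 0.0) else NEG


if __name__ == "__main__":
    subsets = balanced_subsets(minority=list(range(5)), majority=list(range(100, 120)))
    print(len(subsets), [len(s) for s in subsets])  # 4 [10, 10, 10, 10]
    print(fuse([0.7, 0.6, 0.4]))                    # pos
```

Reserving some mass for Θ keeps the conflict below 1, so the renormalisation in Dempster's rule never divides by zero for outputs produced by `to_mass`.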
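The three evaluation measures named in the abstract are standard for imbalanced data. A minimal sketch, assuming binary labels with the rare class coded as 1 (positive): F-measure is the weighted harmonic mean of precision and recall on the rare class, G-mean is the geometric mean of the true-positive and true-negative rates, and AUC is computed here from its pairwise-ranking definition (the probability that a random positive scores above a random negative, ties counting one half), which equals the area under the ROC curve.

```python
import math
from typing import Sequence, Tuple


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FN, FP, TN) with the rare class coded as 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fn, fp, tn


def f_measure(y_true: Sequence[int], y_pred: Sequence[int], beta: float = 1.0) -> float:
    """F-beta score of the rare class (beta = 1 gives the usual F1)."""
    tp, fn, fp, _ = confusion(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def g_mean(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Geometric mean of sensitivity (TPR) and specificity (TNR)."""
    tp, fn, fp, tn = confusion(y_true, y_pred)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(tpr * tnr)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise-ranking AUC: share of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one instance of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 1, 0, 0, 0, 0]
    y_pred = [1, 0, 0, 0, 1, 0]
    scores = [0.9, 0.4, 0.3, 0.2, 0.8, 0.1]
    print(f_measure(y_true, y_pred))  # 0.5
    print(auc(y_true, scores))        # 0.875
    print(g_mean(y_true, y_pred))     # sqrt(0.5 * 0.75), about 0.612
```

Unlike plain accuracy, all three measures drop sharply when the rare class is misclassified, which is why they suit imbalanced data; the pairwise AUC is O(P·N) and is meant for illustration, not large test sets.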
Keywords: Data mining, Classification, Multiple classifiers, Imbalanced data