The Research Of Multiple Classifiers And The Application To The Imbalanced Data

Posted on:2016-08-22

Degree:Master

Type:Thesis

Country:China

Candidate:S S Li

Full Text:PDF

GTID:2308330464458431

Subject:Computer application technology

Abstract/Summary:

Classification is a significant task in data mining. Its major challenge is to build an effective classifier from a training set and to use that classifier to predict unknown instances accurately. A single classifier usually learns from the training set only once. As a result, it may extract too few classification rules, the rules may be of low quality, and accuracy may suffer. In addition, much real-world data is imbalanced. When the data is imbalanced, a single classifier cannot extract enough classification information from the rare class, so rare-class instances are easily misclassified. Multiple classifiers can learn from the training set repeatedly. They increase the number of classification rules and improve their quality, and can therefore achieve higher accuracy than a single classifier. They are also an effective way to classify imbalanced data. This thesis presents three new multiple-classifier models. They change both how the models learn from the training set and how the individual classifiers are integrated, and they achieve high classification accuracy. One of the models targets the characteristics of imbalanced data and effectively improves classification accuracy on such data.

The main research work is as follows:

First, we put forward a new method based on the combination of multiple instance-covering classifiers. Using an improved decision tree algorithm as the base classifier, this method learns from the training set multiple times and generates a large number of classification rules. Each training instance can be covered many times by these rules, which increases classification accuracy.

Second, we put forward a new rule-based classification method that uses multiple rule inductions. Unlike traditional rule-based classification, this method builds a large candidate set and generates a large number of classification rules in each round. An instance is removed once at least two rules cover it, and the process then continues.

Finally, we put forward a new imbalanced-data classification method based on the combination of multiple classifiers. In imbalanced data it is hard to extract enough classification rules for the rare class, so rare-class instances are easily misclassified. The new method generates multiple small balanced training sets, which lets it learn from the rare class repeatedly. It uses evidence theory to integrate the individual classification results. Extensive experiments evaluated with F-measure, G-mean, and AUC show that this method effectively improves classification accuracy on imbalanced data.
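The idea behind the third model (several small balanced training sets whose results are fused with evidence theory) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The nearest-centroid base learner, the random undersampling of the common class, the `discount` factor that reserves mass for ignorance, the toy data, and all function names are assumptions made for illustration. The fusion step is Dempster's rule of combination on the two-class frame {rare, common}.

```python
import math
import random
from functools import reduce

def balanced_subsets(majority, minority, n_subsets, seed=0):
    """Pair all rare-class instances with an equally sized random sample
    of the common class, n_subsets times (assumed sampling scheme)."""
    rng = random.Random(seed)
    size = min(len(minority), len(majority))
    return [(rng.sample(majority, size), list(minority)) for _ in range(n_subsets)]

def centroid(points):
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))

def train(common, rare):
    """Hypothetical base learner: one centroid per class."""
    return centroid(common), centroid(rare)

def mass_function(model, x, discount=0.9):
    """Convert one classifier's output into masses on {rare}, {common},
    and the whole frame 'theta' (ignorance). Masses sum to 1."""
    c_common, c_rare = model
    d_common, d_rare = math.dist(x, c_common), math.dist(x, c_rare)
    total = d_common + d_rare
    # The closer x is to the rare centroid, the higher the rare score.
    p_rare = 0.5 if total == 0 else d_common / total
    return {"rare": discount * p_rare,
            "common": discount * (1 - p_rare),
            "theta": 1 - discount}

def dempster_combine(m1, m2):
    """Dempster's rule of combination for a two-class frame."""
    conflict = m1["rare"] * m2["common"] + m1["common"] * m2["rare"]
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    norm = 1.0 - conflict
    return {
        "rare": (m1["rare"] * m2["rare"] + m1["rare"] * m2["theta"]
                 + m1["theta"] * m2["rare"]) / norm,
        "common": (m1["common"] * m2["common"] + m1["common"] * m2["theta"]
                   + m1["theta"] * m2["common"]) / norm,
        "theta": m1["theta"] * m2["theta"] / norm,
    }

def predict(models, x):
    combined = reduce(dempster_combine, (mass_function(m, x) for m in models))
    return "rare" if combined["rare"] > combined["common"] else "common"

# Toy imbalanced data: 40 common-class points near the origin, 5 rare points near (3, 3).
majority = [(i % 5 * 0.2, i // 5 * 0.2) for i in range(40)]
minority = [(3.0, 3.0), (3.2, 2.9), (2.9, 3.1), (3.1, 3.2), (2.8, 3.0)]

subsets = balanced_subsets(majority, minority, n_subsets=5)
models = [train(common, rare) for common, rare in subsets]
```

Calling `predict(models, (3.0, 3.0))` classifies a point near the rare cluster, and `predict(models, (0.2, 0.2))` a point near the common cluster. Because every subset contains all rare-class instances, each base learner sees the rare class in full, which is the property the abstract describes as learning from the rare class repeatedly.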

Keywords/Search Tags:

Data mining, Classification, Multiple classifiers, Imbalanced data

Related items

1	Research And Application On Data Mining Classification Arithmetic Based On Multiple Classifiers Fusion
2	The Research Of Imbalanced Data Based On Oversampling Technique
3	The Research Of Imbalanced Data Classification
4	The Algorithm Research Of Associative Classification And Classification Based On Imbalanced Data
5	Research On The Classification Algorithm Of Imbalanced Data Sets
6	Selectively Mining Approach With Dynamical Chunk Size For Imbalanced Data Streams In Nonstationary Environment
7	Research On Classification Algorithm For Imbalanced Data Sets Based On Support Vector Machines
8	Research On Incomplete Data Classification Based On Multiple Classifiers
9	Research On Classification Algorithms Of Data Mining Based On Imbalanced Data Sets
10	The Classification Algorithm Research Based On Imbalanced Data