
Rule-based Combination Of Classifiers

Posted on: 2011-04-17    Degree: Master    Type: Thesis
Country: China    Candidate: G Q Shi    Full Text: PDF
GTID: 2208330332957798    Subject: Computer software and theory
Abstract/Summary:
Classification is one of the central problems in data mining and has been applied in many fields, such as scientific experimentation and business forecasting. Improving classifier accuracy is key to classification, and classifier ensembles usually achieve higher accuracy than single classifiers. This thesis studies combination methods based on rule classifiers in depth.

Most existing ensembles construct each base classifier on a dataset sampled randomly from the training set. When the dataset is small, this sampling may lose information, and the base classifiers learned on the sampled datasets tend to have low accuracy, which in turn degrades the accuracy of the combination. Therefore, the whole training set, which avoids this loss of information, is used to construct the ensemble members and improve the accuracy of the ensemble.

Based on these ideas, a new method called PCARules is presented for constructing ensembles of rule-based classifiers. Although the class label of a sample to be classified is still determined by a weighted vote over the predictions of the base classifiers, the method differs greatly from Bagging and Boosting in how the training data for each base classifier is created. Instead of creating training data by sampling, the method randomly splits the feature set into K subsets and applies PCA to each subset to find the corresponding principal components. All principal components are then put together to form a new feature space, into which the entire original training set is mapped to create the training data for a base classifier. Experiments on 30 benchmark datasets randomly selected from the UCI Machine Learning Repository show that the method not only improves the performance of rule-based classifiers significantly, but also achieves higher accuracy on most datasets than traditional combination methods such as Bagging and Boosting.

Accuracy and diversity among the base classifiers in PCARules are also studied. From the experimental results on three randomly selected datasets, we find that higher diversity among ensemble members does not guarantee higher accuracy of the combined classifier (AdaBoost). In contrast, moderate diversity together with strong complementarity among the members leads to the best-performing ensembles (PCARules). At the same time, the accuracy of the base classifiers strongly affects the performance of the ensemble; for example, the base-classifier accuracy in PCARules is significantly higher than that of Bagging and AdaBoost.
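The construction described above can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the number of subsets K, the ensemble size, and the helper names are assumptions, and since RIPPER is not available in scikit-learn a decision tree stands in for the rule-based base learner; an unweighted majority vote is used in place of the weighted vote.

```python
# Rough sketch of the PCARules-style construction (illustrative only).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier  # stand-in for RIPPER

def build_ensemble(X, y, n_members=10, n_subsets=3, seed=None):
    """Train each member on ALL samples, but in a feature space built by
    applying PCA to a random split of the original feature set."""
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_members):
        # Randomly split the feature indices into K disjoint subsets.
        perm = rng.permutation(X.shape[1])
        subsets = np.array_split(perm, n_subsets)
        # Apply PCA to each subset and stack the principal components
        # to form the new feature space for this member.
        pcas = [PCA().fit(X[:, idx]) for idx in subsets]
        X_new = np.hstack([p.transform(X[:, idx])
                           for p, idx in zip(pcas, subsets)])
        clf = DecisionTreeClassifier().fit(X_new, y)
        members.append((subsets, pcas, clf))
    return members

def predict_ensemble(members, X):
    """Majority vote over member predictions (integer class labels assumed;
    the thesis uses a weighted vote, omitted here for brevity)."""
    votes = []
    for subsets, pcas, clf in members:
        X_new = np.hstack([p.transform(X[:, idx])
                           for p, idx in zip(pcas, subsets)])
        votes.append(clf.predict(X_new))
    votes = np.stack(votes)
    return np.array([np.bincount(col.astype(int)).argmax() for col in votes.T])
```

Because every member is trained on the full training set rather than a bootstrap sample, the diversity among members comes entirely from the random feature splits and the subsequent PCA transformations.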
Keywords/Search Tags: classifier ensemble, feature extraction, RIPPER, principal component analysis, Kappa-error diagram