Data Stream Classification Algorithm Based on EP

Posted on: 2008-08-22
Degree: Master
Type: Thesis
Country: China
Candidate: C C Chen
GTID: 2208360215460487
Subject: Computer software and theory
Abstract/Summary:
Huge volumes of data streams are generated at unprecedented rates in a range of applications, including credit card fraud protection, target marketing, network intrusion detection, and sensor networks. Data streams are continuous, ordered, fast-changing, and massive in volume; moreover, their data distribution is likely to change over time, that is, concept drift occurs. Training a classification model that effectively predicts the trend of incoming data is one of the main difficulties in data stream classification research, and it is also an important task.
Classification is an important task in data mining, with wide applications such as credit card fraud protection, target marketing, and network intrusion detection. Classical classification methods, including decision trees, Bayesian networks, neural networks, and SVMs, face new challenges when processing data streams, such as the overwhelming volume of data and concept drift. In recent years, researchers have proposed several data stream classification algorithms, such as VFDT and CVFDT, VFDTc, and ensemble classifiers. Classification accuracy can usually be improved by integrating several classifiers, especially when the base classifiers differ from one another. The ensemble classification method proposed by Wang et al. uses C4.5, RIPPER, and naive Bayes as base classifiers; using other algorithms as base classifiers remains to be studied. Essential emerging patterns (eEPs) have strong discriminating power, and EP-based algorithms perform comparably with other types of classification algorithms. Moreover, eEP-based algorithms have been applied successfully in many domains, such as DNA analysis and automatic text categorization.
Considering these factors, this paper proposes an algorithm called Classification by eEP-based Classifiers Ensemble (CEEPCE) to classify data streams.
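To make the emerging-pattern idea concrete, here is a minimal sketch in Python. It follows the standard EP definition: the growth rate of an itemset X from a background dataset to a target dataset is supp_target(X) / supp_background(X), taken as infinity when X never occurs in the background. Patterns are mined by brute force, and a test instance is scored per class by summing GR/(GR+1) times support over that class's patterns it contains, in the style of the CAEP aggregation rule. The thresholds `min_support`, `min_growth`, and `max_len`, the simple minimality filter standing in for "essential" EPs, and the toy `EXAMPLE_DATA` are all assumptions; this is not the thesis's eEP mining procedure, and it only scales to toy data.

```python
from itertools import combinations
from math import inf


def support(itemset, data):
    """Fraction of rows in `data` (a list of frozensets) containing all of `itemset`."""
    if not data:
        return 0.0
    return sum(1 for row in data if itemset <= row) / len(data)


def growth_rate(itemset, background, target):
    """GR(X) = supp_target(X) / supp_background(X), with 0/0 -> 0 and x/0 -> inf."""
    s_bg = support(itemset, background)
    s_tg = support(itemset, target)
    if s_bg == 0:
        return 0.0 if s_tg == 0 else inf
    return s_tg / s_bg


def mine_eps(target, background, min_support=0.2, min_growth=1.5, max_len=2):
    """Brute-force EP mining for toy data; all three thresholds are assumptions.

    Keeps itemsets of at most `max_len` items with support >= `min_support` in
    `target` and growth rate >= `min_growth`. An itemset is skipped when a
    proper subset already qualified, a simple minimality filter standing in
    for 'essential' EPs.
    """
    items = sorted(set().union(*target))
    eps = {}  # pattern -> (growth rate, support in target)
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            x = frozenset(combo)
            if any(prev < x for prev in eps):
                continue  # a smaller qualifying pattern already covers x
            sup = support(x, target)
            if sup < min_support:
                continue
            gr = growth_rate(x, background, target)
            if gr >= min_growth:
                eps[x] = (gr, sup)
    return eps


def aggregate_score(instance, eps):
    """CAEP-style score: sum of GR/(GR+1) * support over EPs contained in instance."""
    score = 0.0
    for x, (gr, sup) in eps.items():
        if x <= instance:
            factor = 1.0 if gr == inf else gr / (gr + 1.0)
            score += factor * sup
    return score


def classify(instance, data_by_class, **params):
    """Predict the class whose EPs (mined against all other classes) score highest."""
    scores = {}
    for label, rows in data_by_class.items():
        others = [r for other, rs in data_by_class.items() if other != label for r in rs]
        scores[label] = aggregate_score(instance, mine_eps(rows, others, **params))
    return max(scores, key=scores.get)


# Hypothetical toy data: each row is a set of attribute=value items.
EXAMPLE_DATA = {
    "yes": [frozenset({"outlook=overcast", "windy=no"}),
            frozenset({"outlook=overcast", "windy=yes"}),
            frozenset({"outlook=rain", "windy=no"})],
    "no": [frozenset({"outlook=sunny", "windy=yes"}),
           frozenset({"outlook=sunny", "windy=no"}),
           frozenset({"outlook=rain", "windy=yes"})],
}

if __name__ == "__main__":
    print(classify(frozenset({"outlook=overcast", "windy=no"}), EXAMPLE_DATA))  # prints "yes"
```

Brute-force enumeration only keeps the sketch short; practical EP miners use more efficient border-based or tree-based search, and the original CAEP rule additionally normalizes each class score, which is omitted here.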
The main research contents of the paper are as follows. First, after summarizing the characteristics of data streams and analyzing the eEP-based algorithm, we combined the concepts of basic windows and sliding windows with the eEP-based classification algorithm, so that the algorithm suits the characteristics of data streams and handles concept drift. Second, we proposed weighting the classifiers in the ensemble while the different base classifiers are being constructed. Finally, when classifying incoming test examples, we paid more attention to the latest data blocks: each base classifier was weighted according to its arrival time, and a new weighting strategy, called "the weighting method based on classification accuracy", was introduced to weight the different base classifiers; all selected classifiers were then combined to raise classification accuracy.
Our experiments show that the proposed CEEPCE algorithm handles concept drift in data streams well, achieves higher classification accuracy than a single classifier, and performs comparably with the ensemble method based on C4.5.
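The window-and-weighting scheme described above can be sketched as follows. This is a hypothetical illustration, not the thesis's CEEPCE implementation: each incoming basic window (data block) trains one base classifier, a sliding window keeps only the classifiers from the most recent `window_size` blocks, and each classifier's weight is its accuracy on the newest block multiplied by `decay ** age`, where age counts the blocks since the classifier arrived. The names `SlidingWindowEnsemble` and `Member`, the `decay` factor, and the toy threshold learner `train_stump` are assumptions; in CEEPCE the base learner would be an eEP-based classifier.

```python
from collections import Counter, deque
from dataclasses import dataclass
from typing import Callable, Hashable, Sequence, Tuple

Example = Tuple[Hashable, Hashable]      # (features, label)
Model = Callable[[Hashable], Hashable]   # features -> predicted label


@dataclass
class Member:
    model: Model
    arrival: int        # index of the basic window the model was trained on
    weight: float = 0.0


class SlidingWindowEnsemble:
    """Hypothetical sketch: one base classifier per basic window (data block);
    the sliding window keeps the models of the last `window_size` blocks."""

    def __init__(self, train: Callable[[Sequence[Example]], Model],
                 window_size: int = 4, decay: float = 0.8):
        self.train = train
        self.decay = decay                        # assumed recency factor per block of age
        self.members = deque(maxlen=window_size)  # oldest model drops out first
        self.block_count = 0

    def update(self, block: Sequence[Example]) -> None:
        """Train a model on the new block, then reweight all members on that block."""
        self.members.append(Member(self.train(block), self.block_count))
        for m in self.members:
            accuracy = sum(m.model(x) == y for x, y in block) / len(block)
            age = self.block_count - m.arrival
            m.weight = accuracy * self.decay ** age
        self.block_count += 1

    def predict(self, x: Hashable) -> Hashable:
        """Weighted vote of all members in the current window."""
        if not self.members:
            raise ValueError("ensemble has not seen any block yet")
        votes = Counter()
        for m in self.members:
            votes[m.model(x)] += m.weight
        return votes.most_common(1)[0][0]


def train_stump(block: Sequence[Example]) -> Model:
    """Toy base learner (assumption): predict 1 when x >= t, choosing the best t."""
    candidates = sorted({x for x, _ in block})
    best_t = max(candidates,
                 key=lambda t: sum((x >= t) == bool(y) for x, y in block))
    return lambda x: int(x >= best_t)


if __name__ == "__main__":
    old = [(x, int(x >= 5)) for x in range(10)]   # concept before the drift
    new = [(x, int(x >= 2)) for x in range(10)]   # concept after the drift
    ens = SlidingWindowEnsemble(train_stump, window_size=4, decay=0.8)
    for block in (old, old, new, new):
        ens.update(block)
    print(ens.predict(3))  # prints 1: the two post-drift models now outweigh the old ones
```

After only one post-drift block the two older models still win the vote for x = 3; after the second they are outweighed, which shows how the recency factor and the accuracy on the latest block together let the ensemble follow a drifting concept. The newest model is scored on its own training block here, which is optimistic; a fuller implementation might estimate its accuracy with cross-validation instead.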
Keywords/Search Tags: Data stream mining, classification, data stream, Emerging Patterns (EP)