
Study Of Ensemble Classification On Noisy Data Streams With Concept Drifts

Posted on: 2012-04-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y Li
Full Text: PDF
GTID: 2178330335961602
Subject: Computer application technology
Abstract/Summary:
With the rapid development and wide application of network communications, computer science, and information technology, large volumes of streaming data are produced in many application fields, such as financial analysis, network monitoring, telecommunications data processing, and sensor networks. These streams contain a great deal of information that urgently needs to be mined. However, streaming data is continuous, high-volume, rapid, and open-ended, which poses a challenge for traditional algorithms and application systems. Concept drift hidden in noisy data streams makes the research harder still. This dissertation focuses on the problem of classification on noisy data streams, and its main contributions are as follows:

(1) Related work on data streams is reviewed, including the definitions, applications, and characteristics of models. The issue of noise in data streams with concept drift is then described in detail, together with the research progress on it.

(2) To handle concept drift detection and noisy data in data streams, a classification algorithm, CDSMM, based on mixture ensemble models is proposed. It uses a hypothesis testing method to detect concept drift and a Naïve Bayes classifier to filter noise from misclassified instances. It also updates the model promptly to adapt to concept drift. Evaluations conducted on databases show that, compared with other ensemble methods based on single models, such as weighted bagging, CDSMM achieves better predictive accuracy and stronger noise resistance.

(3) An ensemble algorithm, CDDMI, based on cluster density is proposed to detect concept drift in noisy data streams; it introduces clustering into classification tasks on data streams. The density fluctuation of misclassified instances is monitored to detect concept drift. Experimental studies show that, compared with other algorithms, CDDMI detects concept drift promptly and achieves high predictive accuracy on noisy data streams.
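
To make the idea of detecting concept drift by hypothesis testing more concrete, the following is a minimal sketch, not the CDSMM procedure itself, whose details are not given in this abstract. It assumes a one-sided two-proportion z-test that compares the classifier's error rate on a reference window with its error rate on the most recent window. The function name `drift_detected`, its parameters, and the 5% significance level are all illustrative assumptions.

```python
import math


def drift_detected(ref_errors, ref_n, cur_errors, cur_n, z_crit=1.645):
    """Illustrative drift check (an assumption, not the thesis's method).

    Runs a one-sided two-proportion z-test of
    H0: the error rate has not increased, against
    H1: the error rate in the current window is higher.
    z_crit=1.645 corresponds to a 5% one-sided significance level.
    """
    if ref_n <= 0 or cur_n <= 0:
        raise ValueError("window sizes must be positive")

    p_ref = ref_errors / ref_n   # error rate on the reference window
    p_cur = cur_errors / cur_n   # error rate on the current window

    # Pooled error rate under H0 (both windows share one error rate).
    pooled = (ref_errors + cur_errors) / (ref_n + cur_n)
    variance = pooled * (1.0 - pooled) * (1.0 / ref_n + 1.0 / cur_n)

    # Zero variance means both windows are all-correct or all-wrong,
    # so there is no evidence of a change.
    if variance == 0.0:
        return False

    z = (p_cur - p_ref) / math.sqrt(variance)
    return z > z_crit


if __name__ == "__main__":
    # 10% errors on the reference window versus 30% on the current one.
    print(drift_detected(10, 100, 30, 100))
    # 10% versus 12%: too small a change to reject H0 at this level.
    print(drift_detected(10, 100, 12, 100))
```

In a stream setting, a check like this would be run each time a new window of labeled instances arrives. A positive result would trigger a model update, while isolated misclassifications that do not shift the window's error rate could instead be treated as candidate noise.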
Keywords/Search Tags: Data Streams, Classification, Concept Drift, Ensemble Learning