
Detecting Concept Drift And Classifying Data Streams

Posted on: 2011-12-17    Degree: Master    Type: Thesis
Country: China    Candidate: C Zhou
GTID: 2178330332458773    Subject: Computer software and theory
Abstract/Summary:
In the information age, advances in communication, computer, and network technology allow us to capture and store large amounts of data, which has led to explosive data growth. Finding patterns, trends, and anomalies in these data, and building simple models from them, is one of the great challenges of the information age. Data mining technology is an important approach to addressing these challenges.

In recent years, the development of information technology has given rise to a new kind of application, including credit card fraud detection, network security monitoring, sensor data processing, and power supply networks. In these applications, data arrive continuously, in order, and in real time; researchers call this form of data a data stream, that is, a continuous and potentially infinite ordered sequence of data items.

Classification is an important task in data mining, and the characteristics of the data stream model pose new challenges for traditional classification techniques: how can fast-arriving, overwhelmingly large volumes of data that are subject to concept drift be used as training samples to build a model that effectively predicts future data trends? Many data stream classification algorithms have been proposed in recent years, such as VFDT, CVFDT, the weighted ensemble classifier, and online Bagging and Boosting.

This thesis focuses on classification techniques for data streams with concept drift. First, we analyze concept drift and propose a concept drift detection method. The method uses statistical theory to estimate the true error rate of a given model on the up-to-date concept, and detects concept drift with a guaranteed probability. Second, we apply this drift detection method and the KMM algorithm within an ensemble classifier framework and propose a new algorithm for data stream classification.
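The abstract does not give the detection statistic itself. As a hedged illustration only, the sketch below assumes the "probability guarantee" takes the form of a Hoeffding bound: 0/1 error indicators from a reference window and a recent window are compared, and drift is flagged when the recent error rate exceeds the reference error rate by more than the combined bound at confidence level 1 - delta. The names `hoeffding_epsilon` and `drift_detected` and the parameter `delta` are hypothetical and not taken from the thesis.

```python
import math
from typing import Sequence


def hoeffding_epsilon(n: int, delta: float) -> float:
    """One-sided Hoeffding bound for the mean of n i.i.d. values in [0, 1]:
    P(true_mean - sample_mean >= eps) <= delta."""
    return math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def drift_detected(ref_errors: Sequence[int], recent_errors: Sequence[int],
                   delta: float = 0.05) -> bool:
    """Return True if the recent error rate is significantly higher than the reference one.

    Both arguments are 0/1 error indicators (1 = misclassified). Assuming the
    examples are independent, a false alarm (no real increase in the true
    error rate) happens with probability at most delta.
    This is an assumed test, not the thesis's exact statistic.
    """
    if not ref_errors or not recent_errors:
        return False  # not enough evidence yet
    p_ref = sum(ref_errors) / len(ref_errors)
    p_recent = sum(recent_errors) / len(recent_errors)
    # Split delta between the two windows (union bound) so the combined test keeps level delta.
    eps = (hoeffding_epsilon(len(ref_errors), delta / 2)
           + hoeffding_epsilon(len(recent_errors), delta / 2))
    return p_recent - p_ref > eps


if __name__ == "__main__":
    reference = [0] * 90 + [1] * 10   # 10% error on the reference concept
    shifted = [1] * 60 + [0] * 40     # 60% error after a simulated drift
    stable = [1] * 15 + [0] * 85      # 15% error: within normal fluctuation
    print(drift_detected(reference, shifted), drift_detected(reference, stable))
```

With delta = 0.05 and 100 examples per window, the combined bound is about 0.27, so the 50-point jump is flagged as drift while the 5-point increase is not. Splitting delta across the two windows keeps the overall false-alarm probability at most delta without assuming anything about the shape of the error distribution.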
Experimental results on simulated and real data streams show that the algorithm is effective. Finally, we propose AHBag, a new algorithm for mining data streams with concept drift, based on the Hoeffding tree and online Bagging ensemble. The algorithm applies a statistical test to the data within an adaptive window to capture concept drift, and depending on the test result decides whether to update the existing Hoeffding tree or build a new one. Experimental results show that the algorithm achieves high accuracy on data streams with concept drift.
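The abstract does not describe AHBag's internals, so the following is only a hedged sketch of the general pattern it names: online Bagging, where each incoming example is presented to each ensemble member k ~ Poisson(1) times (as in Oza and Russell's online bagging), combined with a statistical drift test that decides whether a member is updated or rebuilt. Several simplifications are assumptions, not the thesis's method: a fixed-size window split into two halves stands in for the adaptive window, the drift test is an assumed Hoeffding-bound comparison of the two halves' error rates, and `MajorityClassLearner` (equivalent to a Hoeffding tree that never splits) is a hypothetical stand-in for a real Hoeffding tree. All class, function, and parameter names are hypothetical.

```python
import math
import random
from collections import Counter, deque


def hoeffding_eps(n: int, delta: float) -> float:
    # One-sided Hoeffding bound for the mean of n values in [0, 1].
    return math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def poisson1(rng: random.Random) -> int:
    # Knuth's method for sampling k ~ Poisson(lambda = 1).
    limit, k, p = math.exp(-1.0), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


class MajorityClassLearner:
    """Hypothetical stand-in for a Hoeffding tree: a single leaf predicting the majority class."""

    def __init__(self):
        self.counts = Counter()

    def learn(self, x, y, weight: int = 1) -> None:
        self.counts[y] += weight

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None


class DriftAwareOnlineBagging:
    """Online Bagging whose members are rebuilt when an assumed windowed drift test fires."""

    def __init__(self, n_models=5, half_window=50, delta=0.05, seed=0,
                 factory=MajorityClassLearner):
        self.rng = random.Random(seed)
        self.factory = factory
        self.delta = delta
        self.half = half_window
        self.models = [factory() for _ in range(n_models)]
        self.windows = [deque(maxlen=2 * half_window) for _ in range(n_models)]
        self.resets = 0

    def predict(self, x):
        # Majority vote over members that have learned something.
        votes = Counter(p for p in (m.predict(x) for m in self.models) if p is not None)
        return votes.most_common(1)[0][0] if votes else None

    def _drifted(self, window) -> bool:
        if len(window) < window.maxlen:
            return False  # window not full yet
        errs = list(window)
        p_old = sum(errs[:self.half]) / self.half
        p_new = sum(errs[self.half:]) / self.half
        # Union bound over the two halves keeps the false-alarm rate at most delta.
        eps = 2 * hoeffding_eps(self.half, self.delta / 2)
        return p_new - p_old > eps

    def learn(self, x, y) -> None:
        for i, model in enumerate(self.models):
            # Prequential (test-then-train) error indicator for this member.
            self.windows[i].append(int(model.predict(x) != y))
            k = poisson1(self.rng)  # online bagging weight
            if k > 0:
                model.learn(x, y, weight=k)  # no drift: keep updating
            if self._drifted(self.windows[i]):
                # Drift detected: rebuild this member instead of updating it.
                self.models[i] = self.factory()
                self.windows[i].clear()
                self.resets += 1


if __name__ == "__main__":
    ens = DriftAwareOnlineBagging()
    stream = [({"t": t}, "a") for t in range(300)] + [({"t": t}, "b") for t in range(300, 600)]
    for x, y in stream:
        ens.learn(x, y)
    print(ens.predict({"t": 600}), ens.resets)
```

In the demo the label switches abruptly from "a" to "b" halfway through the stream. Without the rebuild step, each member would keep predicting "a" until its accumulated "b" weight overtook the old "a" weight; with it, the error spike in the recent half-window triggers a rebuild and the fresh members learn the new concept directly. Replacing `MajorityClassLearner` with an incremental tree learner exposing the same `learn`/`predict` methods would bring the sketch closer to the Hoeffding-tree setting described above.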
Keywords/Search Tags: data stream, data stream mining, classification, concept drift, Hoeffding tree