Study And Implementation On Data Stream Online Classification Algorithm

Posted on:2010-06-13

Degree:Master

Type:Thesis

Country:China

Candidate:X Luo


GTID:2178360308478796

Subject:Computer application technology

Abstract/Summary:

Data stream mining has become an important branch of data mining. Compared with traditional data, data streams are continuous, can be scanned only once, evolve quickly, and are potentially infinite. These characteristics challenge stream mining and leave some traditional mining techniques unable to meet its requirements, so it is necessary to design single-scan, real-time, fast algorithms. As an important part of data stream mining, data stream classification faces the same problems. Because real-world data streams do not remain stable, the classification model must be updated or rebuilt to adapt to changes in the distribution of the stream data. A change in the underlying context of a data stream that causes the target concept to change is referred to as concept drift.

This thesis focuses on detecting and adapting to concept drift in data streams. First, to cope with data that flow in quickly and keep changing, the thesis proposes a clustering-based classification algorithm that improves the accuracy of the classification model. The training dataset is partitioned into clusters according to similarity, and a classifier is trained on each cluster. Each incoming record is classified by the classifier whose cluster is most similar to it. An updating mechanism maintains the overall classification accuracy, and a heuristic learning method that trains new classification models from misclassified records is used to adapt to concept drift. Second, for concepts that recur periodically, the thesis proposes a targeted algorithm that reduces the cost of updating the classification model and speeds up classification.
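A minimal Python sketch of the clustering-based classifier described in the first part follows. The abstract does not name the clustering method, the base classifier, the similarity measure, or the retraining rule, so all of these are assumptions: plain k-means on numeric feature vectors, a nearest-class-centroid model per cluster, Euclidean distance to the cluster centroid for routing, and a hypothetical rule that trains a new model once `buffer_size` misclassified records have accumulated. All names (`kmeans`, `CentroidClassifier`, `ClusterStreamClassifier`, `learn_one`) are illustrative only.

```python
import math
import random
from collections import defaultdict


def mean(points):
    """Component-wise mean of a non-empty list of feature vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means (an assumed choice); returns clusters as lists of indices."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for i, p in enumerate(points):
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(i)
        # Keep the old center if a cluster became empty.
        centers = [mean([points[i] for i in g]) if g else centers[c]
                   for c, g in enumerate(groups)]
    return [g for g in groups if g]


class CentroidClassifier:
    """Hypothetical base learner: nearest class centroid within one cluster."""

    def __init__(self, xs, ys):
        by_label = defaultdict(list)
        for x, y in zip(xs, ys):
            by_label[y].append(x)
        self.class_centers = {y: mean(v) for y, v in by_label.items()}
        self.center = mean(xs)  # cluster centroid, used to route records

    def predict(self, x):
        return min(self.class_centers,
                   key=lambda y: math.dist(x, self.class_centers[y]))


class ClusterStreamClassifier:
    """One classifier per cluster; new models are learned from misclassified records."""

    def __init__(self, xs, ys, k=2, buffer_size=20):
        self.models = [CentroidClassifier([xs[i] for i in g], [ys[i] for i in g])
                       for g in kmeans(xs, k)]
        self.buffer_size = buffer_size
        self.missed = []  # misclassified (x, y) pairs awaiting retraining

    def route(self, x):
        """Select the model whose cluster centroid is closest to x."""
        return min(self.models, key=lambda m: math.dist(x, m.center))

    def predict(self, x):
        return self.route(x).predict(x)

    def learn_one(self, x, y):
        """Test-then-train step for one labelled record; returns the prediction."""
        pred = self.predict(x)
        if pred != y:
            self.missed.append((x, y))
            if len(self.missed) >= self.buffer_size:
                # Heuristic: treat the accumulated errors as a new concept region.
                bx, by = zip(*self.missed)
                self.models.append(CentroidClassifier(list(bx), list(by)))
                self.missed.clear()
        return pred
```

Because routing in this sketch uses distance alone, records of a drifted concept that fall in the same region as an old cluster may still be routed to the old model; weighting models by their recent accuracy is one possible way to realize the updating mechanism mentioned above.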
Considering that the number of concepts in real-world data streams is limited and that these concepts may recur periodically, the thesis proposes an algorithm that makes full use of information about historic concepts: the classification models built for a historic concept are reused to classify the stream when that concept recurs, which speeds up classification and reduces the running time of the whole classification procedure. Experiments show that the clustering-based classification algorithm outperforms traditional classification methods in both classification accuracy and running time, and that the algorithm for periodically recurring concept drift improves classification efficiency with little loss of accuracy.
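The reuse of historic concepts can likewise be sketched under stated assumptions. The sketch below processes the stream in fixed-size chunks, treats a drop in chunk accuracy below `drift_acc` as drift, and then checks a repository of stored models before training a new one. The chunking, both thresholds, and the nearest-centroid base learner `train_centroid` are hypothetical choices for illustration, not details given in the abstract.

```python
import math
from collections import defaultdict


def train_centroid(batch):
    """Hypothetical base learner: nearest class centroid over (x, y) pairs."""
    groups = defaultdict(list)
    for x, y in batch:
        groups[y].append(x)
    centers = {y: [sum(col) / len(v) for col in zip(*v)] for y, v in groups.items()}
    return lambda x: min(centers, key=lambda y: math.dist(x, centers[y]))


def accuracy(model, batch):
    """Fraction of (x, y) pairs in batch that the model labels correctly."""
    return sum(model(x) == y for x, y in batch) / len(batch)


class RecurringConceptClassifier:
    """Chunk-based classifier that reuses stored models of historic concepts."""

    def __init__(self, chunk_size=100, drift_acc=0.7, reuse_acc=0.9):
        self.chunk_size = chunk_size
        self.drift_acc = drift_acc  # chunk accuracy below this signals drift (assumed)
        self.reuse_acc = reuse_acc  # a stored model must reach this to be reused (assumed)
        self.chunk = []             # labelled records of the current chunk
        self.history = []           # repository of historic concept models
        self.current = None
        self.retrained = 0          # number of new models trained
        self.reused = 0             # number of times a historic model was brought back

    def predict(self, x):
        return self.current(x) if self.current is not None else None

    def learn_one(self, x, y):
        """Predict x, buffer (x, y), and check for drift when the chunk is full."""
        pred = self.predict(x)
        self.chunk.append((x, y))
        if len(self.chunk) == self.chunk_size:
            if self.current is None or accuracy(self.current, self.chunk) < self.drift_acc:
                self._adapt(self.chunk)
            self.chunk = []
        return pred

    def _adapt(self, batch):
        # First look for a historic concept that fits the new data.
        best = max(self.history, key=lambda m: accuracy(m, batch), default=None)
        if best is not None and accuracy(best, batch) >= self.reuse_acc:
            self.current = best
            self.reused += 1
        else:
            # Unseen concept: train a new model and store it for later recurrences.
            self.current = train_centroid(batch)
            self.history.append(self.current)
            self.retrained += 1
```

In this sketch, reusing a stored model costs only an evaluation pass over one chunk, whereas an unseen concept requires training a new model; that difference is where the running-time savings would come from when concepts recur.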

Keywords/Search Tags:

data stream, concept drift, online classification, cluster, periodic, historic concept

Related items

1	Research On Ensemble Classification Algorithms Of Data Stream Based On Concept Drift
2	Research On Classification Algorithm Of Concept Drift Data Stream Based On Online Transfer Learning
3	Research On Online Ensemble Classification Algorithm Based On Concept Drift Detection
4	Research On Semi-supervised Classification Algorithm For Data Stream With Concept Drift
5	Research On Concept Drift Detection In Data Stream And Classification Algorithms For Imbalanced Data Stream
6	Research On Classification Algorithm For Conceptual Drift Data Flow
7	Research On Data Stream Classification Method Based On Concept Drift Detection
8	Research On Classification Algorithms For Imbalanced Data Stream With Concept Drift
9	Research On Classification Of Data Stream With Recurring Concept Drift
10	Detecting Concept Drift And Classifying Data Streams