Research On Dynamic Data Stream Classification Algorithm

Posted on:2014-01-12

Degree:Master

Type:Thesis

Country:China

Candidate:F Chen

GTID:2248330398950259

Subject:Computer application technology

Abstract/Summary:

In recent years, the development of information technology has confronted traditional data mining with unprecedented challenges: the mining target is shifting from static data stored in databases to real-time, dynamic data streams. Data streams are massive (not all of the data can be stored), must be processed in real time, and are unstable (they exhibit concept drift). Current research hot spots in data stream mining include credit card fraud detection, network security monitoring, sensor data monitoring, and power grids.

In a dynamic data stream environment, traditional classification methods struggle to meet the requirements for high speed and high performance. At the same time, because of concept drift, the knowledge implied in the data may change over time, so the classification model must be updated dynamically as the data change. Faced with concept drift, traditional classification methods often fail and are not suitable for classifying dynamic data streams. A new classification method is therefore needed.

To address concept drift, and inspired by drift detection methods based on KL divergence, this thesis discusses a method that uses KL divergence to measure concept similarity: a KDQ tree partitions the data set, and the bootstrap determines the similarity threshold.

To handle the dynamics of data streams, this thesis proposes a new semi-supervised data stream classification model based on the concept similarity method. The model divides the data stream into sub-datasets; when new data arrive, it uses concept similarity to select the appropriate classifier for classification. Artificial and real datasets are used to evaluate the performance of the model.
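
The concept similarity test described above can be illustrated with a minimal sketch. It is not the thesis's implementation: a fixed grid stands in for the KDQ tree partition, the bootstrap resamples from the pooled windows, and all function names, the additive smoothing constant, and the `alpha` level are assumptions made for illustration.

```python
import numpy as np

def cell_counts(window, edges):
    """Count points per grid cell (a simple stand-in for KDQ tree leaves)."""
    counts, _ = np.histogramdd(window, bins=edges)
    return counts.ravel()

def kl_divergence(p_counts, q_counts, smoothing=0.5):
    """KL(P || Q) between two count vectors; smoothing avoids empty cells."""
    k = p_counts.size
    p = (p_counts + smoothing) / (p_counts.sum() + smoothing * k)
    q = (q_counts + smoothing) / (q_counts.sum() + smoothing * k)
    return float(np.sum(p * np.log(p / q)))

def bootstrap_threshold(window_a, window_b, edges,
                        n_resamples=200, alpha=0.05, seed=0):
    """(1 - alpha) quantile of KL when both windows share one concept."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([window_a, window_b])
    n_a = len(window_a)
    stats = np.empty(n_resamples)
    for i in range(n_resamples):
        # Resample with replacement from the pooled data, then split it
        # into two pseudo-windows that follow the same distribution.
        sample = pooled[rng.integers(0, len(pooled), size=len(pooled))]
        stats[i] = kl_divergence(cell_counts(sample[:n_a], edges),
                                 cell_counts(sample[n_a:], edges))
    return float(np.quantile(stats, 1.0 - alpha))

def same_concept(window_a, window_b, bins_per_dim=4, alpha=0.05, seed=0):
    """Return (is_similar, observed_kl, threshold) for two data windows."""
    pooled = np.vstack([window_a, window_b])
    edges = [np.linspace(pooled[:, d].min(), pooled[:, d].max(),
                         bins_per_dim + 1)
             for d in range(pooled.shape[1])]
    observed = kl_divergence(cell_counts(window_a, edges),
                             cell_counts(window_b, edges))
    threshold = bootstrap_threshold(window_a, window_b, edges,
                                    alpha=alpha, seed=seed)
    return observed <= threshold, observed, threshold
```

In a classifier-selection setting, a new sub-dataset would be compared against the sub-dataset behind each stored classifier, and a classifier whose data pass `same_concept` would be reused; how the thesis ranks several matching candidates is not stated in the abstract.
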
Experiments show that the proposed model handles both abrupt and gradual concept drift and has good self-adaptive ability.

To handle the massive volume of data streams, this thesis proposes a highly parallel algorithm for dynamic data stream classification based on the MapReduce framework. The algorithm builds on an incremental learning method for the extreme support vector machine and tracks concept drift in the data stream in real time: it constructs a weight matrix to correct the model residuals and uses a forgetting factor to strengthen the influence of new samples. Experiments show that the method achieves good parallel performance while handling concept drift in dynamic data streams efficiently.
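
The role of the forgetting factor can be sketched as follows. This is a simplified stand-in, not the algorithm of the thesis: it uses a random tanh feature map with regularized least squares, discounts older chunks exponentially, and omits both the residual weight matrix and the MapReduce layer. The class name, parameters, and default values are hypothetical.

```python
import numpy as np

class ForgettingRandomFeatureClassifier:
    """Chunk-wise least-squares classifier with exponential forgetting."""

    def __init__(self, n_inputs, n_hidden=50, forgetting=0.9,
                 reg=1e-2, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed random hidden layer; only the output weights are learned.
        self.W = rng.normal(size=(n_inputs, n_hidden))
        self.b = rng.normal(size=n_hidden)
        self.forgetting = forgetting  # in (0, 1]; smaller forgets faster
        self.reg = reg
        self.A = np.zeros((n_hidden, n_hidden))  # discounted sum of H^T H
        self.c = np.zeros(n_hidden)              # discounted sum of H^T y
        self.beta = np.zeros(n_hidden)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def partial_fit(self, X, y):
        """Update with one chunk; labels y must be +1 or -1."""
        H = self._hidden(X)
        # Down-weight the accumulated statistics, then add the new chunk,
        # so recent samples dominate the solution.
        self.A = self.forgetting * self.A + H.T @ H
        self.c = self.forgetting * self.c + H.T @ y
        ridge = self.reg * np.eye(self.c.size)
        self.beta = np.linalg.solve(self.A + ridge, self.c)
        return self

    def predict(self, X):
        return np.where(self._hidden(X) @ self.beta >= 0, 1, -1)
```

Because the per-chunk statistics `H.T @ H` and `H.T @ y` are plain sums over samples, they can be computed on separate data partitions and added together, which is one reason least-squares updates of this kind suit a map-and-reduce style of computation.
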

Keywords/Search Tags:

Data stream classification, Concept drift, Concept similarity, Time-forgetting robust extreme support vector machine

Related items

1	Research On Data Stream Classification Method Based On Concept Drift Detection
2	Research And Implementation Of Uncertain Data Streams Classification Technique Based On Distributed Extreme Learning Machine
3	Research On Ensemble Classification Algorithms Of Data Stream Based On Concept Drift
4	Research On The Classification Of Data Stream With Concept Drift Based On Cosine Similarity
5	Research On The Classification Methods For Dynamic Data Stream
6	Research On Concept Drift Detection In Data Stream And Classification Algorithms For Imbalanced Data Stream
7	Research On Classification Algorithm For Conceptual Drift Data Flow
8	Research On Classification Algorithms For Imbalanced Data Stream With Concept Drift
9	Research On Classification Of Data Stream With Recurring Concept Drift
10	Detecting Concept Drift And Classifying Data Streams