Research On Classification Of Data Stream With Recurring Concept Drift

Posted on:2017-04-15

Degree:Master

Type:Thesis

Country:China

Candidate:C Feng


GTID:2348330509455401

Subject:Computer Science and Technology

Abstract/Summary:

In the era of big data, streaming data is very common. Examples include data generated by sensors, the browsing and purchase records of shopping-website users, and the constantly changing social networks formed by social-website users. Concept drift often occurs in data streams, which makes traditional classification methods unsuitable for them. Concept drift is a challenging problem in data stream mining, and recurring concept drift is one of its sub-types. Because data streams arrive at high speed and in large volume, it is hardly possible to obtain a label for every instance in real-world applications, so many instances remain unlabeled. To address recurring concept drift and missing labels, two problems that frequently arise in data stream classification, this thesis makes the following contributions:

(1) When handling recurring concept drift, it is essential both to represent concepts and to select the most appropriate classifier. To deal with these issues, an algorithm for classifying text data streams with recurring concept drift is proposed. It recognizes recurring concepts by computing the differences in the main features and impact factors of different batches of instances. It maintains a classifier for each concept and monitors classification accuracy, selecting a classifier according to the Hoeffding inequality in order to adapt better to concept drift. Experimental results show that, on data streams with recurring concept drift, the proposed algorithm achieves better classification accuracy, adapts faster to concept drift, and detects concept drift more accurately than four other algorithms; it is also suitable for classifying data streams without recurring concept drift.

(2) A classification algorithm for partially labeled data streams with recurring concept drift is proposed. The algorithm detects recurring concept drift by monitoring classification accuracy.
The detection threshold adjusts automatically according to the classifier's generalization performance, which reduces the risk of wrong judgments and avoids setting the threshold manually. The algorithm labels unlabeled data with a semi-supervised classification method, which increases the amount of labeled data and thus improves the generalization performance of the classifiers. To improve labeling accuracy, the concept-specific classifiers of historical concepts are introduced to assist the semi-supervised classification. Experimental results show that the algorithm can accurately determine whether two concepts are the same, and can therefore exploit recurring concepts to respond faster to concept drift, significantly reducing the negative impact of concept drift on classification accuracy. The results also show that using historical classifiers to assist semi-supervised classification significantly improves labeling accuracy and, as a result, greatly improves the generalization performance of the classifiers.
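
The abstract describes the concept-recognition step of contribution (1) only in prose. The Python sketch below illustrates one possible reading: each batch of text instances is summarized by its most frequent terms, and a new batch is matched to a stored concept when the two summaries are close. The top-k term profile, the use of term shares as a stand-in for "impact factors", the half-L1 distance, the 0.3 threshold, and all function names (`concept_profile`, `profile_distance`, `match_concept`) are hypothetical choices for illustration, not the thesis's actual measure.

```python
from collections import Counter
from typing import Dict, List, Optional

Profile = Dict[str, float]


def concept_profile(batch: List[List[str]], k: int = 5) -> Profile:
    """Summarise a batch of tokenised documents by its k most frequent terms.

    Each term is weighted by its share of the top-k counts; this weight is a
    hypothetical stand-in for the thesis's "impact factor".
    """
    counts = Counter(token for doc in batch for token in doc)
    top = counts.most_common(k)
    total = sum(count for _, count in top)
    if total == 0:
        return {}  # empty batch: no features to compare
    return {term: count / total for term, count in top}


def profile_distance(p: Profile, q: Profile) -> float:
    """Half the L1 distance; for non-empty profiles 0 = identical, 1 = disjoint."""
    terms = set(p) | set(q)
    return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in terms)


def match_concept(profile: Profile, stored: Dict[str, Profile],
                  threshold: float = 0.3) -> Optional[str]:
    """Return the id of the closest stored concept, or None if none is close enough."""
    best_id: Optional[str] = None
    best_dist = float("inf")
    for concept_id, stored_profile in stored.items():
        dist = profile_distance(profile, stored_profile)
        if dist < best_dist:
            best_id, best_dist = concept_id, dist
    return best_id if best_dist <= threshold else None
```

Under this sketch, a `None` result would signal a new concept that needs a fresh classifier, while a matched id would let the stored classifier of that recurring concept be reused.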
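
Contribution (1) selects among concept-specific classifiers using the Hoeffding inequality. For n independent observations in an interval of width R, the observed mean lies within ε = sqrt(R² ln(1/δ) / (2n)) of its expectation with probability at least 1 − δ. A minimal sketch, assuming each classifier's accuracy is measured on the same window of n recently labeled instances and that the system switches only when another classifier leads by more than ε; the function names and the default δ = 0.05 are illustrative assumptions.

```python
import math
from typing import Dict


def hoeffding_epsilon(n: int, delta: float = 0.05, value_range: float = 1.0) -> float:
    """Hoeffding bound: with probability >= 1 - delta, the mean of n i.i.d.
    samples in an interval of width value_range is within epsilon of its expectation."""
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def select_classifier(accuracies: Dict[str, float], current: str,
                      n: int, delta: float = 0.05) -> str:
    """Keep the current classifier unless another one's windowed accuracy
    exceeds it by more than the Hoeffding epsilon (hypothetical switching rule)."""
    best = max(accuracies, key=accuracies.get)
    eps = hoeffding_epsilon(n, delta)
    if best != current and accuracies[best] - accuracies[current] > eps:
        return best
    return current
```

With n = 100 and δ = 0.05, ε is about 0.122, so a lead of 0.15 triggers a switch while a lead of 0.07 does not. Requiring a statistically meaningful lead keeps the system from oscillating between classifiers on noisy accuracy estimates.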
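
Contribution (2) detects drift by monitoring accuracy, with a threshold that follows the classifier's generalization performance. The abstract does not give the rule, so the sketch below assumes a normal-approximation margin z·sqrt(p(1 − p)/n) + 1/(2n) around a reference accuracy p, taken from the first full window of each concept. Because the margin depends on p, no per-dataset threshold has to be tuned by hand. The class name `AccuracyDriftDetector`, the window size, and z = 2 are hypothetical.

```python
import math
from collections import deque


class AccuracyDriftDetector:
    """Flags drift when windowed accuracy falls significantly below a reference.

    The margin z * sqrt(p * (1 - p) / n) + 1 / (2n) depends on the reference
    accuracy p, so the threshold follows the classifier's own performance.
    This rule is an assumption, not the thesis's exact formula.
    """

    def __init__(self, window: int = 100, z: float = 2.0) -> None:
        self.recent = deque(maxlen=window)  # 1.0 = correct, 0.0 = wrong
        self.z = z
        self.reference = None  # accuracy on the first full window of a concept

    def update(self, correct: bool) -> bool:
        """Record one labelled prediction; return True if drift is detected."""
        self.recent.append(1.0 if correct else 0.0)
        n = len(self.recent)
        if n < self.recent.maxlen:
            return False  # window not yet full
        acc = sum(self.recent) / n
        if self.reference is None:
            self.reference = acc  # first full window defines the reference
            return False
        p = self.reference
        # Continuity correction 1/(2n) keeps the margin non-zero when p is 0 or 1.
        margin = self.z * math.sqrt(p * (1.0 - p) / n) + 1.0 / (2 * n)
        if acc < p - margin:
            # Drift: start afresh so the next concept gets its own reference.
            self.recent.clear()
            self.reference = None
            return True
        return False
```

For a classifier with a reference accuracy of 0.9 on a 100-instance window, the margin is 0.065, so drift is flagged once windowed accuracy drops below 0.835.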
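
Contribution (2) also uses historical concept-specific classifiers to make semi-supervised labeling more accurate. One simple way to realize this, assumed here rather than taken from the thesis, is an agreement filter. An unlabeled instance receives a pseudo-label only if the current classifier is confident and the historical classifier of the matched recurring concept confidently predicts the same label. Classifiers are modeled as hypothetical callables returning `(label, confidence)`, and the function name `pseudo_label` and the 0.8 confidence cut-off are illustrative.

```python
from typing import Callable, Hashable, List, Optional, Sequence, Tuple

Prediction = Tuple[Hashable, float]  # (label, confidence in [0, 1])
Classifier = Callable[[Sequence[float]], Prediction]


def pseudo_label(unlabeled: List[Sequence[float]],
                 current: Classifier,
                 historical: Optional[Classifier] = None,
                 min_conf: float = 0.8) -> List[Tuple[Sequence[float], Hashable]]:
    """Return (instance, pseudo-label) pairs considered safe to train on.

    An instance is accepted only if the current classifier is confident and,
    when a historical classifier for the matched recurring concept exists,
    that classifier confidently predicts the same label.
    """
    accepted = []
    for x in unlabeled:
        label, conf = current(x)
        if conf < min_conf:
            continue  # current classifier is unsure
        if historical is not None:
            h_label, h_conf = historical(x)
            if h_label != label or h_conf < min_conf:
                continue  # historical concept disagrees or is unsure
        accepted.append((x, label))
    return accepted
```

In this sketch, the historical classifier acts as a second opinion that vetoes doubtful pseudo-labels. This trades some labeled volume for labeling accuracy, the effect the abstract attributes to using historical classifiers.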

Keywords/Search Tags:

data stream, concept drift, semi-supervised classification, recurring concept drift

Related items

1	Research On Semi-supervised Classification Algorithm For Data Stream With Concept Drift
2	Research On Semi-supervised Data Stream Classification Method Based On Ensemble Model
3	Research On Ensemble Classification Algorithms Of Data Stream Based On Concept Drift
4	Research On Classification Algorithm Of Concept Drift Data Stream Based On Online Transfer Learning
5	Research On Classification For Data Streams With Concept Drift
6	Research On Classification Algorithms For Data Streams With Concept Drift
7	Classification Algorithm For Data Streams With Concept Drift And Its Applications
8	Research On Data Stream Classification Method Based On Concept Drift Detection
9	Research On Concept Drift Detection In Data Stream And Classification Algorithms For Imbalanced Data Stream
10	Research On Classification Algorithms For Imbalanced Data Stream With Concept Drift