Research On Hybrid Ensemble Model Based Data Stream Classification With Unlabeled Data

Posted on:2018-03-24

Degree:Master

Type:Thesis

Country:China

Candidate:J H He

Full Text:PDF

GTID:2348330542992634

Subject:Computer application technology

Abstract/Summary:

Data stream classification is widely used in network monitoring, sensor networks, and other practical fields. However, missing class labels, concept drift, and class imbalance make data streams considerably harder to classify. This dissertation studies the classification of data streams that exhibit these three problems, a topic of both theoretical and practical value. The main work is as follows:

(1) The dissertation summarizes the relevant definitions of data streams, concept drift, and skewed data streams, and introduces the associated challenges, processing approaches, and evaluation measures.

(2) Existing classification approaches for incompletely labeled data streams use an ensemble model to adapt to concept drift but ignore concept drift detection. An ensemble classification method based on concept drift detection and model selection is therefore proposed. First, a hybrid ensemble model is built from classifiers and clusters. Second, a new concept drift detection method identifies drift from the divergence between the concept distributions of two adjoining data chunks. When selecting base models, both a timestamp-based weight and the divergence between concept distributions are used. Experimental results show that the proposed method adapts quickly to concept-drifting data streams and improves classification accuracy.

(3) An ensemble classification method based on distance evaluation and sampling is proposed for classifying incompletely labeled data streams with an imbalanced class distribution. The method first calculates the distance between the unlabeled data and the center point of the labeled data chunks to mark positive and negative instances. It then reconstructs the data chunk by over-sampling positive instances and under-sampling negative instances, which balances the class distribution of the current data chunk. The rebalanced chunk is used to build a hybrid ensemble model composed of the classification model and the clusters. Experimental results show that the proposed method effectively improves classification accuracy compared with classical algorithms of the same kind.
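The two mechanisms described in items (2) and (3) can be illustrated with a minimal Python sketch. It is not the dissertation's implementation, and the abstract leaves several details open, so the following are assumptions made only for illustration: Jensen-Shannon divergence stands in for the unspecified divergence measure, each chunk's "concept distribution" is approximated by the relative frequency of discrete symbols such as cluster ids, the drift threshold of 0.1 is arbitrary, unlabeled instances are marked by whichever labeled class centroid is nearer, and positives are taken to be the minority class. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def distribution(items, support):
    """Relative frequency of each symbol of `support` within `items`."""
    counts = Counter(items)
    return [counts[s] / len(items) for s in support]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits, bounded in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x == 0 contribute nothing; y > 0 whenever x > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def drift_detected(prev_chunk, curr_chunk, threshold=0.1):
    """Flag drift when two adjoining chunks diverge beyond `threshold`.

    Chunks are sequences of discrete symbols (an assumption); the
    threshold value is arbitrary and would need tuning.
    """
    support = sorted(set(prev_chunk) | set(curr_chunk))
    p = distribution(prev_chunk, support)
    q = distribution(curr_chunk, support)
    return js_divergence(p, q) > threshold


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def label_by_distance(unlabeled, pos_labeled, neg_labeled):
    """Mark each unlabeled point by its nearer labeled class centroid."""
    pos_c, neg_c = centroid(pos_labeled), centroid(neg_labeled)
    pos, neg = [], []
    for x in unlabeled:
        nearer_pos = math.dist(x, pos_c) <= math.dist(x, neg_c)
        (pos if nearer_pos else neg).append(x)
    return pos, neg


def rebalance(pos, neg, rng):
    """Over-sample positives and under-sample negatives toward equal size.

    Assumes `pos` is the non-empty minority class. Over-sampling here is
    plain duplication with replacement, the simplest possible choice.
    """
    target = (len(pos) + len(neg)) // 2
    extra = [rng.choice(pos) for _ in range(max(0, target - len(pos)))]
    kept_neg = rng.sample(neg, min(len(neg), target))
    return pos + extra, kept_neg
```

Passing a seeded `random.Random` instance as `rng` keeps the sampling reproducible. In a full pipeline, the rebalanced chunk would then be used to train the classifiers and clusters of the hybrid ensemble, a step this sketch does not cover.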

Keywords/Search Tags:

Data Stream Classification, Ensemble Model, Concept Drift, Unlabeled Data, Imbalanced Class Distribution

Related items

1	Research On Ensemble Classification Algorithms Of Data Stream Based On Concept Drift
2	Research On Concept Drift Detection In Data Stream And Classification Algorithms For Imbalanced Data Stream
3	Research On Classification Algorithms For Imbalanced Data Stream With Concept Drift
4	Research On Data Stream Classification Algorithm Based On Ensemble Learning
5	Research On Classification Algorithms Of Concept Drift And Imbalanced Data Streams
6	Research On Concept Drift Detection And Ensemble Classifier Based On Data Stream
7	Research On Classification Algorithm For Conceptual Drift Data Flow
8	Research On Concept Drift Data Stream Classification Based On Ensemble Learning
9	Research And Implementation Of Classification Algorithm For Positive And Unlabeled Examples Learning On Uncertain Data Stream
10	Research On Semi-supervised Classification Algorithm For Data Stream With Concept Drift