
Research On Hybrid Ensemble Model Based Data Stream Classification With Unlabeled Data

Posted on: 2018-03-24
Degree: Master
Type: Thesis
Country: China
Candidate: J H He
Full Text: PDF
GTID: 2348330542992634
Subject: Computer application technology
Abstract/Summary:
Data stream classification is widely used in network monitoring, sensor networks, and other practical fields. However, missing class labels, concept drift, and class imbalance make data stream classification considerably more difficult. This dissertation studies the classification of data streams that exhibit missing class labels, concept drift, and class imbalance, a topic of both theoretical and practical value. The main work of this dissertation is as follows:

(1) This dissertation summarizes the relevant definitions of data streams, concept drift, and skewed data streams, and introduces the associated challenges, processing approaches, and evaluation measures.

(2) Existing classification approaches for incompletely labeled data streams use ensemble models to adapt to concept drift, but they ignore concept drift detection. An ensemble classification method based on concept drift detection and model selection is therefore proposed. First, a hybrid ensemble model is built from classifiers and clusters. Second, a new concept drift detection method identifies concept drifts from the divergence between the concept distributions of two adjacent data chunks. When selecting base models, the method uses both a timestamp-based weight and the divergence between concept distributions. Experimental results show that the proposed method adapts quickly to concept-drifting data streams and improves classification accuracy.

(3) An ensemble classification method based on distance evaluation and sampling is proposed to classify incompletely labeled data streams with imbalanced class distributions. This method first calculates the distance between the unlabeled data and the center point of the labeled data chunks to mark the positive and negative instances. Second, the data chunk is reconstructed by over-sampling positive instances and under-sampling negative instances, which balances the class distribution of the current data chunk; the balanced chunk is then used to build a hybrid ensemble model composed of the classification model and the clusters. Experimental results show that the proposed method effectively improves classification accuracy compared with classical algorithms of the same kind.
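
The abstract does not give the exact formulas, so the following is only a minimal Python sketch of the two ideas described in items (2) and (3), under stated assumptions. It assumes that the "concept distribution" of a chunk can be approximated by a smoothed histogram over categories (for example, cluster or class assignments), that the divergence is Jensen-Shannon divergence, that instances are marked by the nearer of two class centers, and that the positive class is the minority. All function names and the `threshold` value are hypothetical and are not taken from the dissertation.

```python
import math
import random
from collections import Counter


def distribution(labels, categories):
    """Relative frequency of each category in one chunk (Laplace-smoothed)."""
    counts = Counter(labels)
    total = len(labels) + len(categories)
    return [(counts[c] + 1) / total for c in categories]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def drift_detected(prev_chunk, curr_chunk, categories, threshold=0.1):
    """Flag a drift when adjacent chunks diverge by more than the threshold.

    The threshold is an illustrative assumption, not a value from the source.
    """
    p = distribution(prev_chunk, categories)
    q = distribution(curr_chunk, categories)
    return js_divergence(p, q) > threshold


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def label_by_distance(unlabeled, pos_center, neg_center):
    """Mark each unlabeled point 1 (positive) or 0 (negative) by nearer center."""
    return [
        1 if math.dist(x, pos_center) <= math.dist(x, neg_center) else 0
        for x in unlabeled
    ]


def rebalance(points, labels, rng):
    """Over-sample positives and under-sample negatives to equal sizes.

    Assumes the chunk contains at least one positive instance.
    """
    pos = [x for x, y in zip(points, labels) if y == 1]
    neg = [x for x, y in zip(points, labels) if y == 0]
    target = (len(pos) + len(neg)) // 2
    # Duplicate random positives until the target size is reached.
    extra = [rng.choice(pos) for _ in range(max(0, target - len(pos)))]
    # Keep a random subset of negatives no larger than the target size.
    kept = rng.sample(neg, min(len(neg), target))
    return pos + extra, kept
```

In this sketch, `drift_detected` would be called on each pair of adjacent chunks, and `rebalance` would be applied to a chunk after `label_by_distance` has marked its unlabeled instances, before any base model is trained on it.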
Keywords/Search Tags:Data Stream Classification, Ensemble Model, Concept Drift, Unlabeled Data, Imbalanced Class Distribution