Research On Concept Drift Detection And Ensemble Classifier Based On Data Stream

Posted on: 2018-08-12  Degree: Master  Type: Thesis
Country: China  Candidate: C F Wei  Full Text: PDF
GTID: 2348330515984409  Subject: Computer application technology
Abstract/Summary:
Big data has changed the world in many ways, influencing many aspects such as the economy, technology, and society. A newer form of big data, known as the data stream, holds considerable hidden value, and how to mine that value has become a hot topic in data mining research. Compared with a static data set, a data stream arrives at high speed and in time order; it changes continuously and cannot be stored in full on a computer. Many existing algorithms for processing data streams have several issues: the difference between the reconstructed data stream and the original data stream is large; concept drift detection must wait for classification; and ensemble classifiers for data streams must use the true labels to detect concept drift and update the ensemble model. These issues make the algorithms hard to use in the real world. This thesis first implements a Hierarchical Amnesic Synopses structure based on SimHash (SH-HAS). It builds a summary of the data stream using the SimHash algorithm and updates the structure dynamically. SH-HAS compresses the data stream efficiently and yields a reconstructed data stream that is highly similar to the original. The thesis then modifies the FKNNModel (Fast KNNModel) concept drift detection algorithm and proposes MFKNNModel (Modified Fast KNNModel) to detect concept drift in data streams. MFKNNModel checks the distribution of subsequences of the data stream to detect concept drift. Finally, an ensemble classifier for concept-drifting data streams (ECCDDS) is proposed. ECCDDS uses horizontal integration and weighted voting to assign labels to the data on the stream. ECCDDS takes advantage of Spark Streaming to construct the ensemble classifier and improve efficiency. Experiments on benchmark datasets validate the effectiveness of the proposed approach.
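
The weighted voting step mentioned in the abstract can be illustrated with a minimal sketch. This is not the ECCDDS implementation: the abstract does not say how classifier weights are computed or how ties are resolved, so the function name `weighted_vote`, the example weights, and the first-seen tie-breaking rule are all assumptions made for illustration.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label whose supporting classifiers have the largest total weight.

    predictions: labels predicted by each base classifier for one instance.
    weights: non-negative weight of each base classifier, in the same order.
    Both names are hypothetical; the thesis does not specify this interface.
    """
    if len(predictions) != len(weights):
        raise ValueError("predictions and weights must have the same length")
    if not predictions:
        raise ValueError("at least one base classifier is required")

    # Accumulate the weight behind each candidate label.
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight

    # Assumption: ties go to the label that was seen first.
    return max(totals, key=totals.get)


# Three base classifiers: one strong vote for "a", two weaker votes for "b".
label = weighted_vote(["a", "b", "b"], [0.9, 0.3, 0.4])
```

In this example "a" collects a total weight of 0.9 against 0.7 for "b", so a single highly weighted classifier outvotes two weaker ones. In a drift-aware ensemble the weights would typically be lowered for members trained on outdated concepts, but how ECCDDS does this is not described in the abstract.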
Keywords/Search Tags: Data stream, Synopsis structure, Concept drift, Ensemble classification, Parallel computing