
Research on Online Ensemble Classification Algorithm Based on Concept Drift Detection

Posted on: 2018-05-11    Degree: Master    Type: Thesis
Country: China    Candidate: Y B Kang    Full Text: PDF
GTID: 2348330542960023    Subject: Computer technology
Abstract/Summary:
Data streams are a product of the rapid development of information technology. Compared with traditional data, real-world data streams suffer from high dimensionality, noise, mislabeled instances, concept drift, and imbalanced class distributions, and they demand both high accuracy and high time and space efficiency. Existing models struggle with these problems and therefore cannot handle data streams effectively. As data streams appear in more and more applications, data stream mining has become one of the active research topics in data mining. This thesis studies the characteristics of data streams, reviews the background of data stream mining and related techniques, and then focuses on the problem of concept drift in data stream classification. The main contributions are as follows.

First, an online-updating ensemble model based on concept drift detection is proposed, called DDOE (Drift-Detection Based Online Ensemble). The algorithm uses the Hoeffding Adaptive Tree as its base classifier, which trains an alternate subtree at each node. When a new data block arrives, the algorithm first applies an extended DDM algorithm to the block to detect drift. If a concept drift is detected at some sample, the data block is split at that sample. The instances before the drift point are used to train a new model, which replaces the worst base classifier in the ensemble. The instances after the drift point are then used to adjust the base classifiers so that the existing models adapt better to the new concept. In addition, the instances after the drift point are added to the next data block. If no concept drift is detected, the algorithm only updates the weights of the existing classifiers and does not build a new model, which effectively reduces running time. Finally, to accommodate gradual concept drift, the base classifiers are trained on the latest data blocks and updated online when no drift is detected.

Second, based on the observation that a misclassified instance may indicate the trend of a new concept, an algorithm called EWOE (Examples-Weighting Based Online Ensemble) is proposed, which uses an instance weighting mechanism. When a base classifier is updated with the latest data block, misclassified instances may come from the new concept, whereas correctly classified instances belong to the old concept, so the two groups should be treated differently. The algorithm assigns larger weights to misclassified instances, increasing their influence when the base classifiers are updated, which enables the algorithm to discover and adapt to the new concept more quickly.

Finally, to verify the validity of the proposed methods, the algorithms are compared with other methods on artificial and real data sets. Experiments show that both methods achieve high classification accuracy when the noise level is low, and that they have certain advantages over the other algorithms.
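
The drift-detection step that DDOE applies to each data block can be illustrated with a minimal sketch. The abstract does not specify how the thesis extends DDM, so the code below assumes the standard Drift Detection Method instead: it tracks the running error rate p and its standard deviation s, and flags drift when p + s exceeds p_min + 3 * s_min. The names `DDM`, `first_drift_index`, `min_instances`, and `drift_level` are hypothetical and chosen only for this illustration.

```python
import math


class DDM:
    """Standard Drift Detection Method (an assumption; not the thesis's extended variant)."""

    def __init__(self, min_instances=30, drift_level=3.0):
        self.min_instances = min_instances  # warm-up length before any drift test
        self.drift_level = drift_level      # number of s_min above p_min that signals drift
        self.reset()

    def reset(self):
        self.n = 0
        self.p = 1.0            # running error rate (overwritten by the first update)
        self.p_min = math.inf   # error rate recorded at the lowest p + s seen so far
        self.s_min = math.inf   # standard deviation recorded at that same point

    def update(self, error):
        """Feed 1 for a misclassified instance, 0 otherwise. Return True on drift."""
        self.n += 1
        self.p += (error - self.p) / self.n
        s = math.sqrt(self.p * (1.0 - self.p) / self.n)
        if self.n < self.min_instances:
            return False
        if self.p + s <= self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s
        return self.p + s > self.p_min + self.drift_level * self.s_min


def first_drift_index(errors, detector):
    """Return the index of the first instance at which drift is flagged, or None."""
    for i, error in enumerate(errors):
        if detector.update(error):
            return i
    return None


# Hypothetical error stream: 10% errors for 200 instances, then every instance wrong.
stable = [1 if i % 10 == 9 else 0 for i in range(200)]
drifted = stable + [1] * 100
split_at = first_drift_index(drifted, DDM())
```

Under the scheme described in the abstract, a block would be split at `split_at`: the instances before it train a replacement for the worst ensemble member, and the instances after it are carried into the next block. If `first_drift_index` returns `None`, only the classifier weights would be updated.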
Keywords/Search Tags: Data Stream Classification, Concept Drift Detection, Ensemble Classification, Online Update, Instance Weighting
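
The instance-weighting idea behind EWOE can be sketched in a few lines. The abstract does not give the weighting formula, so this example assumes a fixed multiplier `boost` for misclassified instances and a test-then-train update applied per instance. The base learner here is a toy majority-class model, not the Hoeffding Adaptive Tree used in the thesis, and the names `MajorityClassLearner`, `weighted_update`, and `boost` are hypothetical.

```python
from collections import defaultdict


class MajorityClassLearner:
    """Toy incremental classifier: predicts the class with the largest weighted count."""

    def __init__(self):
        self.counts = defaultdict(float)

    def predict(self, x=None):
        if not self.counts:
            return None
        return max(self.counts, key=self.counts.get)

    def learn(self, y, weight=1.0):
        self.counts[y] += weight


def weighted_update(model, block, boost=2.0):
    """Update `model` on a block of (x, y) pairs, up-weighting misclassified instances.

    Each instance is first classified by the current model; if the prediction is
    wrong, the instance is learned with weight `boost`, otherwise with weight 1.0.
    """
    for x, y in block:
        weight = boost if model.predict(x) != y else 1.0
        model.learn(y, weight=weight)


# Hypothetical blocks: ten instances of an old concept, then six of a new one.
old_block = [(None, "a")] * 10
new_block = [(None, "b")] * 6

weighted = MajorityClassLearner()
weighted_update(weighted, old_block, boost=2.0)
weighted_update(weighted, new_block, boost=2.0)

unweighted = MajorityClassLearner()
weighted_update(unweighted, old_block, boost=1.0)
weighted_update(unweighted, new_block, boost=1.0)
```

In this toy setting the weighted learner switches to the new class within the six new instances, while the unweighted learner still predicts the old class. This mirrors the claim that up-weighting misclassified instances speeds adaptation, but it says nothing about the thesis's actual results.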