
Three-way Decision Concept Drift Detection Based On Decision Tree

Posted on: 2018-12-08, Degree: Master, Type: Thesis
Country: China, Candidate: B H Su, Full Text: PDF
GTID: 2348330569986429, Subject: Computer Science and Technology
Abstract/Summary:
In recent years, with the rapid development of the mobile Internet, networking and related technologies, many applications have come to produce data in the form of data streams. Networked applications such as mobile payments and sensor networks have entered daily life. They handle large amounts of data every day and provide real-time services such as data query, personalized recommendation and intelligent services. Because data streams are massive, generated continuously and rapidly, and change dynamically over time, they must be processed in real time. Traditional data mining algorithms therefore face enormous challenges, and algorithms suited to the characteristics of data streams and their applications are needed.

At the same time, data streams often contain concept drift: the concept underlying the stream changes with time and with hidden context. Traditional concept drift detection methods generally divide concepts into two classes, drift and no drift, according to detection indicators. These methods are susceptible to noise and other uncertain factors. Some real concept drifts go undetected, while some indicator changes caused by uncertain factors are reported as concept drifts, resulting in low detection accuracy. To solve this problem, this thesis proposes a three-way decision concept drift detection algorithm that improves detection accuracy.

1. A three-way decision concept drift detection algorithm is proposed. A decision tree is used to learn the concepts of the data stream, and concept drift is then detected on it. Each subtree of the decision tree corresponds to one concept. During detection, training examples are classified with the decision tree, and according to their classification error rates the subtrees are divided into the three domains of the three-way decision: the L domain, the R domain and the M domain.

2. A method for further partitioning the M domain is presented. Data streams contain many uncertain factors, including a large amount of noisy and abnormal data. Distinguishing concept drift from uncertain factors such as noise is the key to concept drift detection. Considering that concept drift and uncertain factors differ in the amplitude and duration of their influence on the data distribution, a new concept partition method based on the cumulative sum is proposed to further partition the concepts in the M domain.

3. The MOA data stream analysis platform is used for comparative experiments. Comparing the lowest classification accuracy and the change in classification accuracy after concept drift occurs on artificial datasets, and the average classification accuracy on real datasets, verifies that the algorithm detects concept drift earlier and improves classifier performance.
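The abstract gives no formulas, so the following is only a minimal Python sketch of the idea behind items 1 and 2, not the thesis's actual algorithm. It rests on several assumptions: that the L domain holds subtrees with low error (no drift), the R domain those with high error (drift), and the M domain the uncertain cases in between; that two thresholds `alpha` and `beta` separate the domains; and that a one-sided cumulative sum with a `slack` allowance and a decision `threshold` resolves the M domain. The function names and every parameter value are hypothetical.

```python
from typing import List


def three_way_partition(error_rate: float,
                        alpha: float = 0.2,
                        beta: float = 0.4) -> str:
    """Assign a subtree to a domain from its classification error rate.

    Assumption: L = low error, R = high error, M = uncertain middle band.
    alpha and beta are illustrative thresholds, not values from the thesis.
    """
    if error_rate <= alpha:
        return "L"
    if error_rate >= beta:
        return "R"
    return "M"


def cusum_resolve(error_rates: List[float],
                  baseline: float,
                  slack: float = 0.05,
                  threshold: float = 1.0) -> str:
    """Resolve an M-domain subtree with a one-sided cumulative sum.

    Deviations above baseline + slack accumulate; the sum is clipped at
    zero so isolated fluctuations decay. A short noise spike therefore
    stays under the threshold, while a sustained rise crosses it.
    """
    s = 0.0
    for e in error_rates:
        s = max(0.0, s + (e - baseline - slack))
        if s >= threshold:
            return "drift"
    return "no drift"


if __name__ == "__main__":
    print(three_way_partition(0.3))                              # uncertain case
    print(cusum_resolve([0.3, 0.9, 0.3, 0.3], baseline=0.3))     # brief spike
    print(cusum_resolve([0.3, 0.7, 0.7, 0.7, 0.7], baseline=0.3))  # sustained rise
```

The cumulative sum is what lets amplitude and duration both count: one large but brief deviation and a moderate but persistent one are treated differently, which matches the distinction the abstract draws between noise and real drift.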
Keywords/Search Tags:concept drift detection, three-way decision, data stream, decision tree, uncertain factors