Research On Data Stream Classification Method Based On Concept Drift Detection

Posted on: 2018-12-24
Degree: Master
Type: Thesis
Country: China
Candidate: F Yang
Full Text: PDF
GTID: 2348330515958161
Subject: Computer technology
Abstract/Summary:
In recent years, data mining, which lies at the intersection of machine learning, artificial intelligence, and statistics, has become a hot spot in data research. In many fields, more and more data arrives in the form of streams, for example in weather forecasting, web search, online shopping, credit card fraud detection, medicine, financial analysis, bioanalysis, stock analysis, social networking, and marketing. How to obtain potential, effective, and valuable information from these data streams has become an important research direction in data stream mining, and data stream classification is an important part of that field. Data streams are dynamic, continuous, and changeable, and the resulting misclassification and concept drift problems cause great difficulty for data stream classification: the classifier must adjust quickly so that it can cope with changes in future data. These characteristics also pose great challenges to traditional classification techniques designed for static data, so a new adaptive classification algorithm for dynamic data streams is urgently needed. To address the problem of data stream classification, this paper proposes the following two methods.
(1) An ensemble classification algorithm based on particle swarm optimization (PSO) and the online sequential extreme learning machine is proposed to deal with the fast arrival and instantaneity of data streams. The proposed method uses online sequential extreme learning machines as base classifiers, and the individual classifiers are integrated according to different activation functions. The weights of the base classifiers in the ensemble are optimized by the PSO algorithm, and the classification results are obtained by a voting model. Four methods are chosen for experimental comparison, and several data sets from UCI are used to evaluate their performance. The experimental results show that the proposed method achieves high accuracy and G-mean and has good flexibility.
(2) A concept drift detection algorithm for data streams based on relative entropy is proposed to address drift in the classification of massive, continuous, and dynamic data streams. The data stream is processed incrementally, with a decision tree as the base classifier. The accuracy of the classifier and the relative entropy of the leaf nodes are each computed to judge whether concept drift has occurred in the data stream, and the classifier is updated in real time. Four methods are selected for comparison, and four synthetic data streams generated by MOA and one real data stream are used for evaluation. The experimental results show that the proposed method not only detects the occurrence of concept drift effectively but also improves the accuracy of the classifier.
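The relative-entropy check described in method (2) can be illustrated with a minimal sketch. It is not the thesis's algorithm: the function names, the Laplace smoothing, the two thresholds, and the rule that either signal alone triggers drift are all assumptions made for illustration. The sketch compares the class distribution seen at one leaf in a reference window with the distribution in the current window, and also looks at the drop in accuracy.

```python
import math
from collections import Counter


def relative_entropy(p_counts, q_counts, classes, alpha=1.0):
    """KL(P || Q) between two class-count distributions.

    Laplace smoothing (alpha) is an assumption added here so that a class
    missing from one window does not cause a division by zero.
    """
    k = len(classes)
    p_total = sum(p_counts.get(c, 0) for c in classes) + alpha * k
    q_total = sum(q_counts.get(c, 0) for c in classes) + alpha * k
    kl = 0.0
    for c in classes:
        p = (p_counts.get(c, 0) + alpha) / p_total
        q = (q_counts.get(c, 0) + alpha) / q_total
        kl += p * math.log(p / q)
    return kl


def drift_detected(ref_labels, cur_labels, ref_acc, cur_acc,
                   kl_threshold=0.1, acc_drop=0.05):
    """Flag drift when the leaf's class distribution or the accuracy shifts.

    Both thresholds and the 'either signal' rule are hypothetical choices.
    """
    classes = sorted(set(ref_labels) | set(cur_labels))
    kl = relative_entropy(Counter(ref_labels), Counter(cur_labels), classes)
    return kl > kl_threshold or (ref_acc - cur_acc) > acc_drop


# Reference window: balanced classes; current window: heavily skewed.
reference = [0] * 50 + [1] * 50
current = [0] * 90 + [1] * 10
print(drift_detected(reference, current, ref_acc=0.90, cur_acc=0.88))
```

In a streaming setting, a check of this kind would be run per leaf each time a new window of labelled instances arrives, and a positive result would trigger retraining or replacing the affected part of the tree.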
Keywords/Search Tags: Data Stream, Extreme Learning Machine, Particle Swarm Optimization, Decision Tree, Concept Drift