
Research On Data Stream Classification Algorithm Based On Ensemble Learning

Posted on: 2019-09-11; Degree: Master; Type: Thesis
Country: China; Candidate: J Han; Full Text: PDF
GTID: 2428330548451853; Subject: Management Science and Engineering
Abstract/Summary:
In the era of big data, data streams are produced continuously in many application fields, such as online shopping transaction records, traffic flow monitoring data, and satellite detection data. Data streams are real-time, continuous, variable, and unbounded, which makes traditional data mining methods difficult to apply. How to mine knowledge from such data efficiently has therefore become one of the hot and difficult topics in data mining. Ensemble classification is an effective approach to data stream classification. Its main idea is to build multiple base classifiers and combine them, then evaluate each base classifier's performance and eliminate the poorly performing ones to update the ensemble model, thereby improving the performance of the ensemble. On this basis, this thesis studies ensemble classification algorithms for data streams with concept drift and noise. The main results are as follows.

First, the background and significance, related work, and basic theory of data stream classification are introduced; the key problems and key technologies of data stream classification are analyzed in detail; and the ensemble learning theory used in this thesis is examined.

Second, based on the idea of selective ensemble, a selective ensemble classification algorithm for data streams based on Margin Distance Minimization is proposed. The algorithm uses the Margin Distance Minimization selective ensemble method to select classifiers with better accuracy and diversity for the ensemble classification model. Experimental results show that the algorithm achieves better classification performance on data streams with concept drift.

Finally, an ensemble classification algorithm that combines Naive Bayes with unsupervised learning is proposed to address concept drift and noisy data in data streams. The algorithm uses Naive Bayes as the base classifier for ensemble classification and applies the Spectral Clustering algorithm to cluster the data; the classification and clustering results are then compared to filter out noisy data. Meanwhile, the ?-hypothesis testing method is used to detect concept drift, and the ensemble classification model is updated dynamically to adapt to concept changes. Experimental results show that the proposed algorithm achieves better results in time cost and prediction accuracy.
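The general ensemble update outlined in the abstract (build base classifiers, score them on recent data, drop the weakest) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's algorithm: the nearest-centroid base learner, the chunk-by-chunk update, the `max_size` limit, and every name in the code are hypothetical choices made for the example.

```python
from collections import Counter


class CentroidClassifier:
    """Toy nearest-centroid base learner (an assumption, chosen for brevity)."""

    def fit(self, X, y):
        sums = {}
        counts = Counter(y)
        for features, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(features))
            for i, value in enumerate(features):
                acc[i] += value
        # One centroid (feature-wise mean) per class label.
        self.centroids = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }
        return self

    def predict_one(self, features):
        def sq_dist(centroid):
            return sum((a - b) ** 2 for a, b in zip(features, centroid))

        return min(self.centroids, key=lambda label: sq_dist(self.centroids[label]))


def accuracy(model, X, y):
    """Fraction of samples in (X, y) that the model labels correctly."""
    hits = sum(model.predict_one(f) == label for f, label in zip(X, y))
    return hits / len(y)


class ChunkEnsemble:
    """Hypothetical chunk-based ensemble with elimination of weak members."""

    def __init__(self, max_size=3):
        self.max_size = max_size
        self.members = []

    def update(self, X, y):
        # Rank existing members by accuracy on the newest chunk, which
        # stands in for the current concept.
        ranked = sorted(self.members, key=lambda m: accuracy(m, X, y), reverse=True)
        # Keep the best members, leaving one slot for a classifier
        # trained on the newest chunk.
        self.members = ranked[: self.max_size - 1]
        self.members.append(CentroidClassifier().fit(X, y))

    def predict_one(self, features):
        # Unweighted majority vote over the current members.
        votes = Counter(m.predict_one(features) for m in self.members)
        return votes.most_common(1)[0][0]
```

Calling `update` once per labeled chunk lets members trained on an outdated concept fall out of the ensemble after a drift, since they score poorly on the newer chunks. Selection by Margin Distance Minimization, noise filtering, and explicit drift detection are not modeled in this sketch.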
Keywords/Search Tags: Data Stream, Ensemble Classification, Concept Drift, Margin Distance Minimization, Unsupervised Learning