Research On Data Stream Classification Method Based On Storm

Posted on: 2020-07-20 | Degree: Master | Type: Thesis
Country: China | Candidate: S Dong | Full Text: PDF
GTID: 2428330575987997 | Subject: Computer application technology
Abstract/Summary:
With the rapid development of the information age, data stream classification has attracted increasing attention from researchers in recent years because of its wide range of emerging practical applications, for instance health monitoring, financial transactions, and traffic management. Compared with data from a traditional static environment, streaming data, as a new data model, imposes strict requirements on the speed and accuracy of data analysis. To analyze and process stream data, we need to record real-time data stream information quickly and keep that information accurate and timely. However, current stream data classification faces two problems: (1) the limited number of labeled data instances is insufficient for current supervised classification algorithms to detect concept drift reasonably and effectively; (2) streaming data is large in volume and arrives quickly, so classifying it both quickly and accurately remains a major problem. This thesis studies and analyzes these two problems and summarizes the advantages and disadvantages of current research, in China and abroad, on concept drift and stream data classification. An ensemble classifier algorithm based on active learning is proposed. Active learning can effectively learn concept drift in stream data without full supervision, greatly reducing the number of labeled data instances required. At the same time, the ensemble classifier combines the individual classifiers and assigns each a weight according to its classification performance, so that the final predicted output is more accurate and the classification accuracy on stream data improves. Finally, to improve the classification speed of streaming data, we adopt the Storm distributed computing platform, a popular platform for streaming data processing. Deploying the proposed method on this platform parallelizes the algorithm and significantly improves the classification speed of streaming data. We then test the parallel implementation of the active-learning-based ensemble classifier. The experimental results show that the proposed method can effectively mitigate the problems of a limited number of labeled data instances in stream data, low classification accuracy, and low classification speed.
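
The abstract does not give the algorithm's details, so the following Python sketch is only an illustration of the general idea of a performance-weighted ensemble combined with active learning. It is not the thesis's method. The class name `WeightedEnsemble`, the `margin_threshold` parameter, the Laplace-smoothed accuracy weights, and the margin-based rule for requesting labels are all assumptions introduced here. Storm deployment is not shown.

```python
from collections import defaultdict


class WeightedEnsemble:
    """Accuracy-weighted vote over base classifiers (hypothetical sketch)."""

    def __init__(self, members, margin_threshold=0.2):
        # Each member is a callable mapping an instance x to a class label.
        self.members = list(members)
        # Laplace-smoothed counts, so every member starts with weight 0.5.
        self.correct = [1.0] * len(self.members)
        self.seen = [2.0] * len(self.members)
        # Assumed rule: request a label when the vote margin is below this.
        self.margin_threshold = margin_threshold

    def weights(self):
        """Weight of each member = its smoothed accuracy on labeled data."""
        return [c / s for c, s in zip(self.correct, self.seen)]

    def predict(self, x):
        """Return (label, margin); margin is the normalized vote gap."""
        votes = defaultdict(float)
        for member, w in zip(self.members, self.weights()):
            votes[member(x)] += w
        total = sum(votes.values())
        ranked = sorted(votes.items(), key=lambda kv: kv[1], reverse=True)
        top_label, top = ranked[0]
        second = ranked[1][1] if len(ranked) > 1 else 0.0
        return top_label, (top - second) / total

    def needs_label(self, margin):
        """Active-learning query rule: ask for a label only when uncertain."""
        return margin < self.margin_threshold

    def update(self, x, y):
        """Update member accuracies once the true label y is obtained."""
        for i, member in enumerate(self.members):
            self.seen[i] += 1
            if member(x) == y:
                self.correct[i] += 1


if __name__ == "__main__":
    ens = WeightedEnsemble([lambda x: int(x > 0), lambda x: 1])
    # Toy stream of (instance, true label); the label is consulted only
    # when the ensemble is uncertain about its own prediction.
    for x, y in [(-3, 0), (4, 1), (-1, 0), (2, 1)]:
        label, margin = ens.predict(x)
        if ens.needs_label(margin):
            ens.update(x, y)
        print(x, label, round(margin, 3))
```

In this sketch, labels are requested only for instances on which the weighted vote is close, which is one common way to reduce labeling cost. Members that agree more often with the requested labels gain weight in later votes.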
Keywords/Search Tags: data stream, concept drift, classification algorithm, active learning, classifier ensemble, Storm