
Research And Implementation Of Sequential Pattern Mining Algorithm Over Data Streams Based On Spark Streaming

Posted on: 2019-07-03  Degree: Master  Type: Thesis
Country: China  Candidate: L J Dai
GTID: 2348330545955605  Subject: Computer technology
Abstract/Summary:
As the human, machine and physical worlds become deeply integrated, the volume of global data is growing explosively, and this data carries considerable social and economic value. Streams are the form in which data is actually generated, yet most previous mining work has targeted large static datasets. Real-time, accurate and efficient mining of data streams is therefore an urgent problem. Sequential pattern mining over sequential data streams is an important branch of stream data mining. Building on research into traditional static sequential pattern mining algorithms, this thesis first adapts such an algorithm to a single (non-distributed) data stream, and then improves and implements it on the Spark Streaming distributed stream processing framework.

The main work of this thesis is as follows:
(1) A Spark distributed cluster is built on three Linux computing nodes. To facilitate development and testing of the algorithms, Java and Scala environments and the corresponding IDEs are configured on the nodes.
(2) After reviewing the development of sequential pattern mining algorithms and the challenges and requirements of data stream mining, this thesis proposes SPM-NDS (Sequential Patterns Mining-Non Distributed Streaming), a single-stream sequential pattern mining algorithm based on a sliding window and a compressed trie (compressTrie), and verifies its accuracy and stability on a UCI sequence dataset.
(3) Based on an analysis of the Spark Streaming distributed stream processing framework, this thesis proposes SPM-SS (Sequential Patterns Mining-Spark Streaming), a distributed stream sequential pattern mining algorithm built on that framework, and verifies its efficiency on a UCI sequence dataset.

The contributions of this thesis are as follows:
(1) The proposed SPM-NDS algorithm addresses both accuracy and stability in streaming scenarios, adds to the available single-stream sequential pattern mining algorithms, and lays the foundation for the subsequent distributed version.
(2) The proposed SPM-SS algorithm takes full advantage of distributed parallel computing to mine sequential patterns efficiently from sequential data streams, meeting the real-time and efficiency requirements of data stream mining. It also extends the set of stream mining algorithms available on the Spark platform, so that it can be applied to sequential pattern mining over large-scale sequential data streams.
Keywords/Search Tags: Data Stream, Sequential Pattern Mining, Spark Streaming, Sliding Window
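The abstract does not describe how SPM-NDS works internally. As a rough illustration of the general idea it names (counting candidate sequential patterns over a count-based sliding window and storing them in a trie), here is a minimal Python sketch under stated assumptions: each sequence is a list of single items rather than itemsets, support is relative to the current window, candidates are enumerated by brute force up to a hypothetical `max_len`, and the trie is a plain, uncompressed prefix trie. The names `SlidingWindowMiner`, `TrieNode` and `distinct_subsequences` are hypothetical. This is not the thesis's SPM-NDS algorithm or its compressTrie.

```python
from collections import deque
from itertools import combinations


class TrieNode:
    """Prefix-trie node; the root-to-node path spells one candidate pattern."""
    __slots__ = ("children", "count")

    def __init__(self):
        self.children = {}
        self.count = 0  # number of window sequences containing this pattern


def distinct_subsequences(seq, max_len):
    """All distinct (not necessarily contiguous) subsequences of seq, up to max_len items."""
    subs = set()
    for r in range(1, min(max_len, len(seq)) + 1):
        for idx in combinations(range(len(seq)), r):
            subs.add(tuple(seq[i] for i in idx))
    return subs


class SlidingWindowMiner:
    """Hypothetical count-based sliding-window miner (illustrative sketch only)."""

    def __init__(self, window_size, min_support, max_len=3):
        self.window = deque()
        self.window_size = window_size
        self.min_support = min_support  # relative threshold in (0, 1]
        self.max_len = max_len
        self.root = TrieNode()

    def _update(self, seq, delta):
        # Add (+1) or remove (-1) one sequence's candidate patterns in the trie.
        for pattern in distinct_subsequences(seq, self.max_len):
            node = self.root
            for item in pattern:
                node = node.children.setdefault(item, TrieNode())
            node.count += delta

    def _prune(self, node):
        # A pattern is never contained in more sequences than its prefix,
        # so a zero-count node has only zero-count descendants and can be dropped.
        for item in [k for k, child in node.children.items() if child.count == 0]:
            del node.children[item]
        for child in node.children.values():
            self._prune(child)

    def add(self, seq):
        """Slide the window forward by one incoming sequence."""
        self.window.append(seq)
        self._update(seq, +1)
        if len(self.window) > self.window_size:
            expired = self.window.popleft()
            self._update(expired, -1)
            self._prune(self.root)

    def frequent_patterns(self):
        """Patterns whose support in the current window meets min_support."""
        threshold = self.min_support * len(self.window)
        result = {}
        stack = [(self.root, ())]
        while stack:
            node, prefix = stack.pop()
            for item, child in node.children.items():
                # Infrequent nodes are not expanded: their extensions are infrequent too.
                if child.count >= threshold:
                    pattern = prefix + (item,)
                    result[pattern] = child.count
                    stack.append((child, pattern))
        return result


if __name__ == "__main__":
    miner = SlidingWindowMiner(window_size=3, min_support=0.6, max_len=2)
    for s in [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "c", "d"]]:
        miner.add(s)
    print(sorted(miner.frequent_patterns().items()))
    # [(('a',), 2), (('a', 'c'), 2), (('c',), 3)]
```

With a window of 3 and a relative support of 0.6 (at least 2 of the 3 sequences), only `a`, `c` and `a` followed by `c` remain frequent once the first sequence expires. In a Spark Streaming setting, one plausible way to distribute the counting step (an assumption, not necessarily how SPM-SS works) is to flatMap each incoming sequence to its candidate patterns and aggregate them with a windowed reduce by key such as `reduceByKeyAndWindow` on a pair DStream.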