Storm-based Distributed Stream Data Association Rule Mining

Posted on: 2020-03-09
Degree: Master
Type: Thesis
Country: China
Candidate: S Li
GTID: 2428330620953997
Subject: Computer technology
Abstract/Summary:
Stream data is common in sensor networks, network communication, and Internet applications. It is a set of unordered, unbounded, continuous data items that arrive in real time, and it is characterized by real-time arrival, burstiness, unboundedness, disorder, and volatility. Because of these characteristics, the data cannot all be stored in a database and must be mined in real time. Stream data mining algorithms therefore have to be adapted to the characteristics of streaming data, unlike traditional static data mining algorithms. Association rule mining is one kind of data mining algorithm; its purpose is to mine the intrinsic relationship between two itemsets. The existing stream data association rule mining algorithm FP-Stream runs on a single computer, so its performance is limited by the configuration of that computer. Streaming data is growing at an increasing rate, and the performance of FP-Stream is already stretched. To address this problem, this thesis designs a distributed stream data association rule mining algorithm, FP-Storm, together with a distributed stream data association rule mining framework based on Storm. Finally, to verify the usability of the proposed algorithm and framework, a stock recommendation prototype system based on stream data association rule mining is designed and implemented. The main work of this thesis is as follows:

(1) To address the low performance of existing stream data association rule algorithms, a distributed stream data association rule algorithm, FP-Storm, is designed. The algorithm uses a sliding window to select and cache data, converting the stream data into batch data for processing. It then uses projection-based partitioning to distribute the batch data across computing nodes for parallel mining. On each computing node, the historical batch data is stored in a prefix tree with tilted-time windows; the prefix tree is then traversed from the bottom up, and superset checking is applied to identify the frequent itemsets. Finally, the mining results of all computing nodes are aggregated and output. The experimental results show that the algorithm has good accuracy and can effectively improve the efficiency of frequent itemset mining.

(2) To address the problems that arise when implementing existing stream data association rule mining algorithms, namely multi-source data integration, implementation of the mining process, and real-time presentation of mining results, a distributed stream data association rule mining framework is designed. First, the data integration module of the framework is implemented on Kafka, and Kafka's transmission mechanism is optimized using the idea of fragment transmission. Then, the mining process of the FP-Storm algorithm is implemented on Storm. Finally, the mining results are cached in real time in the Redis in-memory database. The framework simplifies the development of stream data association rule mining and makes it easier for programmers to deploy it in other application systems. The experimental results show that the optimized Kafka data transmission improves somewhat in speed and stability, and that increasing the concurrency of the cluster improves the operating efficiency of the framework to a certain extent.

(3) To verify the practicability of the distributed stream data association rule mining algorithm and framework, a stock recommendation prototype system is designed and implemented using technologies such as React Native and Spring. By analyzing historical stock price fluctuation patterns, the system updates the association rules between stocks in real time and generates stock recommendations that are sent to interested users. The client interface is intuitive and friendly, and the degree of association between some of the stocks in the recommendations is high, which indicates that the proposed algorithm and framework have practical value.
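
The sliding-window step described in (1), which turns a stream into batches that can then be mined for frequent itemsets, can be illustrated with a minimal single-machine sketch. This is not the FP-Storm algorithm: it uses brute-force counting instead of a prefix tree with tilted-time windows, omits the distributed partitioning, and the names `sliding_batches`, `frequent_itemsets`, `window_size`, `step`, `min_support`, and `max_size` are hypothetical, chosen only for illustration.

```python
from collections import Counter, deque
from itertools import combinations


def sliding_batches(stream, window_size, step):
    """Yield the window contents as a batch every `step` arrivals once the window is full."""
    window = deque(maxlen=window_size)  # oldest transaction is evicted automatically
    for i, transaction in enumerate(stream, start=1):
        window.append(frozenset(transaction))
        if len(window) == window_size and (i - window_size) % step == 0:
            yield list(window)


def frequent_itemsets(batch, min_support, max_size=2):
    """Brute-force count of itemsets up to `max_size` items (illustrative only)."""
    counts = Counter()
    for transaction in batch:
        items = sorted(transaction)  # sort so each itemset has one canonical tuple
        for k in range(1, max_size + 1):
            counts.update(combinations(items, k))
    threshold = min_support * len(batch)
    return {itemset: n for itemset, n in counts.items() if n >= threshold}


# Toy stream of transactions (assumed data, not from the thesis).
stream = [{"A", "B"}, {"A", "C"}, {"A", "B", "C"}, {"B", "C"}, {"A", "B"}]

results = [
    frequent_itemsets(batch, min_support=0.5)
    for batch in sliding_batches(stream, window_size=4, step=1)
]
```

In a distributed setting, each batch would be split across worker nodes before counting; here every batch is mined in one process to keep the example short.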
Keywords/Search Tags:Stream data, association rules, Storm, Kafka, frequent itemsets, distributed algorithm