
Research On Stream Join Algorithm And Parallelization Based On Big Data Platform

Posted on: 2018-10-07  Degree: Master  Type: Thesis
Country: China  Candidate: X B Lang  Full Text: PDF
GTID: 2348330536979913  Subject: Software engineering
Abstract/Summary:
In recent years, stream computing has become an important research topic and has attracted growing attention in both academia and industry. Many real-time stream query scenarios impose strict timeliness requirements on data processing. As data volumes have grown, more and more stream computing systems have appeared, such as Stanford University's STREAM and Twitter Storm. However, the stream query algorithms in these systems are too simple to meet the demands of more complex real-time stream query workloads. Stream join, as a representative stream processing algorithm in real-time stream queries, is therefore of significant research value. With the development of big data and cloud computing, implementing stream join algorithms on big data platforms faces new problems and challenges, mainly in the following three aspects: (1) how to design and implement a stream join algorithm on a big data platform; (2) how to improve the efficiency of the stream join algorithm; (3) how to achieve parallel join on a big data platform.

This thesis studies and analyzes these aspects in depth, building on a thorough examination of stream data processing frameworks and big data platforms. First, the traditional stream join algorithm is improved: a stream join algorithm with window update for unstable data streams is proposed, in which the optimal update cycle is selected periodically by evaluating a join cost model. Second, for parallel stream join, a stream data distribution strategy based on consistent hashing is designed by incorporating the stream join semantics. The strategy first analyzes the join semantics to generate join plans and selects the optimal join scheme; according to the join relationship, stream data is then distributed to the same node for computation, which realizes parallel join on the big data platform. Because stream data is unbounded and changes continuously, the time and space complexity of the join algorithm increases greatly, so that it cannot meet practical application requirements. Finally, a parallel implementation of the join algorithm on Storm is designed. Experimental results show that the algorithm performs better in terms of output and real-time performance.
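The idea of distributing joinable stream data to the same node through consistent hashing can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the class `ConsistentHashRing`, the helper `route`, the worker names, the stream names `R` and `S`, and the `user_id` join key are all hypothetical, and the join-plan selection step described in the abstract is not modeled. The sketch only shows that tuples from two streams with the same join key are routed to the same node.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a point on the ring (stable across runs)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Hypothetical ring: each node owns several virtual points."""

    def __init__(self, nodes, replicas=64):
        ring = []
        for node in nodes:
            for i in range(replicas):
                ring.append((_hash(f"{node}#{i}"), node))
        ring.sort()
        self._points = [point for point, _ in ring]
        self._owners = [node for _, node in ring]

    def node_for(self, join_key) -> str:
        """Return the node owning the first ring point after the key's hash."""
        idx = bisect.bisect_right(self._points, _hash(str(join_key)))
        return self._owners[idx % len(self._owners)]


def route(ring, stream_name, tup, key_field):
    """Route a tuple by its join key only, so both streams agree on the node."""
    return ring.node_for(tup[key_field]), stream_name, tup


# Assumed example data: two streams joined on "user_id".
ring = ConsistentHashRing(["worker-1", "worker-2", "worker-3"])
r = route(ring, "R", {"user_id": 42, "click": "/home"}, "user_id")
s = route(ring, "S", {"user_id": 42, "order": 1001}, "user_id")
print(r[0] == s[0])  # True: equal join keys always map to the same node
```

Because routing depends only on the join key, each node can run a local join over the tuples it receives, and removing a node reassigns only the keys that node owned.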
Keywords/Search Tags: Big Data, Stream Query, Stream Join, Storm