
Design And Implementation Of Real-time Streaming Module Based On Spark Streaming

Posted on: 2017-01-14 | Degree: Master | Type: Thesis
Country: China | Candidate: W C Xu | Full Text: PDF
GTID: 2348330566956744 | Subject: Software engineering
Abstract/Summary:
With the rapid development of internet technology and mobile devices, a growing volume of data is produced over the internet. Large companies such as Tencent, Baidu and Alibaba generate terabytes of data every day. The traditional way to handle data at this scale is offline processing with Hadoop, which lets users gain insight into the data within hours. As businesses grow quickly, however, demand for real-time analysis has emerged. Real-time stream processing can meet this demand, but it becomes unstable when faced with bursts of data or changing workloads. Moreover, in scenarios such as transferring transaction data, reliable transmission is required. To address these problems, this thesis proposes a reliable and stable design based on Flume and Spark Streaming. The work covers two aspects:

1. Reliability. Based on the Flume source code, we add a data-checking function by modifying the Sink end and the Receiver end. The modified module first checks the number of records and then verifies the data using a check code.

2. Stability. We analyze the workload characteristics of map, reduce and join workloads and identify the relation between processing time and batch interval: it is linear for map and reduce workloads and non-linear for join workloads. We then propose a gradient-based adaptive batch interval algorithm that keeps the real-time data processing system stable and reliable when the input data rate and the workload are unknown.

With this design and implementation, the reliability and stability of the streaming system improve and end-to-end latency decreases under different workloads and bursty input data. The design shows a clear advantage over the traditional approach of stopping the streaming service and adjusting the batch interval manually, and it markedly improves the automatic maintenance capability of the streaming data processing system.
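
The abstract does not spell out the adaptive batch interval algorithm, so the following Python sketch is only an illustration of the general idea and should not be read as the thesis's method. It assumes a hypothetical controller, `AdaptiveBatchInterval`, that estimates the gradient of processing time with respect to batch interval from the last two batches and takes a secant step toward the point where processing time equals a fixed fraction of the interval. The target ratio, dead band, interval bounds and the toy workload model are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class AdaptiveBatchInterval:
    """Illustrative controller only; NOT the thesis's actual algorithm."""

    interval: float               # current batch interval in seconds
    target_ratio: float = 0.8     # assumed goal: processing_time ~= 0.8 * interval
    tolerance: float = 0.02       # assumed dead band, as a fraction of the interval
    min_interval: float = 0.1     # assumed lower bound in seconds
    max_interval: float = 60.0    # assumed upper bound in seconds
    _prev: Optional[Tuple[float, float]] = None  # previous (interval, processing_time)

    def update(self, processing_time: float) -> float:
        """Record how long the last batch took and return the next interval."""
        b = self.interval
        # Residual to drive to zero: f(b) = p(b) - target_ratio * b.
        f = processing_time - self.target_ratio * b
        prev, self._prev = self._prev, (b, processing_time)

        if abs(f) <= self.tolerance * b:
            return b  # close enough to the target; keep the interval

        nxt = None
        if prev is not None and prev[0] != b:
            # Finite-difference estimate of dp/db from the last two batches.
            slope = (processing_time - prev[1]) / (b - prev[0])
            df = slope - self.target_ratio
            if df < 0:
                nxt = b - f / df  # secant step toward f(b) = 0
        if nxt is None:
            # No usable gradient yet: probe with a small multiplicative step.
            nxt = b * (1.1 if f > 0 else 0.9)

        self.interval = min(self.max_interval, max(self.min_interval, nxt))
        return self.interval


def simulate(ctrl: AdaptiveBatchInterval,
             model: Callable[[float], float],
             steps: int) -> List[float]:
    """Drive the controller with a toy model mapping interval -> processing time."""
    history = []
    for _ in range(steps):
        history.append(ctrl.update(model(ctrl.interval)))
    return history


if __name__ == "__main__":
    # Toy linear workload (an assumption): p(b) = 0.5 * b + 0.6 seconds.
    controller = AdaptiveBatchInterval(interval=1.0)
    print(simulate(controller, lambda b: 0.5 * b + 0.6, steps=5))
```

For the toy linear model in the demo, the residual is zero at an interval of 2.0 s, and the controller reaches that value after one probing step and one secant step. A non-linear workload, such as the join case described above, would generally need several iterations, and a production system would also have to smooth noisy measurements before estimating the gradient.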
Keywords/Search Tags: big data, Spark, stream processing, dynamic self-adaptive, stable and reliable