
A Real-Time Stream System Based On Batch-processing Schema

Posted on: 2017-11-19 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Yao | Full Text: PDF
GTID: 2348330503472485 | Subject: Computer technology
Abstract/Summary:
With the rapid development of the Internet and the popularity of all kinds of smart devices, people are producing data at an explosive rate. How to process this big data efficiently and in real time has become a hot topic in both academia and industry, and a variety of distributed stream computing frameworks have emerged as a result. They fall roughly into two categories: continuous stream processing systems, represented by Storm, S4, etc., and discretized stream processing systems, represented by Spark Streaming, Hop, etc. Discretized stream processing systems offer high throughput and simple, fast fault recovery, and they combine naturally with batch processing. However, when facing stream applications that require quick responses and generate many small tasks, discretized stream systems can adapt poorly, which delays the stream application and can even make the system unstable.

To address these problems in distributed clusters, this thesis designs a real-time stream system based on a batch-processing schema. By monitoring the state of a job during the processing stage, the system dynamically evaluates the computing capacity of each node, and by analyzing historical statistics it predicts the future speed of the input data streams. On this basis, it designs and implements a proactive load-balancing mechanism. The system uses the micro-batch as its data unit: when it receives input data streams, it distributes capacity-adapted amounts of data to specific worker nodes. In the processing stage, the worker nodes that own the data can therefore launch tasks locally, achieving lower processing delay as well as better load balancing.
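The capacity evaluation and input-rate prediction described above could look roughly like the following minimal sketch. The abstract does not say which forecasting model or capacity metric the system uses, so this sketch assumes an exponentially weighted moving average (EWMA) for both, measures a node's capacity as records processed per second by its completed tasks, and treats a forecast input rate above the cluster's total estimated capacity as the trigger for proactive rebalancing. All class and method names (`Ewma`, `RatePredictor`, `CapacityMonitor`, `is_overloaded`) are hypothetical.

```python
# Assumption: EWMA stands in for the thesis's unspecified forecasting method;
# all names here are hypothetical illustrations, not the system's actual API.


class Ewma:
    """Exponentially weighted moving average: new = alpha*x + (1-alpha)*old."""

    def __init__(self, alpha: float = 0.5):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.value = None

    def update(self, x: float) -> float:
        # The first observation seeds the average; later ones are blended in.
        if self.value is None:
            self.value = float(x)
        else:
            self.value = self.alpha * x + (1.0 - self.alpha) * self.value
        return self.value


class RatePredictor:
    """Forecasts the next interval's input rate (records/s) from history."""

    def __init__(self, alpha: float = 0.5):
        self._ewma = Ewma(alpha)

    def observe(self, records_per_sec: float) -> None:
        self._ewma.update(records_per_sec)

    def predict(self) -> float:
        # With no history yet, assume no incoming load.
        return self._ewma.value if self._ewma.value is not None else 0.0


class CapacityMonitor:
    """Estimates each node's capacity (records/s) from its completed tasks."""

    def __init__(self, alpha: float = 0.5):
        self.alpha = alpha
        self._per_node = {}

    def report_task(self, node: str, records: int, seconds: float) -> None:
        # Observed throughput of one finished task, smoothed per node.
        ewma = self._per_node.setdefault(node, Ewma(self.alpha))
        ewma.update(records / seconds)

    def capacities(self) -> dict:
        return {node: e.value for node, e in self._per_node.items()}

    def is_overloaded(self, predicted_rate: float) -> bool:
        # Proactive check: flag trouble before a backlog builds up, when the
        # forecast input rate exceeds the cluster's combined estimated capacity.
        return predicted_rate > sum(self.capacities().values())
```

With `alpha = 0.5`, observing input rates of 1000 and 2000 records/s yields a forecast of 1500 records/s. A larger `alpha` reacts faster to bursts at the cost of noisier estimates.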
Moreover, because the system distributes input data already in the receiving stage, it attains higher throughput. The experimental results indicate that the proposed real-time stream processing system substantially improves processing delay and throughput compared with traditional discretized stream processing systems, with improvements of up to 50% and 200%, respectively. These improvements grow as the complexity of the stream applications increases.
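The receive-stage distribution of capacity-adapted micro-batches can be sketched as follows. This is a minimal sketch under stated assumptions: per-node capacity estimates arrive as a plain dict, each micro-batch is split in proportion to capacity using largest-remainder rounding (a rounding rule chosen here; the abstract does not specify one), and the function names `allocate` and `distribute` are hypothetical. It illustrates the idea of handing each worker node its block on arrival so that processing tasks can later run where the data already resides.

```python
import math
from fractions import Fraction

# Assumption: proportional split with largest-remainder rounding is an
# illustrative choice; `allocate` and `distribute` are hypothetical names.


def allocate(batch_size: int, capacities: dict) -> dict:
    """Split batch_size records across nodes in proportion to capacity."""
    # Exact rational arithmetic avoids float rounding breaking the total.
    caps = {node: Fraction(cap) for node, cap in capacities.items()}
    total = sum(caps.values())
    if total <= 0:
        raise ValueError("total capacity must be positive")
    exact = {node: batch_size * cap / total for node, cap in caps.items()}
    shares = {node: math.floor(v) for node, v in exact.items()}
    leftover = batch_size - sum(shares.values())
    # Give the remaining records to the nodes with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda n: exact[n] - shares[n], reverse=True)
    for node in by_remainder[:leftover]:
        shares[node] += 1
    return shares


def distribute(records: list, capacities: dict) -> dict:
    """Cut one received micro-batch into per-node blocks sized by allocate()."""
    shares = allocate(len(records), capacities)
    blocks, start = {}, 0
    for node in sorted(shares):
        end = start + shares[node]
        blocks[node] = records[start:end]  # this node will process this block locally
        start = end
    return blocks
```

For example, with capacities of 4000 and 2000 records/s for nodes `a` and `b`, a 10-record micro-batch is split 7/3, and the shares always sum exactly to the batch size.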
Keywords/Search Tags: Big Data, Real-Time Stream Processing Systems, Load Balancing, Data Locality