
Design And Implementation Of A High Scalable And Fault-tolerant Stream Processing System

Posted on: 2016-01-03
Degree: Master
Type: Thesis
Country: China
Candidate: F Z Zheng
GTID: 2308330470967714
Subject: Computer application technology
Abstract/Summary:
The amount of data in the world is growing explosively. A large share of it is generated in real time and is unbounded, out of order, bursty, and volatile, and its value usually decreases over time. How to analyze such data in real time in a robust, efficient distributed environment is therefore a problem worth careful study.

Stream processing systems aim to solve this problem, but current systems still leave room for improvement. Drawing on the design experience of existing systems, this thesis designs and implements a highly scalable and fault-tolerant stream processing system named DStream.

First, the system architecture adopts a stateless master-slave design. Communication between the master and the slaves relies mainly on a distributed coordination service, and a well-designed task model is provided for developers. Together these lay a solid foundation for scalability and fault tolerance.

Second, scalability is designed from the bottom up, covering physical nodes, processing logic, task configurations, and running tasks. Nodes can be added and removed dynamically to adjust computing resources. Updates to processing logic and task configurations can be synchronized into the system online. The system handles bursts of data efficiently without crashing.

Third, fault tolerance is provided for physical nodes, processing elements, and the data being processed. Every functional module can cope with node crashes, and failed processing elements are detected and then restarted or rescheduled. The system can process data in at-most-once, at-least-once, and exactly-once modes, which lets users trade off the fault-tolerance level against efficiency.

Finally, this thesis describes an application and several experiments built on DStream to illustrate a usage scenario and verify the design.
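
The difference between the three delivery modes named in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions and is not DStream's implementation: the `Mode` enum, the `process` function, and the `lost` and `ack_lost` failure sets are hypothetical names introduced here. It assumes a sink that sums event values, a sender that retries unacknowledged events, and deduplication by event id for exactly-once.

```python
from enum import Enum


class Mode(Enum):
    AT_MOST_ONCE = "at-most-once"
    AT_LEAST_ONCE = "at-least-once"
    EXACTLY_ONCE = "exactly-once"


def process(events, mode, lost=frozenset(), ack_lost=frozenset()):
    """Sum event values under one delivery mode (illustrative only).

    events   -- iterable of (event_id, value) pairs
    lost     -- ids whose first delivery never reaches the sink
    ack_lost -- ids whose first delivery is applied but whose ack is lost
    """
    total = 0
    seen = set()  # ids already applied; consulted only for exactly-once
    for event_id, value in events:
        attempt = 0
        while True:
            attempt += 1
            first = attempt == 1
            delivered = not (first and event_id in lost)
            if delivered:
                duplicate = mode is Mode.EXACTLY_ONCE and event_id in seen
                if not duplicate:
                    total += value
                    seen.add(event_id)
            acked = delivered and not (first and event_id in ack_lost)
            # At-most-once never retries; the other modes retry until acked.
            if acked or mode is Mode.AT_MOST_ONCE:
                break
    return total


if __name__ == "__main__":
    events = [(1, 10), (2, 20), (3, 30)]
    for mode in Mode:
        print(mode.value, process(events, mode, lost={2}, ack_lost={3}))
```

In this toy model, at-most-once can drop an event, at-least-once can apply one twice, and exactly-once pays for correctness with the extra bookkeeping in `seen`, which mirrors the trade-off between fault-tolerance level and efficiency described above.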
Keywords/Search Tags: Big Data, Stream Processing System, Fault-tolerant, Scalable