Research on a Self-Adaptive Platform for the Deployment and Configuration of Storm Tasks

Posted on: 2017-12-05  Degree: Master  Type: Thesis
Country: China  Candidate: Y Deng  Full Text: PDF
GTID: 2348330488958156  Subject: Computing applications technology
Abstract/Summary:
In the Big Data era, society and the economy are increasingly data-driven, and the ability to analyze massive data promptly and effectively has become an important prerequisite for success in many industries. Many modern data processing environments require complex computations over streaming data in real time. Storm, an effective tool for real-time stream processing, has attracted widespread attention in both industry and academia, and optimizing its processing performance has become an active research topic. Much prior work improves the efficiency of Storm data processing by optimizing task scheduling, but most of it ignores the direct impact of task configuration parameters on processing performance. If these parameters are not set properly, the performance and stability of a Storm cluster can be severely affected, and previous optimization schemes cannot achieve good results.
Based on a study of Storm and related stream processing technologies, this thesis proposes a Storm task deployment and configuration platform dedicated to optimizing the performance of Storm clusters.
The platform implements three main functions: (1) a cluster-aware module that monitors changes in cluster resources, collects historical distribution information, and monitors communication among the nodes in the cluster; (2) a self-adjusting Storm job configuration module, which addresses the adverse effect on processing performance of a poorly chosen number of processes running a task; and (3) building on (1) and (2), a platform-based Storm scheduling algorithm aimed at improving processing performance.
Experiments show that, by combining awareness of the global cluster state with self-adaptive task configuration parameters, the scheduling algorithm greatly reduces communication among tasks inside the cluster and improves the processing performance of the Storm cluster, providing a more agile and efficient solution for analyzing massive data streams. In these experiments, the latency of processing an event is about 47.6% lower than with Storm's default scheduler and about 21.4% lower than with OnlineScheduler, the current best algorithm based on internal traffic.
The thesis first introduces the importance of real-time stream processing in the Big Data setting and the development of stream computing frameworks. It then briefly introduces the platform and the related technologies, and systematically presents the overall structure of the platform and the implementation of its core modules. Next, the performance test results of the platform are presented and analyzed. Finally, the study is summarized and future directions are outlined.
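To make the scheduling goal stated in the abstract more concrete (reducing communication among tasks placed on different nodes), the following is a minimal sketch of a traffic-aware placement heuristic. It is not the thesis's algorithm, which the abstract does not describe in detail. The function names, the `traffic` dictionary of tuple rates between executor pairs, and the per-node `capacity` limit are all assumptions introduced for illustration.

```python
from collections import defaultdict


def inter_node_traffic(traffic, placement):
    """Sum the rates of all executor pairs that sit on different nodes."""
    return sum(
        rate
        for (src, dst), rate in traffic.items()
        if placement[src] != placement[dst]
    )


def greedy_placement(executors, traffic, nodes, capacity):
    """Hypothetical heuristic: visit executor pairs from heaviest to lightest
    traffic and try to co-locate each unplaced executor with its partner."""
    placement = {}
    load = defaultdict(int)  # number of executors currently on each node

    def pick_node(preferred=None):
        # Use the partner's node if it still has room.
        if preferred is not None and load[preferred] < capacity:
            return preferred
        # Otherwise fall back to the least-loaded node with free capacity.
        candidates = [n for n in nodes if load[n] < capacity]
        if not candidates:
            raise ValueError("not enough capacity for all executors")
        return min(candidates, key=lambda n: load[n])

    def assign(executor, node):
        placement[executor] = node
        load[node] += 1

    heaviest_first = sorted(traffic.items(), key=lambda kv: -kv[1])
    for (src, dst), _rate in heaviest_first:
        for executor, partner in ((src, dst), (dst, src)):
            if executor not in placement:
                assign(executor, pick_node(placement.get(partner)))

    # Executors that exchange no traffic are placed wherever there is room.
    for executor in executors:
        if executor not in placement:
            assign(executor, pick_node())
    return placement


# Illustrative (made-up) topology: four executors on two nodes.
executors = ["spout", "split", "count", "sink"]
traffic = {("spout", "split"): 100, ("split", "count"): 80, ("count", "sink"): 5}
nodes = ["n1", "n2"]

greedy = greedy_placement(executors, traffic, nodes, capacity=2)
round_robin = {e: nodes[i % len(nodes)] for i, e in enumerate(executors)}
```

On this made-up input, comparing `inter_node_traffic(traffic, greedy)` with `inter_node_traffic(traffic, round_robin)` shows how co-locating heavily communicating executors lowers cross-node traffic compared with an even round-robin spread. A real scheduler would also have to account for measured CPU and memory usage and for the number of worker processes, which the abstract identifies as an important configuration parameter.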
Keywords/Search Tags: Stream Computing, Storm, Task Configuration, Scheduling Algorithm