Research And Application Of Storm Scheduling Algorithm

Posted on: 2019-05-28
Degree: Master
Type: Thesis
Country: China
Candidate: H B Duan
GTID: 2428330590465726
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of information technology, the demand for real-time computing on big data has become more urgent. In response, distributed real-time stream computing frameworks such as Spark, S4, Storm, and Heron have emerged. Among them, Storm has been favored by many research institutions and large Internet enterprises in China and abroad for its high reliability, fault tolerance, extensibility, language independence, and simple programming model, and it has become a mainstream real-time stream computing framework. This thesis first surveys the literature on Storm and studies its source code, analyzes the operating mechanism of the framework, and summarizes the problems in Storm's scheduling algorithm and load balancing mechanism. It then proposes corresponding improvement strategies and validates them experimentally. The main work is as follows.

Task scheduling in Storm depends on the parallelism that the user configures for each topology. If the parallelism is configured poorly, topology processing delay increases and throughput drops. To address this, the thesis proposes a greedy scheduling algorithm based on best parallelism. The algorithm first solves for the best parallelism of each component in the topology, and then schedules tasks with a greedy strategy while ensuring that no node's CPU is overloaded. Compared with the default scheduling algorithm, the online scheduling algorithm, and the hot-edge scheduling algorithm, the greedy scheduling algorithm based on best parallelism effectively reduces Storm's processing delay and improves system throughput and resource utilization.

For load balancing in Storm, both the default round-robin scheduling policy and the dynamic load balancing strategy that prioritizes low slot usage may leave node resources such as CPU, memory, and I/O unevenly loaded. To address this, the thesis proposes a dynamic load balancing strategy based on node resources. The strategy first solves for the best parallelism of each component in the topology, then divides tasks into computation-intensive and I/O-intensive ones and uses information entropy to represent node and task load. Finally, load balancing is achieved through task scheduling. Compared with the default scheduling strategy and the dynamic load balancing strategy that prioritizes low slot usage, the strategy based on node resources effectively improves system throughput and balances the cluster load.

The work in this thesis shows that, on the Storm real-time parallel computing framework, improving the task scheduling strategy and further balancing the cluster load can reduce Storm's processing delay and improve its overall performance. This is significant for the application and wider adoption of the Storm computing framework.
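
The abstract names two building blocks, a greedy placement that keeps node CPU from being overloaded and an entropy-based measure of load, without giving their details. The Python sketch below is only an illustration of those general ideas under stated assumptions; it is not the thesis's algorithm. The function names `greedy_assign` and `load_entropy`, the CPU demand units, the heaviest-task-first ordering, and the normalization of the entropy to the range 0 to 1 are all hypothetical choices made for this example.

```python
import math


def greedy_assign(task_cpu, node_capacity):
    """Greedily place tasks on nodes without exceeding any node's CPU capacity.

    task_cpu: dict of task name -> estimated CPU demand (hypothetical units)
    node_capacity: dict of node name -> CPU capacity in the same units
    Returns a dict of task name -> node name.
    Raises ValueError if some task cannot be placed without overload.
    """
    used = {node: 0.0 for node in node_capacity}
    placement = {}
    # Assumption: handle the heaviest tasks first so they get the most headroom.
    for task, demand in sorted(task_cpu.items(), key=lambda kv: -kv[1]):
        # Greedy choice: the node with the largest remaining CPU headroom.
        node = max(used, key=lambda n: node_capacity[n] - used[n])
        if used[node] + demand > node_capacity[node]:
            raise ValueError(f"no node can host {task} without CPU overload")
        used[node] += demand
        placement[task] = node
    return placement


def load_entropy(loads):
    """Normalized Shannon entropy of a load distribution.

    1.0 means the load is spread perfectly evenly across nodes;
    values near 0 mean the load is concentrated on few nodes.
    """
    total = sum(loads)
    if total == 0 or len(loads) < 2:
        return 1.0  # nothing to balance
    probs = [x / total for x in loads if x > 0]
    return -sum(p * math.log(p) for p in probs) / math.log(len(loads))


if __name__ == "__main__":
    # Hypothetical topology components and a two-node cluster.
    tasks = {"spout": 30, "bolt-a": 50, "bolt-b": 20}
    nodes = {"n1": 60, "n2": 60}
    placement = greedy_assign(tasks, nodes)
    per_node = [sum(tasks[t] for t, n in placement.items() if n == node)
                for node in nodes]
    print(placement, per_node, load_entropy(per_node))
```

In this toy setup a scheduler could compare the entropy of the per-node load before and after a candidate reassignment and prefer the move that raises it; whether the thesis uses entropy in this way is not stated in the abstract.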
Keywords/Search Tags: real-time computing, Storm, best parallelism, greedy strategy, load balance