
Research On Offline Task Scheduling Optimization Method Based On Storm Platform

Posted on: 2020-11-22
Degree: Master
Type: Thesis
Country: China
Candidate: S Liu
Full Text: PDF
GTID: 2428330590954694
Subject: Computer application technology
Abstract/Summary:
With the development of technologies such as Big Data and Artificial Intelligence (AI), traditional big data processing methods can no longer meet demand. The rapid growth of data has driven the development of big data stream computing, and many representative computing frameworks have emerged for different scenarios. As one of these stream computing frameworks, Apache Storm fits the characteristics of the stream computing environment, and its low latency, high throughput, and high fault tolerance have greatly widened its range of applications. However, the Storm platform still has room for optimization in task scheduling: the default round-robin task scheduling mechanism risks excessive communication overhead and unbalanced load. To address this problem, researchers at home and abroad have proposed many task scheduling optimization strategies, but most of them are online strategies that act during the topology's running stage and therefore affect topology processing to some extent.

This dissertation takes the stream computing framework Apache Storm as its research object and proposes an offline task scheduling strategy based on topology structure in Storm (TS²-Storm). First, building on basic models such as the logic graph of the topology and the task assignment graph of the topology, a CPU resource constraint model and an optimal communication overhead model are established. Second, the concept of component degree and a constraint principle on the number of executors are proposed. Then, taking into account the characteristics of heterogeneous Storm clusters, topology deployment in the offline environment is divided into two processes: deployment of workers and deployment of executors. The worker nodes are sorted by their remaining CPU resources, and the set of nodes that will process the topology is selected according to the user's settings. Available slots are configured on these worker nodes and, to decrease communication overhead, workers are allocated to the nodes by a round-robin scheduling strategy. Closely related tasks are assigned to the same worker node to reduce inter-node communication overhead. Improving the load balance in this way achieves the goal of reducing communication overhead.

Experimental results show that the TS²-Storm strategy reduces system latency to varying degrees compared with both the default task scheduling strategy and the offline scheduling strategy. Compared with the same two strategies, TS²-Storm also improves CPU resource occupancy, inter-node communication overhead, load balancing, and throughput to a certain degree.
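The two deployment processes described in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's algorithm: the function names, the per-worker `capacity` limit (a stand-in for the CPU resource constraint model), and the greedy tie-breaking rules are all assumptions made for illustration. It shows nodes being ranked by remaining CPU, workers being spread round-robin over the selected nodes, and executors of high-degree components being placed first on the worker whose node already hosts the most neighbouring executors.

```python
from collections import defaultdict


def select_nodes(remaining_cpu, num_nodes):
    """Rank nodes by remaining CPU (descending) and keep the top num_nodes."""
    ranked = sorted(remaining_cpu, key=lambda n: (-remaining_cpu[n], n))
    return ranked[:num_nodes]


def place_workers(nodes, num_workers):
    """Assign integer worker ids to the selected nodes round-robin."""
    return {w: nodes[w % len(nodes)] for w in range(num_workers)}


def component_degree(edges):
    """Degree of a component = number of topology edges touching it."""
    degree = defaultdict(int)
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    return dict(degree)


def place_executors(edges, executors, worker_to_node, capacity):
    """Greedy placement (hypothetical heuristic, not the thesis's rule).

    Components are visited by descending degree. Each executor goes to the
    worker with free capacity whose node already hosts the most executors of
    neighbouring components; ties go to the less loaded worker, then the
    lower worker id.
    """
    neighbours = defaultdict(set)
    for src, dst in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)
    degree = component_degree(edges)
    order = sorted(executors, key=lambda c: (-degree.get(c, 0), c))

    load = {w: 0 for w in worker_to_node}
    hosted = defaultdict(lambda: defaultdict(int))  # node -> component -> count
    assignment = {}
    for comp in order:
        for i in range(executors[comp]):
            free = [w for w in sorted(worker_to_node) if load[w] < capacity]
            if not free:
                raise ValueError("not enough worker capacity")

            def score(w):
                node = worker_to_node[w]
                return sum(hosted[node][n] for n in neighbours[comp])

            best = max(free, key=lambda w: (score(w), -load[w], -w))
            assignment[(comp, i)] = best
            load[best] += 1
            hosted[worker_to_node[best]][comp] += 1
    return assignment


if __name__ == "__main__":
    cpu = {"n1": 40, "n2": 90, "n3": 70}           # remaining CPU per node
    nodes = select_nodes(cpu, 2)                    # user asks for two nodes
    workers = place_workers(nodes, 2)
    topology = [("spout", "split"), ("split", "count")]
    parallelism = {"spout": 1, "split": 2, "count": 1}
    print(place_executors(topology, parallelism, workers, capacity=2))
```

Because all inputs are known before the topology is submitted, a plan like this can be computed offline, which is the property the abstract contrasts with online rescheduling during topology operation.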
Keywords/Search Tags:Stream Computing, Apache Storm, Task Scheduling, Topology Structure, Throughput