Research On Task Scheduling Strategy Based On Heron Platform

Posted on: 2022-04-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liu
Full Text: PDF
GTID: 2518306542455334
Subject: Software engineering
Abstract/Summary:
With the rapid development of big data technology, traditional batch processing platforms can no longer meet the needs of real-time business scenarios, which led to the emergence of stream computing frameworks. Stream computing is widely used in fields with strict real-time requirements, such as finance, the Internet of Things, and social networks. As application scenarios grow more complex, the performance demands on the stream computing platform itself keep rising. This dissertation takes the stream computing platform Apache Heron as its research object. Heron's default scheduling strategy does not take into account the load differences among worker nodes or the differences in communication overhead between communication methods.

To address these problems, the architecture, topology, and scheduling mechanism of Heron are analyzed, and a job model of Heron is established, comprising a topology logic model, a topology instance model, and an instance allocation model. A resource limitation model is established based on the resource allocation of worker nodes, and a communication overhead optimization model is established by analyzing the differences between communication methods. On this basis, the following two task scheduling optimization strategies are proposed.

First, by analyzing the transformation relationships among task instance data streams, an instance data stream relationship model is established, and a task scheduling strategy based on instance reallocation is proposed. The strategy consists of a node resource limitation algorithm and an instance reallocation algorithm. The node resource limitation algorithm serves as the decision criterion for instance reallocation: before an instance is reallocated, it checks whether the target node is allowed to accept it, which avoids node resource overflow during reallocation. The instance reallocation algorithm calculates the maximum inter-node data stream of each associated instance between nodes. Provided the node resource limitation is satisfied, the associated instance is allocated to the node with which it exchanges its maximum data stream. While converting inter-node data streams into intra-node instance data streams, the algorithm avoids creating other, larger inter-node data streams, and thus better reduces communication overhead. Experimental results on three sets of topology tests show that, compared with the Heron default scheduling strategy, the strategy reduces system latency and inter-node communication overhead and improves system throughput.

Second, by analyzing the unbalanced CPU load of worker nodes caused by the default scheduling strategy, a node load classification model is established, and a load-aware task scheduling strategy is proposed. The strategy consists of a node load classification algorithm and a load-aware allocation algorithm. The node load classification algorithm classifies worker nodes according to their CPU load and their load deviation value. The load-aware allocation algorithm schedules instances from high-load nodes to low-load nodes and, while ensuring CPU load balance, reschedules as many of the associated instances between nodes as possible. Experimental results on three sets of topology tests show that, compared with the Heron default scheduling strategy, the strategy improves system latency, CPU load balance, and inter-node communication overhead.
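The instance reallocation idea described above can be illustrated with a minimal Python sketch. It is not the thesis's algorithm or any Heron API: the function names `reallocate` and `internode_traffic`, the single scalar resource demand per instance, the flow-rate dictionary, and the greedy visiting order are all assumptions made for illustration. Each instance is moved to the node it exchanges the most data with, but only if that node has spare capacity and the move lowers inter-node traffic.

```python
from collections import defaultdict


def internode_traffic(placement, flows):
    """Sum the rates of all flows whose endpoints sit on different nodes."""
    return sum(rate for (src, dst), rate in flows.items()
               if placement[src] != placement[dst])


def reallocate(placement, flows, demand, capacity):
    """Greedy sketch of instance reallocation (hypothetical helper).

    placement: instance -> node
    flows:     (src_instance, dst_instance) -> data rate
    demand:    instance -> resource units it needs (assumed scalar)
    capacity:  node -> resource units it can host (assumed scalar)
    """
    placement = dict(placement)  # work on a copy
    used = defaultdict(float)
    for inst, node in placement.items():
        used[node] += demand[inst]

    def traffic_per_node(inst):
        # Data exchanged between `inst` and the instances on each node.
        per_node = defaultdict(float)
        for (src, dst), rate in flows.items():
            if src == inst:
                per_node[placement[dst]] += rate
            elif dst == inst:
                per_node[placement[src]] += rate
        return per_node

    for inst in sorted(placement):
        per_node = traffic_per_node(inst)
        if not per_node:
            continue
        current = placement[inst]
        target = max(sorted(per_node), key=per_node.get)
        if target == current:
            continue
        # Node resource limitation check: never overflow the target node.
        if used[target] + demand[inst] > capacity[target]:
            continue
        # Moving makes traffic to `target` local and traffic to `current`
        # remote, so only move when that is a net reduction.
        if per_node[target] <= per_node.get(current, 0.0):
            continue
        used[current] -= demand[inst]
        used[target] += demand[inst]
        placement[inst] = target
    return placement


# Toy topology (made-up numbers): spout1 -> bolt1 -> bolt2 on two nodes.
placement = {"spout1": "n1", "bolt1": "n2", "bolt2": "n1"}
flows = {("spout1", "bolt1"): 100, ("bolt1", "bolt2"): 20}
demand = {"spout1": 1, "bolt1": 1, "bolt2": 1}
capacity = {"n1": 3, "n2": 2}

result = reallocate(placement, flows, demand, capacity)
```

With this toy input, `bolt1` moves to `n1` and the inter-node traffic drops from 120 to 0. Lowering the capacity of `n1` to 2 blocks that move, which is the role the node resource limitation check plays in the description above.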
Keywords/Search Tags: stream computing, Apache Heron, task scheduling, communication overhead, load balancing