
Research On Load Balancing Based On Multi-tenant Task Scheduling In Storm

Posted on: 2019-02-14; Degree: Master; Type: Thesis
Country: China; Candidate: K L Shi; Full Text: PDF
GTID: 2428330566967189; Subject: Software engineering
Abstract/Summary:
With the development of the Internet and the adoption of big data technology, traditional data processing can no longer meet people's needs. Distributed processing systems represented by Hadoop solve the problem of handling massive data (PB scale or above), but Hadoop processes historical data and cannot meet the need to process real-time data with high throughput and low latency. Distributed stream computing systems such as Storm emerged in response. Storm is one of the representative big data stream computing systems, and many large companies use it today.

Storm's scheduling algorithm has been studied in both industry and academia. Storm's default scheduler uses a round-robin strategy to allocate tasks to worker nodes, but this random allocation of topology tasks takes into account neither the CPU load of each worker node nor the inter-node, inter-process, and inter-thread communication delay. Existing research on Storm scheduling strategies includes online scheduling, offline scheduling, resource-aware scheduling, traffic-aware scheduling, workload scheduling, and so on. This research targets a single topology and does not address the submission of multiple topologies. Because the default scheduler allocates topology tasks randomly, submitting multiple topologies can easily lead to an imbalance in the slots occupied on each worker node and in the task-thread load within each slot. To solve this problem, this dissertation proposes a multi-tenant slot scheduling strategy and a task-thread scheduling strategy.

The multi-tenant load-balancing scheduling strategy in this dissertation mainly addresses the following problems:
(1) balancing the CPU load of the worker nodes;
(2) keeping the slots allocated to job topologies on each worker node relatively balanced, without exceeding the maximum number of slots that can be allocated on each worker node;
(3) keeping the load of the occupied slots on each worker node relatively balanced;
(4) reducing inter-node, inter-process, and inter-thread communication delay and increasing throughput.

To address these problems, the main solutions in this dissertation are as follows:
(1) The worker nodes are sorted in a queue and a priority is assigned to each node's supervisor-id; the higher a node's priority, the earlier it is allocated topology tasks.
(2) Real-time, up-to-date topology information is used to obtain the slot usage of each worker node. The lower the slot occupancy of a worker node, the higher its priority.
(3) The message traffic between the task threads in each slot is calculated, and tasks are migrated according to the hot edges between threads, which keeps the slot load balanced and improves throughput.
(4) Thread migration is used to reduce the communication delay between processes and between nodes.
(5) A CPU load threshold is set for the worker nodes to cap the maximum CPU load of each worker node.

The experiments submit four benchmark-based topologies, T1, T2, T3, and T4. For these four job topologies on a cluster of 4 worker nodes, the proposed multi-tenant task scheduling strategy improves data-flow throughput by 24.2%, reduces delay by 29%, and lowers CPU load by 15.1% compared with the default scheduler.
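The node-priority idea in solutions (1), (2), and (5) can be illustrated with a minimal sketch. It is not the dissertation's implementation and does not use the Storm API: the `WorkerNode` class, the `rank_nodes` and `assign_slots` functions, the tie-breaking order, and the 0.8 CPU threshold are all assumptions made for illustration. Nodes at or above the CPU threshold, or with no free slot, are skipped; the remaining nodes are ordered so that the lowest slot occupancy comes first.

```python
from dataclasses import dataclass


@dataclass
class WorkerNode:
    """Hypothetical view of a worker node; field names are illustrative."""
    supervisor_id: str
    total_slots: int
    used_slots: int
    cpu_load: float  # fraction of CPU in use, 0.0 to 1.0

    @property
    def occupancy(self) -> float:
        # Share of this node's slots that are already occupied.
        return self.used_slots / self.total_slots


def rank_nodes(nodes, cpu_threshold=0.8):
    """Return eligible nodes, highest priority (lowest occupancy) first."""
    eligible = [
        n for n in nodes
        if n.cpu_load < cpu_threshold and n.used_slots < n.total_slots
    ]
    # Assumed tie-breakers: lower CPU load, then supervisor id for determinism.
    return sorted(eligible, key=lambda n: (n.occupancy, n.cpu_load, n.supervisor_id))


def assign_slots(nodes, slots_needed, cpu_threshold=0.8):
    """Assign slots one at a time, re-ranking after each assignment.

    Returns the supervisor ids chosen, in order. CPU load is treated as a
    fixed snapshot here; a real scheduler would refresh it between rounds.
    """
    chosen = []
    for _ in range(slots_needed):
        ranked = rank_nodes(nodes, cpu_threshold)
        if not ranked:
            break  # no node has both a free slot and CPU headroom
        best = ranked[0]
        best.used_slots += 1
        chosen.append(best.supervisor_id)
    return chosen


if __name__ == "__main__":
    cluster = [
        WorkerNode("n1", 4, 3, 0.30),
        WorkerNode("n2", 4, 1, 0.50),
        WorkerNode("n3", 4, 0, 0.90),  # empty, but over the CPU threshold
        WorkerNode("n4", 4, 2, 0.20),
    ]
    print(assign_slots(cluster, 3))
```

Re-ranking after every assignment is what keeps slot usage balanced when several topologies are submitted in a row: a node that just received a slot drops in priority before the next slot is placed.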
Keywords/Search Tags: stream computing, Storm, data processing, scheduling