Research On Job Scheduling Algorithms For Online Stream Processing With Low Latency

Posted on: 2020-02-24 | Degree: Master | Type: Thesis
Country: China | Candidate: X Wei
GTID: 2428330575977780 | Subject: Computer system architecture
Abstract/Summary:
With the arrival of the big data era, real-time processing of ever-larger data volumes is becoming increasingly important. To meet real-time requirements and keep data stream processing stable, many enterprise users adopt distributed stream processing architectures or platforms. The basic function these systems provide is to schedule the tasks of stream processing applications onto the currently available physical resources and to route data between those resources. For many distributed stream processing frameworks, how to schedule an application's tasks onto a physical cluster is one of the basic problems to be solved.

At present, latency-constrained scheduling algorithms for distributed stream processing systems focus mainly on compute-intensive scenarios. When modeling system latency, they take into account the resources required by tasks and the processing time of tasks, but ignore the effect of inter-task transmission time. Moreover, the default scheduling strategies of many stream processing systems lack intelligent scheduling mechanisms: they are unaware of task resource requirements and of resource availability in the physical cluster, and so cannot schedule efficiently to improve cluster resource utilization. Therefore, how to minimize the use of physical cluster resources while meeting real-time and resource requirements is an urgent problem for stream processing systems. In addition, the arrival rate of the data stream is dynamic while the system is running. When the rate fluctuates widely, tasks in the application may become overloaded, which increases system latency and violates the real-time requirements. How to adapt to data rate fluctuation is also a main problem addressed in this thesis.

First, to meet the real-time requirements of stream processing systems, the task scheduling problem is modeled as an optimization problem that minimizes resource usage, taking into account the availability of computing resources in the cluster, the workload of tasks, the characteristics of task nodes, and the transmission latency between tasks. To solve this optimization problem, two heuristic stream processing scheduling algorithms, AHA and PHA, are proposed. Based on the topological structure of stream processing applications, the AHA scheduling algorithm is dedicated to reducing transmission latency in stream processing systems. The PHA scheduling algorithm analyzes the impact of critical path latency on overall system latency and guarantees the system's real-time requirement by adjusting the latency of the critical path. In the simulation experiments, three types of stream topology are used, and the feasibility and accuracy of the two heuristic algorithms are verified by comparative experiments. The experimental results show that the proposed heuristic scheduling algorithms can guarantee the real-time performance of stream processing jobs while using fewer cluster resources.

Second, in view of the rate fluctuation that occurs during stream processing, this thesis uses queuing theory to formalize the waiting time and processing time of operator tasks in stream processing applications into a latency prediction model. On this basis, the minimum parallelism (number of parallel instances) of operator tasks is predicted from the convexity of the latency prediction model. Furthermore, the scheduling problem caused by dynamic rate changes is modeled as a minimization problem, and a prediction algorithm for the minimum parallelism of operators and a dynamic resource scheduling algorithm, DST, are proposed. The main goal of the DST scheduling algorithm is to use fewer cluster resources while satisfying user latency constraints. In addition, the algorithm triggers scheduling when the stream processing application reaches a critical trigger condition or when the scheduling period elapses. Moreover, when the average arrival rate of the data stream is low, it can consolidate resources to improve resource utilization.

Finally, the DST algorithm is tested and evaluated by simulation. Two kinds of data are used in the experiments, the real Memetracker data set and a simulated data set, and the two algorithms are compared and tested. Analysis of the experimental results shows that the DST scheduling algorithm adapts better to changes in the data arrival rate and is notably effective in meeting latency constraints and improving resource utilization.
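
To illustrate the idea of predicting an operator's minimum parallelism from a queuing-theoretic latency model, the following is a minimal sketch only, not the thesis's actual model. It assumes each operator behaves as an M/M/c queue (Poisson arrivals, exponential service times, c parallel task instances), which the abstract does not state. The function names `erlang_c`, `mean_latency`, and `min_parallelism` and all parameter values are hypothetical. Because mean latency in this model decreases as instances are added, a simple upward scan finds the smallest feasible parallelism.

```python
import math


def erlang_c(servers: int, offered_load: float) -> float:
    """Probability that an arriving tuple has to wait (Erlang C formula).

    offered_load = arrival_rate / service_rate. Assumes an M/M/c queue,
    which is an illustrative assumption, not taken from the thesis.
    Only valid when offered_load < servers (stable queue).
    """
    # Erlang B via the standard numerically stable recursion.
    blocking = 1.0
    for k in range(1, servers + 1):
        blocking = offered_load * blocking / (k + offered_load * blocking)
    utilization = offered_load / servers
    # Convert Erlang B to Erlang C.
    return blocking / (1.0 - utilization * (1.0 - blocking))


def mean_latency(servers: int, arrival_rate: float, service_rate: float) -> float:
    """Predicted mean latency = mean queueing wait + mean processing time."""
    if servers * service_rate <= arrival_rate:
        return math.inf  # overloaded operator: the queue grows without bound
    offered_load = arrival_rate / service_rate
    wait = erlang_c(servers, offered_load) / (servers * service_rate - arrival_rate)
    return wait + 1.0 / service_rate


def min_parallelism(arrival_rate, service_rate, latency_bound, max_servers=1024):
    """Smallest instance count whose predicted latency meets the bound.

    Returns None if no count up to max_servers satisfies the bound, for
    example when the bound is below the bare processing time.
    """
    for servers in range(1, max_servers + 1):
        if mean_latency(servers, arrival_rate, service_rate) <= latency_bound:
            return servers
    return None


if __name__ == "__main__":
    # Hypothetical operator: 100 tuples/s arrive, each instance handles 30 tuples/s,
    # and the user allows a mean latency of 50 ms.
    print(min_parallelism(arrival_rate=100.0, service_rate=30.0, latency_bound=0.05))
```

A dynamic scheduler in this spirit could rerun `min_parallelism` whenever the measured arrival rate changes, scaling an operator out when the bound is at risk and consolidating instances when the rate drops.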
Keywords/Search Tags:Transfer Latency, Task Scheduling, Distributed Stream Processing, Latency Constraint, Critical Path, Queuing Theory