Research On Two-stage Task Scheduling Of Distributed Stream Processing System

Posted on: 2020-08-25 | Degree: Master | Type: Thesis
Country: China | Candidate: H P Jie | Full Text: PDF
GTID: 2428330599458996 | Subject: Computer technology
Abstract/Summary:
Today's society has entered the digital age and generates massive amounts of data every day. Processing big data is a serious challenge for existing computer systems: because data arrives continuously and in large volumes, it is difficult for a system to respond in time. Whether for real-time detection of trending topics on Weibo and Twitter or for stock trading on Wall Street and Nasdaq, these application scenarios urgently need real-time stream computing systems with low latency. Existing real-time stream processing systems include Storm, Heron, and Flink. These systems generally adopt a round-robin task scheduling strategy, which does not take communication delay into account and in most cases increases system delay. Schedulers that do consider communication delay place task instances that communicate with each other on the same node. When the data source produces a large volume of data, this approach overloads some nodes and thereby increases computation delay. System delay consists of communication delay and computation delay, and how to account for both together is the problem to be solved. To solve it, a two-stage scheme is proposed. The first stage is initialization: based on the topology submitted by the user, a static scheduling method for the case where the data source produces little data is proposed, and it supplies a scheduling set for the second stage. In the second stage, the system starts to run, and a dynamic scheduling method based on reinforcement learning is proposed. Experimental results show that the two-stage scheme can keep delay to a minimum when processing large-scale streaming data under dynamically changing data sources. Compared with Storm, the average tuple processing time is reduced by up to 41.2% and 31.9%.
Keywords/Search Tags:Big data, Stream processing, Reinforcement learning, Task scheduling, Realtime
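
The trade-off described in the abstract (system delay = communication delay + computation delay) can be illustrated with a small Python sketch. This is a toy cost model written for illustration only, not the model or scheduler used in the thesis: the function names, the per-hop cost `hop_cost`, the per-tuple work `work_per_tuple`, and the assumption that computation delay scales with the busiest node's load are all hypothetical.

```python
from collections import Counter


def round_robin(num_tasks, num_nodes):
    """Assign task i to node i mod num_nodes (round-robin placement)."""
    return [i % num_nodes for i in range(num_tasks)]


def colocate(num_tasks):
    """Place every task on node 0 so that no tuple crosses nodes."""
    return [0] * num_tasks


def system_delay(placement, edges, rate, hop_cost=1.0, work_per_tuple=0.1):
    """Toy delay model (an assumption, not the thesis's formula).

    Communication delay: hop_cost for each edge whose endpoints
    sit on different nodes.
    Computation delay: tasks on the busiest node * input rate * work_per_tuple.
    """
    comm = sum(hop_cost for a, b in edges if placement[a] != placement[b])
    busiest = max(Counter(placement).values())
    comp = busiest * rate * work_per_tuple
    return comm + comp


# A hypothetical 4-task chain topology: 0 -> 1 -> 2 -> 3, on 2 nodes.
edges = [(0, 1), (1, 2), (2, 3)]
rr = round_robin(4, 2)
co = colocate(4)

for rate in (1, 20):
    print(rate, system_delay(rr, edges, rate), system_delay(co, edges, rate))
```

Under these assumed parameters, co-location gives the lower delay at the low input rate, while round-robin gives the lower delay at the high input rate. That reversal is the reason a single static placement is not enough when the data source changes over time.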