
Task Scheduling In Distributed Stream Processing Systems

Posted on: 2018-07-03
Degree: Master
Type: Thesis
Country: China
Candidate: Y T Chen
Full Text: PDF
GTID: 2348330512484593
Subject: Computer Science and Technology
Abstract/Summary:
With the development of big data processing, distributed stream processing systems have been widely adopted in many data processing applications. Small companies and research institutions that cannot afford a whole cluster can still benefit from the speed of cloud computing by renting services or resources. A distributed stream processing platform can also be offered as one of these cloud services. For a given stream processing job, users want to run the job with minimal resources, which reduces their cost, while providers want to run as many jobs as possible on a distributed cluster with finite resources. Based on this scenario, this thesis first introduces and compares the scheduling algorithms that have been proposed for distributed stream processing systems. We then focus on one scheduling objective: how to minimize resource usage while satisfying the job's requirements. We propose new algorithms to solve this problem. First, we collect status information while streaming jobs are being processed and use it to predict the data volume and resource usage. From this prediction, we calculate the average processing time and dynamically apply a new task reassignment if needed. The proposed algorithms achieve the following: 1. For a given streaming job, we minimize its resource usage while satisfying its requirements. 2. With dynamic status collection and analysis, we dynamically adjust the task assignment with respect to processing time. Furthermore, we design and implement a prototype system on the Storm platform. We design two topologies for the experiments based on a real-world dataset. In the experiments, we also implement a naive algorithm, and we compare our algorithms with Storm's default scheduler. The results show that, in a real-world scenario, our algorithms work efficiently to minimize resource usage.
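The loop summarized in the abstract (collect status, predict load, estimate average processing time, reassign if needed) can be illustrated with a small sketch. This is not the thesis's algorithm, which the abstract does not specify. It assumes a moving-average predictor for the arrival rate, an M/M/1-style estimate of average processing time with load split evenly across workers, and a latency bound standing in for the job's requirement. All names (`RatePredictor`, `avg_processing_time`, `min_workers`) are hypothetical.

```python
import math
from collections import deque


class RatePredictor:
    """Hypothetical predictor: moving average of recently observed arrival rates."""

    def __init__(self, window=5):
        self.samples = deque(maxlen=window)

    def observe(self, rate):
        # rate: observed tuples per second in the last monitoring interval
        self.samples.append(rate)

    def predict(self):
        if not self.samples:
            return 0.0
        return sum(self.samples) / len(self.samples)


def avg_processing_time(rate, service_time, workers):
    """Assumed model: each worker is an M/M/1 queue receiving rate / workers.

    Returns the estimated average time a tuple spends at a worker,
    or infinity if the workers would be saturated.
    """
    utilization = rate * service_time / workers
    if utilization >= 1.0:
        return math.inf
    return service_time / (1.0 - utilization)


def min_workers(rate, service_time, max_latency, max_workers):
    """Smallest worker count whose estimated latency meets the bound, or None."""
    for n in range(1, max_workers + 1):
        if avg_processing_time(rate, service_time, n) <= max_latency:
            return n
    return None


if __name__ == "__main__":
    predictor = RatePredictor(window=3)
    for observed in (100.0, 200.0, 300.0):
        predictor.observe(observed)

    current_workers = 1
    needed = min_workers(predictor.predict(), service_time=0.004,
                         max_latency=0.01, max_workers=8)
    if needed is not None and needed != current_workers:
        print(f"reassign tasks: {current_workers} -> {needed} workers")
```

Under these assumptions, a scheduler would call `min_workers` after each monitoring interval and trigger a reassignment only when the result differs from the current allocation, which keeps resource usage at the smallest value that still meets the latency bound.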
Keywords/Search Tags:Stream processing systems, distributed systems, task scheduling, Storm, Big data processing