
Resource Management For Big Data Analytics Systems

Posted on: 2020-03-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X D Zhang
Full Text: PDF
GTID: 1368330578465574
Subject: Computer Science and Technology

Abstract/Summary:
In recent years, big data analytics and artificial intelligence techniques have driven advances in many different fields. One of the reasons for this success is the development of software platforms and computation frameworks that make it easy to use large amounts of computational resources. As data volumes and computational complexity push against hardware limits, resource management is crucial in shared big data analytics systems. Most existing works make strong assumptions about application/job characteristics and ignore possible regulatory constraints and dynamics in execution environments. First, the unpredictability of practical executions makes the observed characteristics of jobs inconsistent with the foreseen characteristics, which leads to the suboptimality of clairvoyant scheduling methods. Second, more and more countries and regions are establishing laws that restrict computational resources from being controlled by remote (and possibly untrusted) parties, which rules out centralized architectures. Third, the dynamics of execution environments (e.g., WAN bandwidth) exclude scheduling algorithms that assume a constant environment. Based on these observations, we study the resource management problem in different scenarios and propose a job scheduler and a task scheduler accordingly. The main contributions of this dissertation are summarized as follows:

(1) For the resource management problem in a single data center, we propose a semi-clairvoyant task scheduler. First, we explain the necessity of "semi-clairvoyance" and formally define it. Then, we design the task scheduler COBRA, which operates within each job and manages resources and schedules tasks for that job. Specifically, when managing resources, COBRA distinguishes three cases, in which a job either requests more resources, maintains its current resources, or proactively releases some resources. The decision is made according to resource-utilization feedback and the currently waiting tasks. When scheduling tasks, COBRA strives to satisfy data locality. We theoretically prove the efficiency of COBRA, implement it in a Spark on YARN system, and use experiments on both real systems and simulations over the Google trace to verify the improvement in job performance.

(2) For the resource management problem in a single data center, we further improve practicability and propose a non-clairvoyant job scheduler. First, we analyze the job trace from a large cluster and observe that the resource usage of jobs follows a heavy-tailed distribution. Building on an existing method, we define Cumulative Running Work (CRW), which is a good predictor of a job's total work. The proposed CRWScheduler schedules jobs based on CRW. CRWScheduler combines a CRW-based heuristic (better for heavy-tailed job distributions) with a FIFO-based method (better for light-tailed job distributions) via a weighted multi-queue framework. Each job is assigned to a queue according to its CRW, and a queue holding jobs with smaller CRW is given a lower weight. When allocating resources, CRWScheduler takes into account both jobs' current resources and their queues. We implement CRWScheduler in Apache Hadoop YARN and test its performance improvement on a 28-node cluster.

(3) Geo-distributed data analytics is increasingly common for deriving useful information in large organisations, but it faces unique challenges, including regulatory constraints, WAN bandwidth limits, and high monetary costs. Naively extending existing cluster-scale systems to the scale of geo-distributed data centers fails to meet regulatory constraints. Our goal is to develop a regulation-abiding data analytics system that guarantees efficient geo-distributed job performance economically. To this end, we present HOUTU, which is composed of multiple autonomous systems, each operating in a sovereign data center. HOUTU maintains a job manager (JM) for each geo-distributed job in every data center, so that these replicated JMs can individually and cooperatively manage resources and assign tasks. We build HOUTU with Spark, YARN, and ZooKeeper as underlying building blocks. Our experiments with typical workloads on a prototype running across four Alibaba Cloud regions show that HOUTU achieves around 30% performance improvement over the existing centralized architecture, at only a quarter of the monetary cost. We also theoretically prove that the proposed methods guarantee an O(1)-competitive ratio in terms of makespan when jobs arrive online.
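COBRA's three-way resource decision described in contribution (1) could be sketched as follows. This is a minimal illustration, not the dissertation's actual algorithm: the function name, thresholds, and inputs are all assumptions; the only grounded idea is that a job requests, maintains, or releases resources based on utilization feedback and its waiting tasks.

```python
# Hypothetical sketch of a per-job resource decision in the spirit of COBRA.
# The threshold values below are illustrative, not from the dissertation.

def resource_decision(utilization, waiting_tasks, low=0.3, high=0.8):
    """Return 'request', 'maintain', or 'release' for one scheduling round.

    utilization   -- fraction of currently held resources doing useful work
    waiting_tasks -- number of runnable tasks not yet assigned a slot
    """
    if waiting_tasks > 0 and utilization >= high:
        return "request"   # saturated and backlogged: ask for more resources
    if waiting_tasks == 0 and utilization < low:
        return "release"   # idle capacity and no backlog: give some back
    return "maintain"      # otherwise keep the current allocation
```

For example, a job at 90% utilization with a task backlog would request more resources, while one at 10% utilization with no waiting tasks would proactively release some.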
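The weighted multi-queue framework of CRWScheduler in contribution (2) could be sketched like this. The CRW thresholds and weight values are invented for illustration; the grounded ideas are that each job maps to a queue by its CRW and that queues holding smaller-CRW jobs receive lower weights when resources are shared.

```python
import bisect

# Illustrative CRW-based multi-queue (thresholds and weights are assumed,
# not the dissertation's actual parameters).
THRESHOLDS = [10, 100, 1000]   # CRW boundaries separating the queues
WEIGHTS    = [1, 2, 4, 8]      # queues holding smaller-CRW jobs weigh less

def queue_index(crw):
    """Map a job's Cumulative Running Work to its queue index."""
    return bisect.bisect_right(THRESHOLDS, crw)

def share(crw_values, total_resources):
    """Split resources across queues in proportion to their weights,
    counting only queues that currently hold at least one job."""
    active = {queue_index(c) for c in crw_values}
    wsum = sum(WEIGHTS[q] for q in active)
    return {q: total_resources * WEIGHTS[q] / wsum for q in active}
```

Under this sketch, a job whose CRW grows over time migrates to higher-weight queues, which is one way a CRW-based heuristic can coexist with FIFO-style behavior inside each queue.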
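The regulation-abiding placement in HOUTU's replicated job managers, contribution (3), could be caricatured as below. This is a heavily simplified assumption-laden sketch: the class and method names are made up, and the cooperation between replicated JMs (which the real system coordinates via ZooKeeper) is abstracted away. The grounded idea is only that each sovereign data center's JM assigns tasks to resources inside its own region.

```python
# Hedged sketch: one job manager per sovereign data center, assigning
# tasks only to the slots that data center governs; tasks it cannot place
# are left for the replicated JMs in other regions.

class JobManager:
    def __init__(self, region, local_slots):
        self.region = region
        self.local_slots = local_slots  # resources inside this data center

    def assign(self, tasks):
        """Place tasks on local slots only; return (placed, remaining)."""
        placed = dict(zip(tasks, self.local_slots))
        remaining = tasks[len(placed):]
        return placed, remaining
```

A JM in one Alibaba Cloud region would thus never bind a task to a slot in another region, which is the property that keeps placement within regulatory boundaries.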
Keywords/Search Tags:Distributed System, Resource Management, Job Response Time, Makespan