Research On Optimization Mechanism Of Containerized Spark Resource Scheduling In Cloud Environment

Posted on: 2020-04-10
Degree: Master
Type: Thesis
Country: China
Candidate: J Q Song
GTID: 2428330590495743
Subject: Software engineering
Abstract/Summary:
With the rapid development of cloud computing and big data technology, container technology and Spark have both been widely adopted, which makes the traditional Spark deployment mode increasingly bloated. Because containers are lightweight, easy to isolate, and usable out of the box, developers are paying more and more attention to combining big data technology with container technology. This thesis therefore proposes an optimization strategy for resource scheduling in containerized Spark clusters and validates its effectiveness through experiments. The work covers the following three aspects:

1) When scheduling containers, traditional container scheduling algorithms and mainstream container orchestration tools mostly consider the scheduling metrics of a single container. When they have to schedule all the containers of a cluster, they therefore often suffer from reliance on a single metric, unbalanced load, and long scheduling times. This thesis proposes the OABC (Optimised Artificial Bee Colony) parallel scheduling algorithm, which jointly considers the correlations among the nodes of the cluster and between each worker node and the data source. Using the plug-in mechanism of the Kubernetes scheduling module, a parallel scheduling strategy for cluster containers is added. Treating the whole cluster as the basic scheduling unit shortens the overall scheduling time of the cluster's containers, which improves service performance. Experimental results show that the proposed algorithm effectively shortens the construction time of the entire containerized cluster and makes the overall load of the cluster more balanced.

2) Deploying a Spark cluster in containers changes the traditional way in which host resources are used directly, which causes a performance loss when the Spark cluster processes tasks. This thesis therefore proposes the HPS (Hierarchical Priority Scheduler) scheduling strategy, which layers the active Worker nodes according to the characteristics of containerization, the actual processing performance of the host node, and Spark's locality priority rules. When tasks are scheduled, hierarchical priority scheduling over the layered Worker nodes is used, which minimizes the cost of transferring data across hosts. Experimental results show that the algorithm effectively shortens task processing time and improves the overall processing performance of the containerized Spark cluster.

3) To address the resource scheduling problems of containerized Spark clusters described above, a complete resource scheduling system for containerized Spark clusters is designed and implemented. The system includes a task submission module, a cluster container scheduling module, and a Spark resource monitoring and task scheduling module. System test results show that, through the cooperation of these modules, the system greatly improves the convenience of running containerized big data applications and provides an efficient and stable resource scheduling solution for containerized big data.
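The abstract does not give the internals of OABC. As a rough illustration of the idea in point 1), the sketch below is a plain discrete Artificial Bee Colony search (not the thesis's optimised variant) that places all containers of a cluster in one pass instead of one container at a time. The cost function (standard deviation of node utilisation plus a weighted node-to-data-source distance and an overload penalty), the demand, capacity and distance figures, and the names `placement_cost` and `abc_schedule` are all hypothetical assumptions.

```python
import random
import statistics
from typing import List, Sequence, Tuple

# Minimal discrete Artificial Bee Colony (ABC) sketch for placing every container
# of a Spark cluster in one pass (the cluster, not the single container, is the
# scheduling unit). The cost model and all parameters are illustrative assumptions.


def placement_cost(assign: Sequence[int], demands: Sequence[float],
                   capacities: Sequence[float], distance: Sequence[float],
                   alpha: float = 0.5) -> float:
    """Lower is better: load imbalance + weighted data-source distance + overload penalty."""
    load = [0.0] * len(capacities)
    for container, node in enumerate(assign):
        load[node] += demands[container]
    util = [load[n] / capacities[n] for n in range(len(capacities))]
    overload = sum(max(0.0, u - 1.0) for u in util)      # infeasible placements lose
    imbalance = statistics.pstdev(util)                   # spread of node utilisation
    locality = sum(distance[n] for n in assign) / len(assign)
    return imbalance + alpha * locality + 10.0 * overload


def abc_schedule(demands, capacities, distance, colony=10, limit=20,
                 iters=200, seed=0) -> Tuple[List[int], float]:
    rng = random.Random(seed)
    n_nodes = len(capacities)

    def cost(sol):
        return placement_cost(sol, demands, capacities, distance)

    def random_solution():
        return [rng.randrange(n_nodes) for _ in demands]

    foods = [random_solution() for _ in range(colony)]   # one food source = one full placement
    costs = [cost(f) for f in foods]
    trials = [0] * colony

    def try_improve(i):
        # Discrete neighbour: copy one container's node from another source, or pick a random node.
        k = rng.choice([j for j in range(colony) if j != i])
        cand = list(foods[i])
        c = rng.randrange(len(cand))
        cand[c] = foods[k][c] if rng.random() < 0.5 else rng.randrange(n_nodes)
        cand_cost = cost(cand)
        if cand_cost < costs[i]:
            foods[i], costs[i], trials[i] = cand, cand_cost, 0
        else:
            trials[i] += 1

    best = min(range(colony), key=costs.__getitem__)
    best_sol, best_cost = list(foods[best]), costs[best]
    for _ in range(iters):
        for i in range(colony):                          # employed bees
            try_improve(i)
        fitness = [1.0 / (1.0 + c) for c in costs]
        for _ in range(colony):                          # onlooker bees favour good sources
            try_improve(rng.choices(range(colony), weights=fitness)[0])
        for i in range(colony):                          # scouts replace stagnant sources
            if trials[i] > limit:
                foods[i] = random_solution()
                costs[i], trials[i] = cost(foods[i]), 0
        i = min(range(colony), key=costs.__getitem__)
        if costs[i] < best_cost:
            best_sol, best_cost = list(foods[i]), costs[i]
    return best_sol, best_cost


if __name__ == "__main__":
    demands = [2, 2, 1, 1, 1, 3]      # CPU cores per Spark container (hypothetical)
    capacities = [8, 8, 4]            # CPU cores per node (hypothetical)
    distance = [0.0, 0.5, 1.0]        # normalised node-to-data-source distance (hypothetical)
    plan, plan_cost = abc_schedule(demands, capacities, distance)
    print(plan, round(plan_cost, 3))
```

Scoring whole placements rather than single containers is what lets the cost function see cluster-wide balance and data locality at once. A real Kubernetes scheduler plug-in would also need node filtering and pod binding, which this sketch leaves out.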
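Point 2) can likewise be illustrated with a minimal sketch of locality-first, layered task placement. It assumes three layers (the worker container that already holds the task's input data, other containers on the same physical host, and containers on other hosts) and a hypothetical per-host `host_score` standing in for measured processing performance. The `Worker` type and the functions `layer_of`, `pick_worker` and `schedule` are illustrative names, not part of Spark or of the thesis; real Spark placement goes through its own locality levels and delay scheduling.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

# Minimal sketch of hierarchical, locality-first task placement for containerised
# Spark workers. Layer rules and the host_score field are illustrative assumptions;
# this is not Spark's internal task scheduler API.


@dataclass
class Worker:
    name: str          # worker container id
    host: str          # physical host that runs the container
    free_cores: int
    host_score: float  # hypothetical measured processing performance of the host


def layer_of(w: Worker, data_containers: Set[str], data_hosts: Set[str]) -> int:
    if w.name in data_containers:
        return 0  # input data already inside this container
    if w.host in data_hosts:
        return 1  # another container on the same physical host: no cross-host transfer
    return 2      # data must cross the network between hosts


def pick_worker(workers: List[Worker], data_containers: Set[str],
                data_hosts: Set[str], cores: int = 1) -> Optional[Worker]:
    candidates = [w for w in workers if w.free_cores >= cores]
    if not candidates:
        return None
    # Lowest layer first; inside a layer prefer the faster host; name breaks ties.
    return min(candidates, key=lambda w: (layer_of(w, data_containers, data_hosts),
                                          -w.host_score, w.name))


def schedule(tasks: Dict[str, Set[str]], workers: List[Worker]) -> Dict[str, Optional[str]]:
    """tasks maps a task id to the worker containers holding its input data."""
    host_of = {w.name: w.host for w in workers}
    plan: Dict[str, Optional[str]] = {}
    for task, data_containers in tasks.items():
        data_hosts = {host_of[c] for c in data_containers if c in host_of}
        w = pick_worker(workers, data_containers, data_hosts)
        if w is not None:
            w.free_cores -= 1
        plan[task] = w.name if w is not None else None
    return plan


if __name__ == "__main__":
    workers = [Worker("w1", "hostA", 1, 1.0), Worker("w2", "hostA", 2, 0.8),
               Worker("w3", "hostB", 4, 1.5), Worker("w4", "hostC", 4, 0.5)]
    tasks = {"t1": {"w1"}, "t2": {"w1"}, "t3": {"w2"}, "t4": {"w9"}}
    print(schedule(tasks, workers))
```

In this example, t2 cannot run in w1 (no free cores left) and falls back to w2 on the same host instead of crossing to hostB; t4's data location is unknown, so it goes to the fastest host with free cores.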
Keywords/Search Tags: Cloud Computing, Container, Docker, Big Data, Spark, Resource Scheduling