
On GPU Resource Management And Scheduling Extension For Spark Platform

Posted on: 2018-07-01    Degree: Master    Type: Thesis
Country: China    Candidate: Y Song    Full Text: PDF
GTID: 2348330521950286    Subject: Engineering
Abstract/Summary:
The growth of information technology and the value of data mining create a strong need for efficient analytical tools that can handle large-scale data. Large-scale data computing frameworks are therefore developing rapidly, spanning both traditional approaches based on grid computing and hard disks and newer approaches based on HDFS and in-memory processing. Because traditional parallel programming models are limited, new parallel programming frameworks must become more powerful to support complex programming methods. The purpose of this project is to extend the range of applications and improve the generality of an existing system so that it suits more application scenarios; a further goal is to improve the efficiency of large-scale floating-point computation and to make the system more stable.

The main work of this research involves two aspects. The first is to treat the GPU as a new kind of resource inside the resource manager, so that GPU applications can run directly on the Spark computing platform. The second, building on this, is to design a general organization pattern that makes Spark easier to extend to more kinds of resource managers and to secondary development, enhancing both the management functionality and the compatibility of the computing framework. The design makes full use of the Spark platform's advantages and embeds GPU tasks into the Spark framework in a reasonable way. It also strengthens the scheduler and resource modules of the original platform, so that they can allocate, reclaim, and release resources in the presence of multiple resource types, improving resource utilization.

At the start of the project it was necessary to analyze and confirm the value of the topic. The first step was therefore to study the design documents for the computing platform's scheduler. The feasibility of the design was then analyzed through research on the resource manager designed by IBM for early enterprise large-scale data computing frameworks, and through recent studies of the Spark platform by the open-source community and other research institutions. Next, function simulation, a project demo, and a simulated running environment were used to produce an appropriate development document. The work realizes the resource-scheduling functions, designs the running environment, and improves the functional logic to make Spark more useful and stable.

This thesis describes the design of the project. The framework involves Scala and Python coding environments and is built on EGO. GPU programs and their related library files are integrated into the Spark framework. The result is parallel computation over large-scale floating-point data with improved data throughput, together with two implementation schemes of the resource-scheduling module that suit different environments. The test results show that GPU applications can work well on the Spark platform. They can also cooperate with CPU resources; to some extent this cooperation improves overall working efficiency, increases resource utilization, and makes the system more stable.
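The allocate/reclaim/release behavior for multiple resource types described above can be sketched as follows. This is a hypothetical illustration of the scheduling idea, not the thesis implementation; the `ResourcePool` and `Demand` names are invented for the example.

```python
# Hypothetical sketch: a minimal multi-resource pool in which a scheduler
# grants a task only if both its CPU and GPU demands can be satisfied,
# and returns the slots to the pool when the task finishes.
from dataclasses import dataclass


@dataclass
class Demand:
    cpus: int
    gpus: int


class ResourcePool:
    def __init__(self, cpus: int, gpus: int):
        self.cpus = cpus
        self.gpus = gpus

    def try_allocate(self, d: Demand) -> bool:
        """Grant a task only if both resource types are available."""
        if d.cpus <= self.cpus and d.gpus <= self.gpus:
            self.cpus -= d.cpus
            self.gpus -= d.gpus
            return True
        return False

    def release(self, d: Demand) -> None:
        """Return a finished task's resources to the pool."""
        self.cpus += d.cpus
        self.gpus += d.gpus


pool = ResourcePool(cpus=8, gpus=2)
task = Demand(cpus=2, gpus=1)
print(pool.try_allocate(task))  # True: first GPU task fits
print(pool.try_allocate(task))  # True: second fits too
print(pool.try_allocate(task))  # False: no GPU slot left
pool.release(task)
print(pool.try_allocate(task))  # True: fits again after release
```

In a real scheduler the same check runs per executor offer rather than against one global pool, but the invariant is the same: a GPU task is only placed where both resource types are free, which is what keeps CPU-only and GPU tasks coexisting without oversubscription.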
Keywords/Search Tags: Spark, GPU, Resource manager, Task scheduler