Design And Implementation Of High Performance Computing Platform Based On SLURM Scheduling And Heterogeneous Programming

Posted on: 2022-11-20 | Degree: Master | Type: Thesis
Country: China | Candidate: Y S Hou | Full Text: PDF
GTID: 2518306773995789 | Subject: Information and Post Economy
Abstract/Summary:
High-performance computing (HPC) is an important foundation for national scientific and technological progress. It provides computing power for scientific computing, research modeling, and other scenarios that involve large-scale data processing and heavy computation. As computer hardware has developed, CPUs, GPUs, and FPGAs have become more and more widely used by software, and the ability to develop across architectures is increasingly valued. The need for an HPC platform that can run on processors of different architectures is therefore increasingly apparent. In addition, computing power is an important metric of HPC, but how to apply the computing power of an HPC cluster effectively in various fields has long been a major problem for industry. The development of Internet technology in recent years has produced many excellent tools and modules, but application development tends to depend on the closed programming environment of a single Internet company or hardware vendor, and these environments are incompatible with each other. As a result, developing applications with different dependency libraries, computing libraries, runtime environments, and optimization tools multiplies complexity and also degrades the performance of the resulting product.

This thesis starts from the practical needs of HPC platforms and the problems encountered at present, namely cross-architecture support, code library compatibility, complex user configuration, a narrow range of supported program types, and low program running performance, and designs and implements a system to address them. First, a Web service platform based on Slurm cluster resource scheduling is designed and implemented with Python's Flask framework. Through Web pages, users can allocate resources in the Slurm cluster and, by submitting forms, submit jobs, view jobs, and view the platform's resource status. In addition, each type of program needs a matching runtime environment to run correctly; without one, it is prone to errors and reduced performance. After investigation, this thesis therefore integrates the open source toolkit Intel oneAPI, so that code compiled on the system can execute across architectures and the system provides the basic environment for various kinds of programs. Then, to make these runtime environments easy to use, the thesis surveys the common requirements of HPC systems and preconfigures environments for MPI, OpenMP, TensorFlow, PyTorch, data analysis, and other application scenarios, so that users can select one directly without extensive configuration. This addresses the limitation that traditional HPC platforms can only accept MPI and OpenMP jobs. Next, the thesis optimizes the performance of the job types supported by the platform: experiments measure program efficiency under different combinations of runtime environments, environment variables, and parameter settings, and from them derive the best-performing combination, so that programs run with the optimization strategy perform better than those run without it. Finally, the system is validated with use-case tests and a performance analysis.

Building on the design and implementation of a Web-oriented HPC platform and on the performance of the Intel oneAPI toolkit, this thesis broadens the platform's usage scenarios and its compatibility with hardware processors. It addresses problems of traditional HPC platforms, such as the small number of supported program types, the difficulty of heterogeneous programming, and the need to configure many runtime environments by hand. Finally, the thesis tunes environment variable strategies for the programs supported on the platform, improving their efficiency to varying degrees.
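The abstract does not give implementation details, so the following is only a minimal sketch of how form fields from a Web page might be turned into a Slurm batch script with a preconfigured environment. The form field names, the `PRESETS` table, the conda environment name, and the function `build_sbatch_script` are all hypothetical; only the `#SBATCH` options and `srun` are standard Slurm, and the `setvars.sh` path assumes a default Intel oneAPI install location.

```python
import re
import shlex

# Hypothetical preset table: the keys and setup commands are illustrative
# assumptions, not the thesis's actual configuration.
PRESETS = {
    # Assumes oneAPI is installed in its default location.
    "mpi": ["source /opt/intel/oneapi/setvars.sh"],
    "openmp": [
        "source /opt/intel/oneapi/setvars.sh",
        "export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK",
    ],
    # Assumes a conda environment named "pytorch" exists on the cluster.
    "pytorch": ["conda activate pytorch"],
}

SAFE_NAME = re.compile(r"[A-Za-z0-9_.-]+")


def build_sbatch_script(form):
    """Build the text of a Slurm batch script from validated form fields."""
    name = form["job_name"]
    if not SAFE_NAME.fullmatch(name):
        raise ValueError("job_name contains unsupported characters")

    preset = form["environment"]
    if preset not in PRESETS:
        raise ValueError(f"unknown environment preset: {preset}")

    # Convert numeric fields and reject non-positive values.
    nodes = int(form["nodes"])
    tasks = int(form["tasks_per_node"])
    cpus = int(form["cpus_per_task"])
    if min(nodes, tasks, cpus) < 1:
        raise ValueError("resource counts must be positive integers")

    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={tasks}",
        f"#SBATCH --cpus-per-task={cpus}",
        "#SBATCH --output=%x-%j.out",  # %x = job name, %j = job ID
    ]
    lines.extend(PRESETS[preset])
    # The command is a list of arguments; shlex.join quotes each one.
    lines.append("srun " + shlex.join(form["command"]))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_sbatch_script({
        "job_name": "demo",
        "environment": "openmp",
        "nodes": "1",
        "tasks_per_node": "1",
        "cpus_per_task": "8",
        "command": ["./solver", "--steps", "100"],
    }))
```

In a Flask view, the returned text could be handed to `sbatch` on standard input (for example with `subprocess.run(["sbatch"], input=script, text=True)`), which keeps form validation separate from job submission. Whether the thesis's platform works this way is not stated in the abstract.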
Keywords/Search Tags:High Performance Computing, Slurm, Flask, Heterogeneous Parallel Programming, Optimization