| Beyond its traditional strengths in scientific computing fields such as ocean simulation, climate prediction, and molecular dynamics simulation, High Performance Computing (HPC) has in recent years been widely adopted in artificial intelligence, nuclear energy simulation, energy exploration, national economic prediction, and decision making. HPC brings more efficient and accurate data analysis and computing capabilities to these fields; it has become a strategic high ground of science and technology that countries around the world compete to occupy, and an important symbol of a nation's comprehensive scientific and technological strength. By working together at high speed, the nodes in an HPC cluster deliver more computing performance than traditional computers or servers, which makes supercomputers and HPC clusters essential infrastructure for modern supercomputing centers. As cloud computing technology matures, more and more academic institutions and commercial users are trying to run HPC applications on cloud platforms. Numerous studies and practical deployments have shown that cloud computing, with its advantages in elastic provisioning, lightweight virtualization, and resource control, can usefully supplement HPC clusters. A growing number of supercomputing centers have built, or will soon build, cloud computing clusters that use virtual machines or containers as resource scheduling units alongside traditional HPC clusters in order to cover more application scenarios. It is foreseeable that heterogeneous supercomputing centers will increasingly rely on hybrid clusters, composed of cloud computing clusters and traditional HPC clusters, to provide computing services for a wide variety of HPC jobs.
However, this hybrid-cluster computing model also faces new challenges. On the one hand, although cloud computing clusters can support HPC jobs, they still perform worse than traditional HPC clusters, so deciding which HPC jobs to run on the cloud is the key to providing users with the most cost-effective computing services. On the other hand, the clusters within a hybrid cluster use different resource management and scheduling technologies and different computing resource configurations, so deciding whether to schedule each HPC job to the HPC cluster or to the cloud computing cluster is the key to balancing load across data center resources as a whole. Therefore, how to predict and schedule HPC jobs on hybrid clusters of HPC and cloud computing, so that HPC users obtain better cost performance and data center resource utilization improves, is an urgent open problem in hybrid cluster management for heterogeneous supercomputing centers. To address these issues, this thesis investigates the perception and scheduling of HPC jobs in hybrid clusters, based on the performance characteristics that HPC applications exhibit when running in physical HPC clusters and in Kubernetes clusters, and on how running HPC jobs in different environments affects cluster energy consumption and job execution cost.
The main research content and innovations are as follows: (1) A multivariate feature prediction model for hybrid HPC application scenarios is designed and trained separately in a physical HPC cluster environment and in a Kubernetes cluster environment to predict the demand features of HPC jobs. Based on the job requirements submitted by users, the model predicts the total runtime, CPU requirement, and memory requirement of a job in a given cluster environment; this prediction is the prerequisite for a policy for scheduling HPC jobs in hybrid clusters. (2) A hybrid-cluster HPC job scheduling policy is proposed. It uses the multivariate feature prediction model to predict each job's runtime, CPU, and memory demand, derives from these the cluster energy consumption and monetary cost of running the job in each cluster environment, and uses both as policy features to schedule jobs submitted to the data center to the cluster environment that incurs relatively lower cluster energy consumption and job execution cost. (3) A job perception and scheduling architecture based on a hybrid of a physical HPC cluster and a Kubernetes cluster is constructed, integrating the multivariate feature prediction model and the HPC job scheduling policy, to validate the effect of the proposed perception and scheduling approach on data center resources. |
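
The scheduling policy in contribution (2) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not state how energy and cost are modelled or combined, so the linear power model, the per-core-hour pricing, the normalised weighted sum, and every name and number below (`Prediction`, `ClusterProfile`, `choose_cluster`, `energy_weight`, the example values) are assumptions made only for illustration. The per-cluster predictions stand in for the output of the multivariate feature prediction model.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Prediction:
    """Predicted demand of one job on one cluster (stand-in for model output)."""
    runtime_s: float
    cpu_cores: float
    memory_gb: float


@dataclass(frozen=True)
class ClusterProfile:
    """Hypothetical per-cluster power and price coefficients (assumed values)."""
    watts_per_core: float
    watts_per_gb: float
    price_per_core_hour: float
    price_per_gb_hour: float


def energy_joules(pred: Prediction, cluster: ClusterProfile) -> float:
    # Assumed linear power model: power scales with cores and memory used.
    power_w = (pred.cpu_cores * cluster.watts_per_core
               + pred.memory_gb * cluster.watts_per_gb)
    return power_w * pred.runtime_s


def monetary_cost(pred: Prediction, cluster: ClusterProfile) -> float:
    # Assumed pricing: resources are billed per hour of predicted runtime.
    hours = pred.runtime_s / 3600.0
    return hours * (pred.cpu_cores * cluster.price_per_core_hour
                    + pred.memory_gb * cluster.price_per_gb_hour)


def choose_cluster(predictions: Mapping[str, Prediction],
                   clusters: Mapping[str, ClusterProfile],
                   energy_weight: float = 0.5) -> str:
    """Return the cluster with the lowest weighted, normalised energy and cost."""
    energies = {n: energy_joules(p, clusters[n]) for n, p in predictions.items()}
    costs = {n: monetary_cost(p, clusters[n]) for n, p in predictions.items()}
    # Normalise by the maximum so that energy and cost are comparable.
    max_e = max(energies.values()) or 1.0
    max_c = max(costs.values()) or 1.0

    def score(name: str) -> float:
        return (energy_weight * energies[name] / max_e
                + (1.0 - energy_weight) * costs[name] / max_c)

    return min(predictions, key=score)


# Illustrative inputs only: the same job is predicted to run longer on Kubernetes.
clusters = {
    "hpc": ClusterProfile(10.0, 0.5, 0.05, 0.005),
    "k8s": ClusterProfile(8.0, 0.4, 0.03, 0.004),
}
predictions = {
    "hpc": Prediction(runtime_s=3600, cpu_cores=16, memory_gb=32),
    "k8s": Prediction(runtime_s=5400, cpu_cores=16, memory_gb=32),
}
target = choose_cluster(predictions, clusters)
```

In this sketch, `energy_weight` expresses how strongly the data center favours energy savings over the user's bill; setting it to 0 or 1 reduces the policy to pure cost or pure energy minimisation.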