Research On Performance Evaluation And Optimization For CPU-GPU Heterogeneous System

Posted on: 2012-06-11
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Cheng
GTID: 2218330362460164
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the national economy and of technology, ever higher demands are placed on the performance of high-performance computers (HPC). The traditional approach of building HPC systems from CPUs alone faces tremendous challenges in energy consumption, heat dissipation and cost. Heterogeneous architectures combine the advantages of general-purpose processors and accelerators, and are gradually becoming the mainstream architecture in the HPC domain. GPUs have been widely used to build heterogeneous computer systems because of their advantages in performance, memory bandwidth, power consumption and programmability. Since CPU-GPU heterogeneous systems first appeared, they have attracted great attention from the international academic community and are considered an important direction for the future development of HPC.

As an effective way to build HPC systems, the CPU-GPU heterogeneous architecture can attain high computing power, and how applications can exploit the heterogeneous structure is also a matter of concern. This thesis focuses on the performance evaluation of CPU-GPU heterogeneous computer systems. By running benchmark suites, we identify the key factors that restrict the performance of a heterogeneous system, propose corresponding optimization methods, and validate their effectiveness by implementing a typical scientific application (matrix multiplication). The main work is as follows:

(1) We evaluate the overall performance of the general-purpose CPU part of a CPU-GPU heterogeneous system. The HPCC (High Performance Computing Challenge) benchmark suite is used to evaluate TH-1A in terms of computing speed, network communication and memory access. By analysing the HPCC results and comparing them with the results of other HPC systems, we find that TH-1A has favorable overall performance and scalability, and achieves a good balance between computation and communication capability.

(2) We evaluate the performance of the CPU-GPU heterogeneous system with the SHOC (Scalable Heterogeneous Computing) benchmark suite. SHOC contains several typical parallel algorithms. By evaluating their performance on the system, we find that the cost of data transmission between CPU and GPU is an important factor affecting performance, and we then analyse the specific sources of this cost for each algorithm.

(3) We propose a CPU-GPU communication optimization method based on dividing the data flow into sub-flows. According to the SHOC results, data transmission between the CPU and the GPU has a considerable influence on the performance of the heterogeneous system. We therefore divide the data flow submitted to the GPU into several sub-flows, so that the computation of one sub-flow can proceed at the same time as the communication of another, which hides the cost of data transmission between CPU and GPU.

(4) We propose an adaptive CPU-GPU task partitioning method. When all computing tasks are assigned to the GPU, the CPU is idle most of the time, which wastes computing resources. The adaptive method lets the CPU undertake part of the computing task and distributes the work so that the load on the CPU and the GPU is as balanced as possible, making effective use of the computing resources in the heterogeneous system.

(5) We optimize a typical scientific application. Matrix multiplication is ported to TH-1A, and the communication optimization method and the task partitioning method proposed in this thesis are used to optimize the program. The test results show that, compared with the matrix multiplication in the CUBLAS library, these two methods improve the performance of matrix multiplication by 5% and 8% respectively.
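
The abstract does not give the details of the sub-flow communication optimization in (3) or of the adaptive task partitioning in (4). As a rough illustration only, the following Python sketch models both ideas with a simple cost model. The function names (`pipelined_time`, `next_gpu_share`), the two-stage pipeline assumption, and the throughput-proportional update rule are all assumptions made for this sketch and are not taken from the thesis.

```python
# Illustrative cost model only; not the thesis's implementation.
# All names and formulas below are assumptions made for this sketch.


def pipelined_time(total_bytes, n_sub, bandwidth, compute_rate,
                   per_copy_overhead=0.0):
    """Estimate run time when the data is split into n_sub equal sub-flows.

    Assumes a two-stage pipeline: while the GPU computes on sub-flow i,
    the transfer of sub-flow i+1 is already in progress.
    bandwidth and compute_rate are in bytes per second.
    """
    if n_sub < 1:
        raise ValueError("n_sub must be at least 1")
    chunk = total_bytes / n_sub
    t_copy = chunk / bandwidth + per_copy_overhead  # transfer time per sub-flow
    t_comp = chunk / compute_rate                   # compute time per sub-flow
    # The first transfer and the last computation cannot be hidden;
    # in between, the slower of the two stages sets the pace.
    return t_copy + (n_sub - 1) * max(t_copy, t_comp) + t_comp


def next_gpu_share(gpu_share, t_gpu, t_cpu):
    """Return an updated fraction of the work to assign to the GPU.

    gpu_share is the fraction given to the GPU in the previous run
    (strictly between 0 and 1); t_gpu and t_cpu are the times each
    device measured for its part. The new share is proportional to the
    observed throughput, so both devices would finish together if their
    throughput stayed the same.
    """
    if not 0.0 < gpu_share < 1.0:
        raise ValueError("gpu_share must be strictly between 0 and 1")
    gpu_rate = gpu_share / t_gpu
    cpu_rate = (1.0 - gpu_share) / t_cpu
    return gpu_rate / (gpu_rate + cpu_rate)


if __name__ == "__main__":
    # Hypothetical numbers: 8 GB of data, 2 GB/s transfer, 1 GB/s compute.
    for n in (1, 2, 4, 8):
        print(n, "sub-flows:", pipelined_time(8e9, n, 2e9, 1e9), "s")
    # Hypothetical measurement: an even split took 1 s on the GPU, 9 s on the CPU.
    print("next GPU share:", next_gpu_share(0.5, t_gpu=1.0, t_cpu=9.0))
```

In this model, adding sub-flows hides more of the transfer time, but a non-zero `per_copy_overhead` is paid once per sub-flow, so very fine splitting eventually stops helping. The partitioning rule can be applied repeatedly, feeding each run's measured times back in, which is one plausible reading of "adaptive" here.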
Keywords/Search Tags:CPU-GPU Heterogeneous System, GPU, Performance Evaluation, Performance Optimization