
Parallel GPGPU Simulation And A Low-Cost Network On Chip Design

Posted on: 2015-02-06
Degree: Master
Type: Thesis
Country: China
Candidate: X Zhao
Full Text: PDF
GTID: 2348330509960529
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, GPGPUs have played an increasingly important role in biological computing, financial analysis, weather forecasting, and other high-performance computing fields, and GPGPU research has become a focus of attention. On the simulation side, a high-performance simulator is of great importance to GPGPU research. However, most GPGPU simulators today are serial and very slow, which greatly limits their usefulness in that research. On the architecture side, as the computing capability of GPGPUs grows, more and more compute nodes need to communicate with the memory nodes through the Network-on-Chip (NoC). As the cost of the NoC in multi-core systems rises, designing a low-cost NoC that maintains performance becomes extremely important for the development of GPGPUs.
For the first problem, we propose a method for parallelizing GPGPU simulation. By making full use of the computing capability of a host platform that contains multiple cores or multiple compute nodes, the GPGPU simulation can be accelerated. For the second problem, we propose a low-cost GPGPU NoC design. First, through a series of design measures, we avoid message conflicts in the network. Then we simplify the router microarchitecture to reduce the NoC cost.
The main research work and achievements of this thesis are listed below:
(1) A method for parallelizing GPGPU simulation on host platforms that contain multiple cores or multiple compute nodes.
On a multicore host platform, we propose an intra-kernel parallelization mechanism. On the one hand, we employ several simulation threads to simulate Clusters at the same time, synchronizing the simulation threads on every clock cycle to keep simulation accuracy high. On the other hand, we run the functional simulator and the performance simulator in parallel in the execution-driven simulation to make up for the performance lost to cycle-by-cycle synchronization.
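The cycle-by-cycle synchronization of simulation threads described above can be illustrated with a barrier. The following is only a minimal sketch, not the thesis's implementation: the function name `simulate_lockstep`, the per-cluster counters, and the use of Python threads are all assumptions made for illustration, and the "work" of simulating a cluster is reduced to incrementing a counter.

```python
import threading


def simulate_lockstep(num_clusters, num_cycles):
    """Advance hypothetical clusters one clock cycle at a time, in lockstep."""
    cycles = [0] * num_clusters   # local clock of each simulated cluster
    skews = []                    # clock skew observed after each cycle
    barrier = threading.Barrier(num_clusters)

    def worker(cluster_id):
        for _ in range(num_cycles):
            # Stand-in for simulating one cycle of this cluster.
            cycles[cluster_id] += 1
            # First barrier: wait until every cluster has finished the cycle.
            # Exactly one thread receives index 0 and records the skew.
            if barrier.wait() == 0:
                skews.append(max(cycles) - min(cycles))
            # Second barrier: nobody starts the next cycle until the
            # skew for this cycle has been recorded.
            barrier.wait()

    threads = [
        threading.Thread(target=worker, args=(cid,))
        for cid in range(num_clusters)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cycles, skews


if __name__ == "__main__":
    final_cycles, observed_skews = simulate_lockstep(num_clusters=4, num_cycles=50)
    print(final_cycles, max(observed_skews))
```

Because no thread can pass the barrier until all have arrived, the clusters never drift apart by even one cycle. The cost is that every cycle pays for a synchronization, which is the overhead the abstract says is offset by running the functional and performance simulators in parallel.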
On a multi-machine host platform, we propose an inter-kernel parallelization mechanism. We divide the kernel functions into several groups and use different compute nodes to simulate different groups in order to improve performance. During simulation, the GPGPU simulator relies on the results produced by the functional simulator, so the functional simulator limits the overall simulation speed to a certain degree. Based on the characteristics of the CUDA programming model, we propose a method for parallelizing the functional simulation and apply it to both intra-kernel and inter-kernel parallelization.
(2) A low-cost GPGPU Network-on-Chip.
Based on the communication pattern of the request network in a GPGPU, we propose a low-cost NoC design. First, we divide the compute nodes into several groups and use channel slicing to avoid message conflicts between different groups. Second, we design a token-control method and a token-transportation network that completely avoid message conflicts within a group, and we prove that the network is entirely conflict-free. Third, we design a low-cost router microarchitecture that achieves single-cycle message transmission between adjacent routers. Fourth, we design a back-pressure network that informs the compute nodes of status changes in the memory nodes. Using this information, a compute node can control and adjust its packet-sending policy.
We implement these ideas and evaluate the system using many CUDA programs from various benchmarks. The results show the effectiveness of the parallelization method and the low-cost network design.
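To make the idea of token-controlled injection within a group more concrete, here is a small sketch. The abstract does not specify how the token moves, so the round-robin token rotation, the function name `run_token_network`, and the one-packet-per-cycle model are all assumptions for illustration. Each group stands for one channel slice, and only the node holding the group's token may inject a packet in a given cycle.

```python
def run_token_network(num_nodes, group_size, pending, num_cycles):
    """Simulate token-gated packet injection.

    pending[i] is the number of packets node i wants to send.
    Returns (schedule, remaining), where schedule[c] lists the
    (group, node) pairs that injected a packet in cycle c.
    """
    # Partition compute nodes into groups; each group has its own channel slice.
    groups = [
        list(range(start, min(start + group_size, num_nodes)))
        for start in range(0, num_nodes, group_size)
    ]
    token = [0] * len(groups)      # position of the token inside each group
    remaining = list(pending)
    schedule = []

    for _ in range(num_cycles):
        senders = []
        for g, members in enumerate(groups):
            holder = members[token[g]]
            # Only the token holder may inject, so a group never has
            # two packets competing for its slice in the same cycle.
            if remaining[holder] > 0:
                remaining[holder] -= 1
                senders.append((g, holder))
            # Assumed policy: pass the token to the next node every cycle.
            token[g] = (token[g] + 1) % len(members)
        schedule.append(senders)
    return schedule, remaining


if __name__ == "__main__":
    schedule, remaining = run_token_network(
        num_nodes=8, group_size=4, pending=[2] * 8, num_cycles=8
    )
    for cycle, senders in enumerate(schedule):
        print(cycle, senders)
    print("left over:", remaining)
```

In this simplified model, conflicts between groups are ruled out by giving each group a separate slice, and conflicts inside a group are ruled out by the single token. A real design would also have to handle token transport latency and back pressure from the memory nodes, which this sketch leaves out.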
Keywords/Search Tags:GPGPU, Simulate, Parallel, Low-Cost, NoC