
Optimization Techniques For Parallel Discrete Event Simulation On CMP+GPU Heterogeneous Computing System

Posted on: 2014-04-01  Degree: Doctor  Type: Dissertation
Country: China  Candidate: W J Tang  Full Text: PDF
GTID: 1108330479979578  Subject: Computer Science and Technology
Abstract/Summary:
As simulations grow in scale and their models grow more complex, they demand ever more computing power. Improving simulation performance to meet running-time limits is a challenging but important problem. In recent years, heterogeneous architectures have become a trend in building high-performance computers. Compared with traditional high-performance computers, a CMP+GPU heterogeneous computing system offers greater computing power at relatively low cost, which makes it an economical way to accelerate simulation. However, current parallel simulation kernels are designed for symmetric multiprocessing systems or clusters. Such designs ignore the characteristics of multicore/many-core processors and of heterogeneity, so they cannot use the computing resources of the CMP and the GPU effectively: they cannot take full advantage of the CMP, they do not support discrete event simulation on the GPU, and they are inefficient when the CMP and the GPU simulate collaboratively.

To solve these problems, this dissertation analyzes the processor architectures and the characteristics of discrete event simulation in depth, and investigates optimization techniques that leverage the power of the CMP and the GPU to accelerate simulation. The innovations of this dissertation are as follows.

First, a hierarchical parallel simulation kernel is proposed. Current parallel simulation kernels use multicore resources through a multiprocess paradigm, which makes synchronization and communication on a CMP inefficient. To solve this problem, this dissertation proposes a hierarchical parallel simulation kernel (HPSK), which schedules logical processes and executes events in parallel using a multithread paradigm. Based on the kernel, two services, time management and event management, are optimized for high performance.
For time management, a protocol to compute the earliest emittable timestamp is proposed based on hybrid time management. The protocol separates synchronization into two states, preparing and calculating. Threads join the synchronization asynchronously and record only the timestamps of messages sent to the calculating state, which makes synchronization efficient. For event management, an algorithm is proposed based on the characteristics of event interaction. It creates events lock-free, commits events asynchronously, and transfers events by pointer, which eliminates lock overhead and reduces memory usage. Experimental results show that the HPSK works well under different conditions. In particular, it can run 8x faster than the multiprocess-based simulation kernel when event locality and lookahead are low.

Second, a memory management algorithm is proposed to support discrete event simulation on the GPU. Because memory needs are dynamic and irregular, it is difficult to store events in a regular data structure on the GPU. Meanwhile, massive concurrent memory requests may cause many conflicts among threads. These factors make memory management difficult. To solve this problem, this dissertation proposes an access-map-based memory management algorithm. The algorithm uses an access map with the properties “Injective” and “Equal probability” to assign each GPU thread a unique memory access entry, and thereby supports massive concurrent requests. It can be proved that the “Equal probability” property causes events to be stored evenly. Experiments demonstrate that the algorithm reduces memory consumption while improving performance.

Third, an expansion-aided synchronous conservative time management algorithm is proposed.
The GPU’s performance relies on high parallelism, but a synchronous conservative time management algorithm for discrete event simulation encounters scenarios with limited parallelism, which lead to poor performance. To solve this problem, this dissertation proposes an expansion-aided synchronous conservative time management algorithm. It uses runtime information to enlarge the time bound of “safe” events, and uses an expansion method to import additional “safe” events. By introducing a parameter that controls the number of expansions and using simulated annealing to find the optimal value of that parameter at runtime, the algorithm strikes a balance between high parallelism and excessive expansion. Experiments demonstrate that the proposed algorithm achieves up to a 30% performance improvement over the synchronous conservative time management algorithm.

Fourth, a cooperation mechanism to support collaborative simulation on the CMP and the GPU is proposed. GPU kernels must be controlled by a CPU thread, which adds extra work to that thread and therefore unbalances the workload among CPU threads. Meanwhile, the overhead of message transfer can easily become a bottleneck. To solve these problems, this dissertation proposes a cooperation mechanism that supports collaborative simulation on the CMP and the GPU. The mechanism separates event processing from scheduling and lets CPU threads obtain control of the GPU dynamically to share the workload. In addition, a two-level buffer is proposed to reduce communication overhead: the L1 buffer handles concurrent memory operations, and the L2 buffer stores the valid messages after a parallel scan eliminates the useless data. Experiments demonstrate that the cooperation mechanism uses the heterogeneous computing system effectively and scales well.
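The compaction step of the two-level buffer can be illustrated with a small sequential sketch in Python. It is not the dissertation's implementation: the function name `compact_messages`, the use of `None` for unused L1 slots, and the list-based buffers are all assumptions made for illustration. The sketch computes a prefix sum (scan) over validity flags and uses the result as the write offset of each valid message in a dense L2 buffer.

```python
from itertools import accumulate


def compact_messages(l1_buffer):
    """Return a dense list of the non-empty slots of l1_buffer, in order.

    Assumption for this sketch: an unused L1 slot holds None.
    """
    # 1 marks a slot that holds a message, 0 marks an unused slot.
    flags = [0 if slot is None else 1 for slot in l1_buffer]
    # Inclusive prefix sum of the flags; on a GPU this would be the parallel scan.
    inclusive = list(accumulate(flags))
    # Exclusive prefix sum: number of valid slots strictly before index i.
    offsets = [inc - flag for inc, flag in zip(inclusive, flags)]
    total = inclusive[-1] if inclusive else 0
    l2_buffer = [None] * total
    # Scatter: every index is independent, so a GPU could use one thread per slot.
    for i, slot in enumerate(l1_buffer):
        if flags[i]:
            l2_buffer[offsets[i]] = slot
    return l2_buffer


# Hypothetical L1 contents: three messages scattered among unused slots.
l1 = [None, "m0", None, None, "m1", "m2", None]
print(compact_messages(l1))
```

Because each write offset depends only on the scan result, the scatter loop has no ordering constraint between iterations, which is what makes this pattern a natural fit for many concurrent threads.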
Compared with using one CPU core, the cooperation mechanism achieves a 24.87x to 67.34x speedup when using a 4-core CMP and one GPU.

Finally, a modeling and simulation framework for the heterogeneous computing system is designed and implemented on top of the YH-SUPE PDES environment developed by our team. A civil violence simulation is used as an integration test of the optimizations. Compared with using one CPU core, our optimizations achieve at least a 30x speedup when using a 4-core CMP and one GPU.
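The abstract characterizes the access map of the memory management algorithm only by its two properties, “Injective” and “Equal probability”, and does not give its construction. The Python sketch below shows one simple map with both properties, purely as an illustration: the affine form, the function name `make_access_map`, and the parameters `num_entries` and `rng` are assumptions, not the dissertation's design. A stride coprime with the number of entries makes the map injective over thread ids `0 .. num_entries - 1`, and a uniformly random offset makes every entry equally likely for any fixed thread.

```python
import math
import random


def make_access_map(num_entries, rng):
    """Build a map from thread id to memory entry (hypothetical construction)."""
    if num_entries < 1:
        raise ValueError("num_entries must be positive")
    # A stride coprime with num_entries makes tid -> stride * tid a bijection
    # modulo num_entries, so distinct threads get distinct entries.
    strides = [s for s in range(1, num_entries + 1) if math.gcd(s, num_entries) == 1]
    stride = rng.choice(strides)
    # A uniform random offset makes each entry equally likely for a fixed thread.
    offset = rng.randrange(num_entries)

    def access_map(thread_id):
        return (stride * thread_id + offset) % num_entries

    return access_map


# Usage: 16 simulated threads, each receiving its own entry among 16.
rng = random.Random(2014)
access_map = make_access_map(16, rng)
entries = [access_map(tid) for tid in range(16)]
print(sorted(entries) == list(range(16)))
```

With such a map, concurrent threads never contend for the same entry in a given round, which is the kind of conflict the abstract says the access map is meant to avoid.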
Keywords/Search Tags: parallel discrete event simulation, CMP, GPU, heterogeneous computing system, memory management, time management, cooperation mechanism