Font Size: a A A

Research On Key Techniques Of Deterministic Multiprocessing Targeting Multicore/manycore Architectures

Posted on:2014-01-26Degree:DoctorType:Dissertation
Country:ChinaCandidate:X ZhouFull Text:PDF
GTID:1108330479979645Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Multicore processor has taken the place of uniprocessor to become the mainstream of today’s CPU. In the future, CPU will accommodate more cores and make us enter a many-core era. However, as parallelism increases, the cost for software development and maintenance is also increased obviously. Due to the factors such as data race, synchronization race and message race, the execution of a parallel program will produce different results in different runs. This nondeterminism is the natural property of parallel programs, which makes many work in parallel programs, such as programming, debugging, testing, intrusion analysis and fault tolerance, much more difficult than their sequential counterparts.To solve this problem, researches propose the technique of deterministic multiprocessing recently. The ultimate aim of deterministic multiprocessing is to ensure that execution of a parallel program will always produce the same output if provided with the same input. Hence, the execution of parallel programs will be simplified to the level of sequential programs, which will greatly reduce the cost for parallel program development and maintenance. However, current deterministic multiprocessing systems are still not practical due to many problems. For example, hardware implemented systems are poor in portability. Moreover, there is no such real hardware both in commodity products and in laboratory. Currently, hardware-implemented results are all derived from simulation. On the other hand, pure software implementations have prohibitive overhead to prevent them from being practical in use. Moreover, deterministic multiprocessing also have problems in scalability, stability and determinism. The purpose of this thesis is to study the technique of deterministic multiprocessing for multicore/many-core architecture, and implement a practical and efficient deterministic multiprocessing system. To this end, we conduct three pieces of research contents for both the thread-level parallelism and process-level parallelism.1. Fully Parallelized Deterministic Multiprocessing TechniqueCurrent deterministic multiprocessing for thread-level parallelism(also known as deterministic multithreading, abbreviated as DMT) is often inefficient due to low parallelism. Hence, we propose a fully parallelized deterministic multithreading technique. This technique solves a key problem, i.e., without weakening determinism and memory consistency, how to maximally exploit the parallelism of a deterministic multithreading system? To this end, we introduce deterministic synchronization points and use memory ownership redistribution in these points to solve nondeterminism. Different from conventional approaches that use serial execution to solve nondeterminism, our approach does not introduce explicit serial execution. Hence, our approach could exploit parallelism which was missed by conventional approaches. Furthermore, we dynamically adjust the density of inserted synchronization points to make it adapt to the varying communication rate between threads. Hence, we could balance the overhead of synchronization points and the benefits of exploiting parallelism. We implement this idea as a Fully Parallelized Deterministic system(FPDet). The evaluation results show that, compared with the state-of-the-art DMT system which also ensures strong determinism and sequential memory consistency, FPDet could improve the performance by 40%.2. Deterministic Message Passing TechniqueThe study of deterministic message passing is for the process-level parallelism, and it is supposed to ensure determinism of a group of collaborating processes. Message Passing Interface(MPI) is a common model for multi-proccess parallelism. To solve the nondeterminism of wildcard receiving operations and asynchronous transmission operations in MPI, we introduce logical time to control the processes of message passing, and make them not expose nondeterminism to user applications. Based on logical time, we develop a deterministic waiting mechanism for asynchronous transmissions and a determinism mapping mechanism for wildcard receiving operations to solve their nondeterminism respectively. We improve these two deterministic mechanisms by using a buffering strategy and a deadlock checker to mitigate and solve their performance and deadlock problems. We implement a Deterministic Message Passing Implementation(DMPI) based on MPICH2. The evaluation on NPB benchmarks shows that the performance loss is only 14%.3. Deterministic Multithreading without Global SynchronizationCurrently, all the DMT systems that provide strong determinism use inserted global synchronizations to carry out their deterministic strategy. However, global synchronization is not a good approach as it will cause correctness problem and performance problem. To solve this problem, we propose a Deterministic Lazy Release Consistency(DLRC). DLRC disables the direct shared memory communication of multicore architecture and delay the visibility of memory writes to thread synchronizations. DLRC leverages the happens-before relation caused by the inherent thread synchronization in user programs to perform thread communication, and ensures that a memory write is visible if and only if it happens before the current executing instruction of the thread. As a result, we limit the conventional data races in the points of synchronization operations and make the determinism of programs only depend on the synchronization order of the program execution. Meanwhile, we do not introduce any extra synchronization. Based on DLRC, we leverage weak determinism(an existing algorithm to ensure a deterministic synchronization order) to eliminate the nondeteminism of synchronization race. Hence, we implement RFDet, a strongly deterministic multithreading system without global synchronization. The evaluation results show that RFDet could nearly double the performance of a state-of-the-art DMT system—DThreads.Currently, deterministic multiprocessing is still a hot researching topic. This technique is becoming mature and practical. Our research could simultaneously support thread-level parallelism and process-level parallelism, thus we provide a suite of deterministic solutions for multicore and many-core architecture. Our evaluations prove the effectiveness of our approaches. Through these studies, we advance the technique of deterministic multiprocessing one step forward, and greatly improve the performance and practicability of current deterministic systems.
Keywords/Search Tags:determinism, deterministic multiprocessing, deterministic multithreading, deterministic message passing, debug, memory consistency, multicore, many-core, data race
PDF Full Text Request
Related items