Font Size: a A A

Research Of Software Speculative Parallelization Based On Transactions

Posted on:2010-01-01Degree:DoctorType:Dissertation
Country:ChinaCandidate:X Q ZhangFull Text:PDF
GTID:1118360305973641Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
The peak performance of a Multi-core computer doubles each time as the number of cores doubles. The past few years mark the start of a historic transition from sequential to parallel computation in the processors used in most personal, server and mobile computers. Few programs today are written to exploit parallelism effectively. Achieving the peak performance of a mutli-core computer requires a program execute in parallel and scale as the number of processors increase, and it is very difficult for a real-world application to achieve such performance. Because mainstream programming was sequential programming, most existing programming languages, libraries and design patterns do not address the challenges of parallel programming. A primary challenge is to find better abstractions for expressing parallel computation and writing parallel programs to exploit the computing power of the many-core system effectively.One of the challenges is how to ease the process of parallel program development. Lock-based synchronization which has been used by programmers to synchronize concurrent access to shared data, however, has well-known pitfalls:using locks for fine-grain synchronization and composing code that already uses locks are both difficult and prone to deadlock. Transactional Memory provides a high-level abstraction that can greatly simplifies conceptual understanding of parallel programs and provides an optimistic concurrency control mechanism which avoids these pitfalls and significantly eases concurrent programming without the hard problem of coordinating interactions between concurrently executing tasks, and thus was considered as a promising method of parallel programming.The second challenge is how to extract parallelism from single-threaded programs without programmer intervention and improve the efficiency as much as possible. As the natural extension of existing automatic parallelization techniques, Speculative Parallel Threading (SPT) can provide significant sources of additional thread-level parallelism, especially for irregular applications that are hard to parallelize by conventional approaches.Focusing on how to exploit computing power of many-core processor effectively, this thesis studies some key technologies in implementing Software Transactional Memory and Speculative Parallel Threading, and at last we presents TRANSPECT, a parallel programming environment for parallel programming with TM and SPT. TRANSPECT extends OpenMP, a widely used API for shared-memory parallel programming, with a set of compiler directives to express non-blocking synchronization and speculative parallelization.This paper presents a data dependency set based model of transactional memory that shows all modifications to shared data by a STM are linearizable. Based on the model, a new lightweight algorithm named LDSTM is presented here that the version information of shared data is kept in a Directory buffer which can be used to verify whether the view observed by a transaction is consistent at each data item access quickly. The new algorithm can greatly reduce the cost of conflict detection and validation. Experimental results show that LDSTM is simple, and substantially reduce the cost of conflict detection and validation. Comparison tests show that LDSTM is up to 1.53 times faster than RSTM.In a typical STM system, a basic assumption is that all data shared by concurrent threads must be encapsulated within transactions, but it is not practical in real-world application. As researchers are coming to discover, the actual semantics offered by current implementations are often counterintuitive-programs that look "obviously correct" may behave in unexpected ways. The crux of the problem is that implementations do not detect conflicts between a transaction running in one thread and non-transactional steps of another thread. In this thesis we examines this problem, and investigate the use of Causal Consistency with weaker semantics and present a new algorithm named CCSTM based on Causal Consistency that provides a good trade-off between semantic simplicity, good practical performance, and the flexibility of language memory model. The results from STMBench7 tests show that CCSTM can reduce the abort rate of transactions from 89.4% to 81.3% with 4 concurrent threads running.In this thesis we present a Loop level Speculative Multithreading (LLSM) algorithm which partitions loops into speculative threads, with special emphasis on loops for which conventional parallelizing approaches fail in. Inter-iteration dependences in the loop, which become inter-thread dependences at run time, must to be respected to enforce the correctness. To achieve good performance, the management of inter-thread dependences is crucial. In LLSM, W-W and R-W dependency are maintained by buffering, managing and committing multi-version speculative memory state. W-R dependence violation is detected at commit time by checking read-data version. LLSM has obvious advantages over LRPD, which is a well known Speculative algorithm. Experimental results show that LLSM is simple, and can substantially speedup the sequential program. The kernel test of 183.equake and177.mesa in SPEC shows that the minim speedup is 1.57.Speculation allows for maximum parallel overlap when it succeeds in LLSM, but becomes costly when it fails. Inter-thread value communication, on the other hand, introduces a fixed cost regardless of whether the dependency actually occurs or not. In this thesis, we purpose an inter-thread value communication mechanism between the source thread and target thread for inter-thread data-dependency. If inter-thread value communication is necessary in case of W-R dependence, the compiler inserts the corresponding signal and wait instructions, creating a point-to-point path to forward the values involved in the dependency, thus reduced the overhead in case of inter-thread dependencies detected.Data locality is one of the key factors in affecting the performance of parallel programs. In this thesis we present TRANSPECT, an extension of OpenMP for the Cell architecture that can exploit memory hierarchies with data locality at a high abstract level. The Cell processor is a heterogeneous multi-core processor with one Power Processing Engine (PPE) core and eight Synergistic Processing Engine (SPE) cores. Each SPE has a directly accessible small local memory, and it can access the system memory through DMA operations. The time to access local memory is much less than the time to access system memory. The experimental results on Cell show that TRANSPECT is effective.
Keywords/Search Tags:Mutli-Core, Many-Core, Software transactional memory, data dependence, Speculative parallelization, Conflict detection, Scratchpad memory
PDF Full Text Request
Related items