Non-speculative Parallelism Strategies For Irregular Applications On CMPs

Posted on: 2017-12-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H R Yu
GTID: 1318330485950827
Subject: Computer software and theory
Abstract/Summary:
Chip multiprocessors (CMPs) have become the mainstream architecture and offer many advantages, including substantial computational power, low power consumption, and low design complexity. Irregular applications, however, contain a large amount of complex data flow and control flow, so the ample on-chip resources cannot be used effectively. Automatic parallelization can address this problem, as it transforms sequential applications into parallel ones. Speculative parallelism is one of the most popular approaches in automatic parallelization. It creates more opportunities for parallelization by executing some instructions speculatively. Speculation may be wrong, however, so this approach suffers from recovery overhead whenever mis-speculation occurs. Hence, non-speculative parallelism becomes another preferable alternative.

So far, non-speculative parallelism has been widely used in irregular applications for extracting independent, pipelined, and cyclical multi-threading. However, the current approaches always lead to imbalanced processor workload, poor scalability, and high sensitivity to fluctuations in communication latency. Moreover, non-speculative parallelism relies on static analysis to detect data dependence, so a series of problems related to pointer aliasing cannot be resolved accurately, which in turn affects parallel efficiency. To resolve those problems, we propose several new methods, including:

A new latency-resistant algorithm for automatic program parallelization, DOcyclical, is developed to resolve the problem of sensitivity to inter-thread communication. DOcyclical employs a priority-based dynamic scheduling strategy to reduce the frequency of inter-thread communication and to take communication latency off the critical paths of parallel execution. In addition, DOcyclical integrates the scheduling strategy with a node fusion strategy to keep processor workload as balanced as possible. To demonstrate the capacity of DOcyclical, we have evaluated it using the SPEC CPU2006 and StreamIt benchmarks on three real platforms. Experimental results show that DOcyclical is much less sensitive to fluctuations in communication latency. DOcyclical also outperforms other well-known parallelization methods, including DSWP, PS-DSWP, and HELIX, in terms of speedup by 21-50%, 16-27%, and 15-25% respectively on the three platforms.

A dynamic analysis framework, DSspirit, is designed to overcome the limitations of conservative static analysis. DSspirit performs both data dependence profiling and stride reference profiling. It employs a hash-based scheme to detect actual data dependence, and a value-based scheme to analyze the reusability of a given program and select the most profitable loads as prefetched objects for compilers. To demonstrate the effectiveness of DSspirit, we have evaluated it using the SPEC CPU2006, MPI2007 and OMP2012 benchmarks on an Intel i7-4700 machine. Experimental results show that DSspirit produces accurate profiling results, including the expected data dependence and prefetched objects. Moreover, when DSspirit is used in automatic parallelization, it brings about 20% more performance improvement.

A prefetching algorithm is developed to improve the cache hit rate of parallel applications on CMPs. Prefetching can increase the cache hit rate because it brings needed data into the cache ahead of time. To demonstrate the effectiveness of the prefetching algorithm, we have evaluated it on an Intel i7-4700 machine using the SPEC CPU2006 and MPI2007 benchmarks. Experimental results show that the prefetching algorithm reduces the cache miss rate of parallel programs dramatically.

A parallel system, HSparallel, is designed. The system provides strong latency tolerance and a high cache hit rate by incorporating DSspirit, DOcyclical, and the prefetching algorithm together. To demonstrate the capacity of HSparallel, we have evaluated it on an Intel i7-4700 machine using the SPEC CPU2006 and MPI2007 benchmarks. Experimental results show that HSparallel brings about a dramatic performance improvement. In addition, DOcyclical outperforms other well-known parallelization methods, including Paralax and [1,2], in terms of speedup by 19%, 21% and 17% respectively.

Lastly, it is worth noting that all of the algorithms, the framework, and the system are built upon the LLVM compiler and added to its back end.
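
As an illustration of what hash-based data dependence profiling can look like, the following is a minimal Python sketch. It is not the DSspirit implementation, whose details the abstract does not give. It assumes a hypothetical memory trace of (iteration, operation, address, instruction id) tuples, and the function name `profile_dependences` is invented for this example. Hash tables keyed by address record the last write and the reads since that write, so each access can be checked in constant expected time.

```python
from collections import defaultdict


def profile_dependences(trace):
    """Detect dependences observed in a memory-access trace.

    trace: iterable of (iteration, op, address, instr_id) tuples, where op is
    'R' (read) or 'W' (write). This trace format is an assumption made for
    illustration only.

    Returns a set of (kind, source_instr, sink_instr, distance) tuples, where
    kind is 'RAW', 'WAR' or 'WAW' and distance is the iteration difference
    (0 means the dependence stays inside one iteration).
    """
    last_write = {}                  # address -> (iteration, instr_id)
    last_reads = defaultdict(list)   # address -> reads since the last write
    deps = set()

    for iteration, op, addr, instr in trace:
        if op == "R":
            # A read after an earlier write to the same address is a RAW dependence.
            if addr in last_write:
                w_iter, w_instr = last_write[addr]
                deps.add(("RAW", w_instr, instr, iteration - w_iter))
            last_reads[addr].append((iteration, instr))
        elif op == "W":
            # A write after earlier reads of the same address is a WAR dependence.
            for r_iter, r_instr in last_reads.pop(addr, []):
                deps.add(("WAR", r_instr, instr, iteration - r_iter))
            # A write after an earlier write to the same address is a WAW dependence.
            if addr in last_write:
                w_iter, w_instr = last_write[addr]
                deps.add(("WAW", w_instr, instr, iteration - w_iter))
            last_write[addr] = (iteration, instr)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return deps


# Hypothetical trace of a loop like a[i] = a[i-1] + 1 with 4-byte elements:
# each iteration reads the element written by the previous iteration.
recurrence_trace = [
    (1, "R", 100, "ld"), (1, "W", 104, "st"),
    (2, "R", 104, "ld"), (2, "W", 108, "st"),
]
recurrence_deps = profile_dependences(recurrence_trace)
```

Dependences with a non-zero distance are loop-carried and constrain how iterations may be distributed across threads. A loop whose profile contains none is a candidate for independent multi-threading, subject to the usual caveat that a profile only reflects the inputs that were actually run.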
Keywords/Search Tags: automatic parallelization, optimizing compilers, dynamic analysis, prefetching