
Performance Optimization On Multicore Transactional Memory Architecture Supporting Speculative Parallelization

Posted on: 2011-05-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y B Wang
GTID: 1118360305966714
Subject: Computer system architecture
Abstract/Summary:
With the spread of multicore platforms, how to use multicore computing resources to accelerate traditional serial applications has become a common concern. Traditional explicit lock-based synchronization is complex, error-prone, and conservative in performance; this fundamentally limits the scalability and efficiency of parallel programming and prevents full use of multicore resources. To extract more thread-level parallelism from multicore architectures, academia and industry have come to regard Transactional Memory (TM) as a way to reduce the complexity of traditional parallel programming and lift its performance constraints. Taking the effective extraction of thread-level parallelism from applications as its point of departure, this dissertation pursues three goals (high performance, ease of programming, and compatibility) and coordinates hardware and software in an in-depth study of a multicore transactional memory architecture that supports speculative parallelization. The aim is to raise the effective utilization of multicore computing resources, reduce the difficulty of parallel programming, and allow traditional applications to migrate smoothly.

This dissertation systematically studies both thread partitioning and thread execution in the multicore transactional memory architecture supporting speculative parallelization, covering the structural model, the programming model, the performance analysis model, thread partitioning guided by offline profiling, and thread execution guided by online profiling. The major research contributions include:

(1) Based on a detailed survey of the two main trends in speculative thread-level parallelism technology and a comparison of their software and hardware support mechanisms, a novel hardware/software co-designed multicore transactional memory architecture is proposed.
It uses ideas from software thread-level speculation to guide thread partitioning and hardware transactional memory to support thread execution, coordinates hardware and software through offline and online profiling, and thereby both improves application performance and reduces the difficulty of parallel programming.

(2) For software thread partitioning, with the aims of simpler parallel programming and better parallel execution performance, a set of criteria, research methods, and profiling mechanisms for speculative thread-level parallelism is proposed. An offline-profiling-guided thread partitioning scheme for transactional memory is defined, and a set of offline profiling tools named Openpro is developed to explore thread-level parallelism according to these criteria.

(3) The key factors affecting thread-level parallelism performance in the desktop, media, and HPC domains are analyzed, and the applications' own parallelism potential is investigated.

(4) For hardware thread execution, with the aims of good scalability and ease of implementation, a directory-based cache coherence protocol that supports priority determination is proposed. On this basis, PTT, a hardware simulator of a scalable distributed multicore transactional memory processor, is developed. Through run-time library support, it handles both thread-level speculation and transactional memory semantics. The design overcomes the hardware scalability limits imposed by earlier centralized structures such as bus architectures, achieving both good scalability and ease of hardware design. The distributed transactional memory system uses eager version management and eager violation detection, so that the hardware automatically maintains consistency and the complexity of parallel programming is greatly reduced.
This is of real significance for improving parallel programming productivity and making parallel programming more widely accessible.

(5) A performance analysis model for speculative thread-level parallelism, named PCL, is proposed. Following the PCL model, the PTT system incorporates online profiling into the platform. Coordinating a variety of hardware and software mechanisms, a final evaluation of the PTT system is carried out at three levels: accuracy, effectiveness, and flexibility.

Based on this work, the following conclusions are drawn:

(1) It is reasonable to combine the benefits of thread-level speculation and transactional memory through coordinated hardware and software mechanisms. Doing so effectively extracts the potential thread-level parallelism from serial programs while reducing the difficulty of parallel programming and greatly improving parallel programming productivity.

(2) Under the current multicore technology roadmap, in which chips are built from individual superscalar cores, and with the aims of using hardware effectively and exploiting as much inherent parallelism as possible, desktop applications can efficiently use 2 cores, while many multimedia and HPC applications are suited to 8-16 cores. Some particularly suitable applications can effectively use 64-128 cores.

(3) Although speculative thread-level parallelism does not perform well on desktop applications with severe data dependences, it suits most multimedia and HPC applications, which have heavy computation, moderate thread sizes, and ambiguous but easily resolved dependences.
The biggest advantages of speculative thread-level parallelism are its compatibility and ease of programming; by making good use of these two points, it can take an important place in computer architecture research.

The work in this dissertation can guide the design of parallel programming models and compilers for shared-memory multicore processors, help in the design of high-performance on-chip multicore architectures, and expose more parallelism in applications with less hardware and software complexity and less parallel programming difficulty. v...
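
The version management and violation detection scheme mentioned in contribution (4) corresponds to what the transactional memory literature usually calls an eager design: speculative writes update memory in place while the old values are kept in an undo log, and conflicts are checked at each access, not at commit. The following is a minimal software sketch of that idea only, not the PTT simulator or its coherence protocol. The class `ToyTM`, its method names, the default value 0 for unwritten addresses, and the priority rule (the smaller transaction id is treated as older and wins) are all illustrative assumptions.

```python
class Conflict(Exception):
    """Raised in a transaction that loses a conflict and is rolled back."""


class ToyTM:
    """Toy model: eager versioning (undo log) plus eager conflict detection."""

    def __init__(self):
        self.memory = {}    # addr -> value; speculative writes land here directly
        self.readers = {}   # addr -> set of tx ids that have read addr
        self.writer = {}    # addr -> tx id that has written addr
        self.undo = {}      # tx id -> [(addr, old value)], only for running txs

    def begin(self, tx):
        self.undo[tx] = []

    def _check_live(self, tx):
        if tx not in self.undo:
            raise Conflict(f"tx {tx} is not running (aborted or never begun)")

    def _resolve(self, tx, others):
        # Assumed priority rule: the smaller id is older and wins.
        others = [o for o in others if o != tx]
        if any(o < tx for o in others):
            self.abort(tx)
            raise Conflict(f"tx {tx} lost a conflict")
        for o in others:
            self.abort(o)

    def read(self, tx, addr):
        self._check_live(tx)
        if addr in self.writer:                  # conflict checked at access time
            self._resolve(tx, [self.writer[addr]])
        self.readers.setdefault(addr, set()).add(tx)
        return self.memory.get(addr, 0)

    def write(self, tx, addr, value):
        self._check_live(tx)
        others = set(self.readers.get(addr, ()))
        if addr in self.writer:
            others.add(self.writer[addr])
        self._resolve(tx, others)
        self.undo[tx].append((addr, self.memory.get(addr, 0)))  # log old value
        self.memory[addr] = value                                # update in place
        self.writer[addr] = tx

    def commit(self, tx):
        self.undo.pop(tx, None)   # in-place values are already final
        self._release(tx)

    def abort(self, tx):
        for addr, old in reversed(self.undo.pop(tx, [])):
            self.memory[addr] = old   # roll back, newest entry first
        self._release(tx)

    def _release(self, tx):
        for s in self.readers.values():
            s.discard(tx)
        for addr in [a for a, w in self.writer.items() if w == tx]:
            del self.writer[addr]
```

In this sketch, if transaction 2 writes an address and the older transaction 1 then reads it, transaction 2 is rolled back from its undo log and transaction 1 sees the original value. Commit is cheap because the new values are already in place; the cost of the eager choice is paid on abort.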
Keywords/Search Tags: multicore, transactional memory, speculative thread-level parallelism, offline profiling, online profiling