Research On Hardware Transactional Memory Microarchitecture And Emulation

Posted on: 2013-04-18   Degree: Doctor   Type: Dissertation
Country: China   Candidate: Y F Liu
GTID: 1228330395473750   Subject: Communication and Information System
Abstract/Summary:
As CMOS technology continues to scale, chip performance can no longer be improved simply by adding transistors and raising the clock frequency of a single processor. The chip multiprocessor (CMP) has become the preferred microprocessor architecture because of its strong thread-level parallelism, efficient resource utilization, and good scalability. Changes in how data moves between the processor cores and the memory hierarchy strongly affect both CMP performance and the parallel programming model. Shared-memory CMP architecture and thread-level parallel programming mechanisms have therefore become key issues in improving CMP performance and efficiency. Transactional memory (TM) was proposed to simplify parallel programming by guaranteeing that transactions appear to execute atomically and in isolation. The programmer only has to decide which code segments must execute atomically, not how atomicity is achieved. In view of the development trends and challenges of hardware transactional memory (HTM) design, this dissertation studies an HTM structure integrated into the processor architecture, a memory hierarchy that supports both cache coherence and HTM, and an emulation platform for CMP design and architectural exploration.
First, because HTM research has reached the stage of logic implementation, this dissertation presents a method for designing hardware transactional memory on an embedded RISC processor. To explore how the TM architecture affects the processor microarchitecture and the critical path of the pipeline, a modular design method is used to integrate the HTM structure with the processor pipeline and storage units such as the register file and data cache. The design is realized by modifying and extending the processor's load/store unit, pipeline control unit, and instruction decode unit.
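The load/store-unit and data-cache extensions described above can be pictured with a small software model. This is a hypothetical sketch, not the dissertation's design. It assumes per-line transactional read/write bits (modelled here as read and write sets), a write buffer that holds speculative stores until commit, eager conflict detection on every access, and a requester-wins policy that aborts the other transaction. The names `ToyHTM`, `TxAborted`, `LINE_SIZE`, `tx_begin`, and `tx_commit` are illustrative only.

```python
from dataclasses import dataclass, field

LINE_SIZE = 32  # bytes per cache line: an illustrative assumption


def line_of(addr: int) -> int:
    """Map a byte address to its cache-line index."""
    return addr // LINE_SIZE


class TxAborted(Exception):
    """Raised when a core accesses memory after its transaction was aborted."""


@dataclass
class Core:
    cid: int
    in_tx: bool = False
    aborted: bool = False
    read_set: set = field(default_factory=set)        # lines with the T-read bit set
    write_set: set = field(default_factory=set)       # lines with the T-write bit set
    write_buffer: dict = field(default_factory=dict)  # speculative stores: addr -> value

    def reset(self) -> None:
        self.in_tx = False
        self.aborted = False
        self.read_set.clear()
        self.write_set.clear()
        self.write_buffer.clear()


class ToyHTM:
    def __init__(self, ncores: int) -> None:
        self.memory: dict = {}
        self.cores = [Core(i) for i in range(ncores)]

    def tx_begin(self, cid: int) -> None:
        self.cores[cid].reset()
        self.cores[cid].in_tx = True

    def _detect_conflicts(self, cid: int, line: int, is_write: bool) -> None:
        # Requester wins (assumed policy): abort every other live transaction
        # whose read/write bits conflict with this access.
        for other in self.cores:
            if other.cid == cid or not other.in_tx or other.aborted:
                continue
            if line in other.write_set or (is_write and line in other.read_set):
                other.aborted = True

    def load(self, cid: int, addr: int) -> int:
        core = self.cores[cid]
        if core.aborted:
            raise TxAborted(cid)
        line = line_of(addr)
        self._detect_conflicts(cid, line, is_write=False)
        if core.in_tx:
            core.read_set.add(line)
            if addr in core.write_buffer:  # read own speculative store
                return core.write_buffer[addr]
        return self.memory.get(addr, 0)

    def store(self, cid: int, addr: int, value: int) -> None:
        core = self.cores[cid]
        if core.aborted:
            raise TxAborted(cid)
        line = line_of(addr)
        self._detect_conflicts(cid, line, is_write=True)
        if core.in_tx:
            core.write_set.add(line)
            core.write_buffer[addr] = value  # held back until commit
        else:
            self.memory[addr] = value

    def tx_commit(self, cid: int) -> bool:
        """Publish buffered stores atomically; return False if the tx was aborted."""
        core = self.cores[cid]
        ok = core.in_tx and not core.aborted
        if ok:
            self.memory.update(core.write_buffer)
        core.reset()
        return ok


htm = ToyHTM(2)
htm.tx_begin(0)
htm.tx_begin(1)
htm.load(0, 0x100)       # core 0 reads line 8
htm.store(1, 0x104, 7)   # core 1 writes line 8, so core 0 is aborted
print(htm.tx_commit(1))  # True
print(htm.tx_commit(0))  # False
print(htm.memory[0x104]) # 7
```

Buffering stores until commit keeps aborted work invisible to other cores, which is the isolation property the abstract describes. A real HTM would track the bits in the cache tags and redirect control to an abort handler rather than raising an exception.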
Transactional instructions are also implemented to support programming with TM. Logic synthesis results show that the HTM structure does not reduce the processor's clock frequency, but it increases the area and power of the embedded processor by 21% and 18%, respectively. The method thus offers one route to implementing HTM logic within a processor architecture.
Second, a TMESI directory protocol is proposed to support both cache coherence and HTM. It makes the CMP more broadly applicable by letting programmers use either lock-based synchronization or the TM mechanism. The associated status bits are added to the data cache tags, and the load/store pipeline control unit is modified to implement the TMESI protocol on the embedded processor architecture. An 8-core shared-memory system interconnected by a network-on-chip (NoC) is built to evaluate the protocol. The workloads include a ticket booking program and scientific kernel applications. The experimental results show that TM with the TMESI protocol improves performance by 1% to 17% compared with ordinary TM. For the ticket booking program, where data dependences between tasks are unpredictable, TM performs better because it exploits speculative execution. For the scientific kernels, whose data dependences are well defined, fine-grained locking achieves up to 14% higher performance.
Finally, a high-speed, scalable, and flexible multi-FPGA emulation platform is designed to support CMP architectural exploration and logic verification. The NoC-based multicore system to be emulated is partitioned into two parts, the processing cores and the network. Each part is mapped onto a different FPGA so that parallel on-board wires connect the processing cores directly to their routers.
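Why direct parallel wires matter can be seen from a rough pin-budget calculation. This sketch uses hypothetical link widths and pin counts, none taken from the dissertation. It also uses a simplified time-division model in which every target cycle must wait for all multiplexed slots and synchronization overhead is ignored. The names `mux_ratio`, `target_clock_mhz`, and `SIGNALS_PER_LINK` are illustrative.

```python
import math

# Hypothetical parameters: none of these widths come from the dissertation.
CORES = 8
SIGNALS_PER_LINK = 76   # e.g. 32-bit flit + 6 control bits, in each direction
FPGA_CLOCK_MHZ = 108.0  # reuses the reported processor clock for illustration


def mux_ratio(signals: int, pins: int) -> int:
    """Time slots needed to carry `signals` over `pins` wires (TDM multiplexing)."""
    return math.ceil(signals / pins)


def target_clock_mhz(fpga_clock_mhz: float, signals: int, pins: int) -> float:
    """Simplified model: each target cycle costs `mux_ratio` transfer cycles,
    ignoring serialization and synchronization overhead."""
    return fpga_clock_mhz / mux_ratio(signals, pins)


# Every core<->router link crosses the boundary between the two FPGAs.
crossing = CORES * SIGNALS_PER_LINK
for pins in (640, 160):
    print(pins, mux_ratio(crossing, pins), target_clock_mhz(FPGA_CLOCK_MHZ, crossing, pins))
# 640 1 108.0
# 160 4 27.0
```

When the cut between cores and network fits within the available inter-FPGA pins (ratio 1), no multiplexing is needed and the target clock is not divided down.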
This partitioning avoids a drawback of most existing multi-FPGA emulation platforms, which multiplex wires between FPGAs and must therefore modify the target system architecture and lower the emulation speed. Multiple emulation boards can be interconnected through the FPGAs' high-speed serial links, allowing the system to scale to a much larger size. Various CMP architectures, including shared-memory and distributed-memory multiprocessors, have been emulated on the platform. With the embedded processors running at up to 108 MHz, the FPGA-based platform is more than 10^4 times faster than software simulation. The platform supports not only hardware logic verification but also software development for CMP systems, which gives it good design space exploration capability.
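The reported speedup can be turned into a rough check on the software simulator's effective speed. This rests on an interpretation that the abstract does not state. It assumes the speedup S is the ratio of wall-clock times for the same number of target cycles, with the emulated processors at f_emu = 108 MHz. Since S > 10^4:

```latex
f_{\mathrm{sim}} = \frac{f_{\mathrm{emu}}}{S} < \frac{108\ \mathrm{MHz}}{10^{4}} = 10.8\ \mathrm{kHz}
```

Under that assumption, the software simulator advanced fewer than about 11 thousand target cycles per second.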
Keywords/Search Tags: chip multi-processor, hardware transactional memory, cache coherency, emulation, embedded processor