
Research on Key Technologies of Cache-Shared Multi-Core Processors

Posted on: 2012-01-19 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: J J Du | Full Text: PDF
GTID: 1488303389966319 | Subject: Computer software and theory
Abstract/Summary:
The progress of modern semiconductor technology such as Very Large Scale Integration (VLSI) can no longer, by itself, satisfy the demand for microprocessor performance improvement. Multi-core processors built on new architectures therefore replace traditional single-core processors by exploiting thread-level parallelism, and have become the main development trend in high-performance microprocessors. The emergence of multi-core technology has improved microprocessor performance significantly, but it also places higher demands on storage system design and technology. The cache remains a principal component of the processor and a key factor in deciding system performance. Because the speed gap between the processor and main memory is even more prominent in multi-core structures, how to design the storage hierarchy and improve cache efficiency through scheduling policies has become a very active research topic, and multi-core processor simulators continue to be developed and refined. This thesis therefore focuses on multi-core processor structure, the M5 simulation system, and optimization techniques for the cache hierarchy.

Firstly, a variety of typical multi-core processor architectures are analyzed and multi-core processors are classified according to their cache structure. Because the cache-shared Chip Multi-Processor (CMP) offers high resource utilization, strong scalability, and low energy consumption, it has become the main trend in multi-core architecture, so the thesis chooses the cache-shared CMP structure as its study target.

Secondly, the problems that multi-core design raises at the cache hierarchy are discussed and analyzed. The multi-level, distributed cache structure of a multi-core processor must keep data consistent, which is guaranteed by bus-snooping or directory-based cache coherence protocols. Existing proposals are then summarized, including ways to alleviate cache thrashing, fair scheduling of cache resources following the idea of the Completely Fair Scheduler (CFS), and cache replacement policies. After that, related cache techniques such as the Extended Set-index Cache (ESC) approach, the Cooperative Cache (CC) and Dynamic Spill-Receive (DSR) designs for private caches, and the new 3D-stacked storage system are studied and analyzed.

Thirdly, based on this study of multi-core architecture and storage system structure, a new performance evaluation model, the Executing and Transport Analysis Model (ETAM), is proposed. ETAM is an executing-and-transport-based model that applies queueing theory to the analysis of the processor memory hierarchy; by incorporating cross-access and queueing-delay factors, it can evaluate multi-core processor performance under different memory-sharing levels, and the resulting preliminary performance estimates help adjust the memory hierarchy design. More importantly, a CMP experimental simulation platform is established under Linux using the latest M5 simulation system, with the Alpha 21264 as the prototype of the CMP core. Cache replacement policies including pseudo-LRU, FIFO, and random selection are implemented by modifying the M5 source code, and SPEC CPU2006 benchmarks are used to test the simulation speed and performance of the M5 system. The experimental results show that the modular M5 simulating system is flexible, fast, and easy to use.
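As a concrete illustration of one of the replacement policies mentioned above, the C++ sketch below shows a conventional tree-based pseudo-LRU scheme for a single set-associative cache set. It is a minimal sketch of the textbook mechanism, not the thesis's modified M5 code; the bit convention and node layout are assumptions.

    #include <cstdint>
    #include <vector>

    // One pseudo-LRU tree per cache set; 'ways' must be a power of two.
    // Internal nodes are stored as a flat array: root at index 0, children
    // of node n at 2n+1 and 2n+2.
    class TreePLRU {
    public:
        explicit TreePLRU(unsigned ways) : ways_(ways), bits_(ways - 1, 0) {}

        // On a hit or fill of 'way', flip every node on the path so that it
        // points away from this way, marking the way most recently used.
        void touch(unsigned way) {
            unsigned node = 0, lo = 0, hi = ways_;
            while (hi - lo > 1) {
                unsigned mid = (lo + hi) / 2;
                if (way < mid) { bits_[node] = 1; node = 2 * node + 1; hi = mid; }
                else           { bits_[node] = 0; node = 2 * node + 2; lo = mid; }
            }
        }

        // Follow the bits toward the colder half at each level to find the
        // pseudo-least-recently-used way.
        unsigned victim() const {
            unsigned node = 0, lo = 0, hi = ways_;
            while (hi - lo > 1) {
                unsigned mid = (lo + hi) / 2;
                if (bits_[node] == 0) { node = 2 * node + 1; hi = mid; }
                else                  { node = 2 * node + 2; lo = mid; }
            }
            return lo;
        }

    private:
        unsigned ways_;
        std::vector<uint8_t> bits_;
    };

Pseudo-LRU needs only ways-1 bits per set instead of a full recency ordering, which is why it is a common hardware-friendly substitute for true LRU in simulators and real caches alike.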
Fourthly, the Dead Block phenomenon and conflict pollution at the cache level are discussed, and a new shared-cache management policy, the Homologous Promotion & Insertion Policy (HPIP), is proposed. Unlike the LRU Insertion Policy (LIP) and the Bimodal Insertion Policy (BIP), HPIP uses a dynamic insertion approach: when a data block is to be installed, the block in the lowest-priority position is always chosen as the victim, and the insertion position is calculated from the core that owns the incoming block, so blocks from different cores are inserted at different positions. This dynamic insertion lets the threads on different cores run pseudo-independently, which reduces conflict pollution and the time that Dead Blocks remain in the cache. Furthermore, HPIP uses a linear or exponential calculation to decide how far a block is promoted on a hit, so that the logical priority reflects hits rather than raw accesses. Experiments on the M5 simulation system show that, compared with traditional LRU, HPIP improves average throughput, weighted speedup, and the harmonic fairness metric to some degree, and also reduces the L2 cache miss rate.

Finally, to explore adaptive policy selection for the shared cache, a new scheme called Adaptive Policy Election (APE) is proposed. Since different applications, and different phases of the same application, have different cache demand characteristics, APE uses a Policy Selection (PSEL) counter to choose between two component policies. The Dynamic Insertion Policy (DIP) mechanism monitors overall cache performance and is therefore unaware of the characteristics of individual applications, while the Thread-Aware DIP (TADIP) mechanism is difficult to implement in hardware. APE therefore, for the first time, takes the fixed number of processor cores as the guide for designing its sampling and monitoring structure: the logical structure of the Miss Rate Monitor (MRM) is designed, and the questions of how many Set Dueling (SD) sets to use and how to choose them are answered. APE not only adjusts the granularity of monitoring but also reduces the cost of hardware design and implementation. Experiments on the M5 simulation system show that although adaptive policy selection brings a corresponding increase in CMP performance metrics, harmonic fairness is sacrificed at the same time, so further research remains to be carried out.
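The abstract does not give HPIP's exact insertion and promotion formulas, so the following C++ fragment is only a sketch of the general idea for one shared-cache set: the victim is always taken from the lowest-priority position, the incoming block's insertion depth depends on the core that owns it, and a hit promotes the block by a bounded linear step instead of jumping straight to the most-recently-used position. The per-core depth mapping and the promotion step size are hypothetical placeholders, not values from the thesis.

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Recency stack for one set: way IDs ordered from highest priority
    // (front, MRU-like) to lowest priority (back, LRU-like).
    struct SetState {
        std::vector<int> stack;
    };

    // Miss path: evict the lowest-priority way, then re-insert that way at a
    // depth derived from the owning core, so blocks from different cores
    // enter the priority order at different positions.
    int installBlock(SetState& s, int coreId, int numCores) {
        int victimWay = s.stack.back();
        s.stack.pop_back();
        // Hypothetical mapping: core 0 inserts near the top, the last core
        // near the bottom.
        std::size_t pos = coreId * s.stack.size() / numCores;
        s.stack.insert(s.stack.begin() + pos, victimWay);
        return victimWay;   // the caller refills this way with the new block
    }

    // Hit path: promote by a fixed step toward the top, so priority tracks
    // how often a block hits rather than merely that it was accessed.
    void promoteOnHit(SetState& s, int way, std::size_t step = 2) {
        auto it = std::find(s.stack.begin(), s.stack.end(), way);
        if (it == s.stack.end()) return;
        std::size_t idx = static_cast<std::size_t>(it - s.stack.begin());
        std::size_t newIdx = idx > step ? idx - step : 0;
        s.stack.erase(it);
        s.stack.insert(s.stack.begin() + newIdx, way);
    }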
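APE's election step builds on the standard set-dueling machinery used by DIP: a small number of leader sets are pinned to each candidate policy, a saturating PSEL counter records which leader group suffers fewer misses, and the remaining follower sets adopt the winner. The C++ sketch below shows that generic machinery only; the sampling stride, the counter width, and APE's core-count-based Miss Rate Monitor design are not specified in the abstract, so the constants here are assumptions.

    #include <cstdint>

    class PolicySelector {
    public:
        enum class Role { LeaderA, LeaderB, Follower };

        // A set is a leader for policy A or B if its low index bits match a
        // fixed pattern; every other set is a follower.
        Role roleOf(uint32_t setIndex) const {
            uint32_t tag = setIndex & 0x3F;      // one leader pair per 64 sets (assumed)
            if (tag == 0x00) return Role::LeaderA;
            if (tag == 0x3F) return Role::LeaderB;
            return Role::Follower;
        }

        // Leader misses steer the saturating counter: A-leader misses push it
        // up, B-leader misses push it down.
        void onMiss(uint32_t setIndex) {
            Role r = roleOf(setIndex);
            if (r == Role::LeaderA && psel_ < kMax) ++psel_;
            else if (r == Role::LeaderB && psel_ > 0) --psel_;
        }

        // Followers use whichever policy currently has the lower miss count;
        // effectively the most significant bit of PSEL decides.
        bool usePolicyA(uint32_t setIndex) const {
            Role r = roleOf(setIndex);
            if (r == Role::LeaderA) return true;
            if (r == Role::LeaderB) return false;
            return psel_ < (kMax + 1) / 2;
        }

    private:
        static constexpr uint32_t kMax = 1023;   // 10-bit counter (assumed width)
        uint32_t psel_ = kMax / 2;
    };

Sampling only the leader sets keeps the monitoring hardware to a single counter plus index matching, which is the kind of cost the abstract says APE further reduces by tying the number of sampled sets to the fixed core count.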
Keywords/Search Tags: Multi-core processor, Cache-shared CMP, M5 simulation system, Cache scheduling policy, Adaptive mechanism