Optimization Techniques Of Cache In Chip MultiThreading

Posted on: 2008-10-07    Degree: Doctor    Type: Dissertation
Country: China    Candidate: P Y Ma    Full Text: PDF
GTID: 1118360242999229    Subject: Microelectronics and Solid State Electronics
Abstract/Summary:
Consumer demand is unending, so improving microprocessor performance is the foremost goal of researchers. With the fast development of semiconductor technology, more than 1 billion transistors can be integrated on one chip. How to take full advantage of these abundant resources to implement faster and more efficient microprocessors in limited time is a challenge for microprocessor designers. Improving performance on traditional architectures by exploiting more instruction level parallelism (ILP) is difficult, for two reasons: the design is complex and the development cycle is long, and the ILP available in applications is limited. CMT (Chip MultiThreading) is an efficient technique for overcoming this obstacle by exploiting thread level parallelism. In addition, a multi-core, multi-thread architecture on one chip can reuse many existing units, so the design period of a CMT processor is short and verification is easy. High-performance processor design is now moving rapidly towards CMT architectures, so an in-depth study of the key techniques of CMT has great theoretical and practical significance.

This dissertation focuses on the microarchitecture design and optimization of the Cache in CMT. To reduce L1P conflicts in CMT, we propose logically partitioning the L1P Cache by the n-th power of 2, together with a competing loop lock. Existing fairness research always needs a single-thread sample phase; we propose a novel fairness policy, FROCM, which does not. We propose a ring cooperant L1 data Cache, which reduces both the complexity of the design and the load on the L2 Cache. We also propose a method to exchange threads dynamically based on a fast-shared data pool; it detects the data consanguinity of two threads in real time and rapidly moves them onto one core. Finally, we design and implement a dual-core, dual-thread VLIW prototype, YHFT DSP/DS, based on the above studies. To increase the bandwidth of the data path and reduce the critical path delay of the CMT processor, we design a full-custom 10R/6W register file. We also propose a pseudo timing model for macros, which greatly reduces the work of building timing models.

The primary innovations of this dissertation can be summarized as follows:

1) In an SMT processor, the execution time of every thread grows because several threads share resources, which can cause tasks to be lost in a real-time system. We present two methods to alleviate Cache conflicts on a multithreaded chip: partitioning the L1P Cache by the n-th power of 2, and the competing loop lock. Both methods need only a few additional registers and shifters. Simulation results show that the performance of the master thread improves greatly and that the IPC of all threads improves by 4%.

2) We propose FROCM (Fairness Recalculate Once Cache Miss), an approach that enhances the fairness of multiple threads running on a CMT processor without disturbing their running states. To obtain IPCalone dynamically, most prior work needs a sample phase, which disturbs the other running threads and reduces throughput. FROCM avoids this: IPCapproximately, an approximation of a thread's IPCalone, is recalculated on every Cache miss. Simulation results show that with FROCM the system achieves higher fairness than with traditional methods.

3) We propose a new Cache architecture, the Ring Cooperant L1 Data Cache (RCDC). In a CMP, several cores accessing the shared Cache cause memory access conflicts and Cache coherence problems. RCDC takes full advantage of rapid data exchange between the L1Ds on one chip, and reduces both the complexity of the design and the load on the L2 Cache. We also propose a coherency protocol, M2SI, for RCDC. In a detailed comparison between RCDC and MESI, simulation results show that RCDC alleviates the load on L2 and clearly improves system performance.

4) To schedule two threads with high data consanguinity onto one core, we propose a method to exchange threads dynamically. It detects the data consanguinity of two threads in real time and rapidly moves them onto one core.

5) Finally, we design and implement a dual-core, dual-thread VLIW prototype, YHFT DSP/DS, based on the above studies. We also implement a fast-shared data pool (FSDP). Logic synthesis, timing analysis and performance evaluation show that DSP/DS achieves about twice the performance of YHFT DSP/800.

The research in this dissertation provides a theoretical and practical solution for implementing CMT processors, and its results can be used to investigate further how to improve Cache performance.
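
The abstract notes that partitioning the L1P Cache by a power of 2 needs only a few extra registers and shifters. As a rough illustration of why that can be so, the sketch below remaps a Cache set index into a per-thread region using only shifts and masks. It is a minimal sketch under stated assumptions, not the dissertation's design: the function name `partitioned_set_index`, its parameters, and the layout (a Cache of 2^log2_sets sets split into 2^log2_parts equal regions, one region per thread) are all hypothetical.

```python
def partitioned_set_index(set_index: int, log2_sets: int,
                          log2_parts: int, thread_id: int) -> int:
    """Remap a set index into the region owned by thread_id.

    Assumed layout (hypothetical): the Cache has 2**log2_sets sets,
    split into 2**log2_parts equal regions, one per thread.
    """
    if not 0 <= log2_parts <= log2_sets:
        raise ValueError("log2_parts must be between 0 and log2_sets")
    if not 0 <= thread_id < (1 << log2_parts):
        raise ValueError("thread_id must name one of the regions")

    # Number of index bits left inside a single region.
    region_bits = log2_sets - log2_parts
    # Keep only the low bits: the offset within a region (a mask).
    local = set_index & ((1 << region_bits) - 1)
    # Place the thread id in the high bits: the region base (a shift).
    base = thread_id << region_bits
    return base | local


if __name__ == "__main__":
    # 64 sets split into 2 regions of 32 sets each.
    for tid in (0, 1):
        print(tid, partitioned_set_index(45, 6, 1, tid))
```

Because every region size is a power of 2, the remapping needs no division or comparison, only a mask and a shift whose widths could be held in small configuration registers. Setting `log2_parts` to 0 leaves the index unchanged, which corresponds to an unpartitioned Cache.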
Keywords/Search Tags: Chip MultiThreading, Simultaneous MultiThreading, Multi-Core DSP, Thread Level Parallelism, Thread Fairness, Ring Cooperant L1 Data Cache, Dynamic Threads Exchange, Pseudo Timing Model