
Research On Memory-level Parallelism For Multi-core Microprocessor Chip

Posted on: 2012-01-17  Degree: Doctor  Type: Dissertation
Country: China  Candidate: D F Liu  Full Text: PDF
GTID: 1118330341451637  Subject: Computer Science and Technology
Abstract/Summary:
The memory wall problem is one of the limiting factors on computer performance, and the advent of multi-core processors does not solve it; rather, it poses a new challenge to the memory system. Computer designers must confront the question of how to reduce the performance loss a processor suffers from memory access latency. For a long time, designers have worked to improve ILP (Instruction-Level Parallelism), using the processor's computation time to overlap memory access latency. However, as the performance gap between processor and memory widens, the processor's computation time is no longer long enough to hide memory latency; the processor must stall until a cache miss is serviced, and its computation is fragmented into many short phases. Relative to total program execution time, the fraction of time the processor spends waiting on memory grows ever larger. By analogy with ILP, researchers began to consider how to overlap several outstanding memory accesses, which led to the proposal of MLP (Memory-Level Parallelism). By overlapping multiple memory accesses, MLP reduces effective memory access latency and processor stall time, thereby improving processor performance. MLP has become one of the hot topics in computer architecture research.

Based on an analysis of current MLP technology, we first construct an analytical model to study the basic characteristics of an MLP system. We then investigate MLP technology from three aspects: MLP instruction issue, the pathway of MLP memory accesses, and MLP service. We enhance MLP instruction issue by refining the CPU microarchitecture, improve the efficiency of the parallel MLP pathway by managing the shared-cache MHA (Miss Handling Architecture) of the multi-core processor, and strengthen the service capability of parallel MLP memory accesses by optimizing memory access instruction scheduling.
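The benefit of overlapping outstanding misses can be illustrated with a toy cycle model (an illustrative sketch under assumed numbers, not the thesis's analytical model): with MLP degree w, w misses of latency L are serviced in roughly L cycles instead of w·L.

```python
# Toy illustration of memory-level parallelism (MLP).
# Assumption (illustrative only): every miss costs LATENCY cycles,
# and up to `width` misses can be serviced concurrently.

LATENCY = 200  # assumed cycles per memory access

def stall_cycles(num_misses: int, width: int) -> int:
    """Cycles spent waiting when up to `width` misses overlap."""
    groups = -(-num_misses // width)  # ceiling division: batches of misses
    return groups * LATENCY

serial = stall_cycles(8, 1)    # no overlap: 8 * 200 = 1600 cycles
parallel = stall_cycles(8, 4)  # MLP degree 4: ceil(8/4) * 200 = 400 cycles
print(serial, parallel)       # prints: 1600 400
```

Under these assumptions, raising the MLP degree from 1 to 4 cuts the stall time by 4x, which is the effect the thesis's three optimization directions all aim at.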
The major work and innovations are as follows:

(1) Performance analysis models of the MLP system. We construct MLP performance analysis models for the microprocessor and the memory system separately. The processor performance model (MLP-CM) captures the relation between MLP and system performance; it can effectively evaluate several system parameters, including the performance of the MLP system, the number of cache MHA entries in use, the degree of MLP, and the average memory access latency. Test results show that the MLP-CM model predicts system performance with reasonable accuracy. The memory analysis model (MLP-MM) captures the numerical relation between the entries in use across the multi-level cache MHA. Taking a two-level cache system as an example, we prove that the numbers of entries in use in the two levels of the cache MHA are almost equal.

(2) Instruction issue optimization for runahead execution (ERA). Runahead execution is an effective technique for improving MLP. When the processor stalls on an outstanding memory access, it takes a checkpoint, enters the runahead stage, and pre-executes the following instructions. When the result of the outstanding memory access returns, the processor resumes normal execution from the checkpoint. Runahead executes many instructions that are independent of the memory access instructions, which increases the processor's energy consumption. To address this, the thesis proposes an algorithm to reduce the ineffective instructions of runahead execution (ERA). For floating-point programs, the algorithm executes 30% fewer useless instructions than conventional runahead.

(3) MLP-based management of the shared cache MHA in a multi-core processor (MLP_Group). In a multi-core processor, all cores share the cache Miss Handling Architecture (MHA) for memory accesses. Contention for the shared MHA affects each thread's MLP behavior and the fairness among threads.
To solve this problem, we propose a method to manage the shared cache MHA (MLP_Group). The method improves each thread's MLP capability while maintaining system fairness, and moreover increases overall system performance. Compared with a conventional MHA, MLP_Group improves IPC by 7.1% and fairness by 23.6%.

(4) A memory access scheduling method based on virtual channels (VC-MAS). To improve the parallel service capability of the MLP system, we propose an SDRAM memory access scheduling algorithm based on virtual channels. The algorithm effectively utilizes memory bandwidth and improves the parallel service capability across banks.
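The runahead mechanism that (2) optimizes can be sketched in miniature (a hypothetical trace model for illustration; the ERA filtering heuristics themselves are not reproduced here): on a blocking miss the core checkpoints, scans ahead to issue future misses early, then restores and resumes from the checkpoint.

```python
# Sketch of runahead execution over a load-address trace
# (hypothetical model, not the thesis's ERA algorithm).

def runahead(trace, misses):
    """trace: load addresses in program order.
    misses: set of addresses that miss in the cache."""
    serviced = set()        # misses already issued early by runahead
    episodes = 0            # number of runahead episodes entered
    i = 0
    while i < len(trace):
        addr = trace[i]
        if addr in misses and addr not in serviced:
            checkpoint = i                  # save the resume point
            episodes += 1
            # Pre-execute past the stall: every future miss found is
            # issued early, overlapping its latency with the miss
            # that is already in flight.
            for future in trace[i:]:
                if future in misses:
                    serviced.add(future)
            i = checkpoint                  # restore: discard speculative work
        else:
            i += 1                          # normal (retired) execution
    return episodes, serviced

# Two misses (A, C) trigger only one runahead episode, because the
# episode for A prefetches C as well.
print(runahead(["A", "B", "C", "D"], {"A", "C"}))  # prints: (1, {'A', 'C'})
```

The energy cost ERA targets is visible here as the work in the pre-execution loop: every instruction scanned during an episode is later discarded, so filtering out those that do not lead to a miss directly reduces wasted execution.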
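The bank-level parallelism that (4) exploits can be illustrated with a generic per-bank scheduler (an illustrative sketch of serving different banks concurrently, not the VC-MAS virtual-channel algorithm itself):

```python
# Generic bank-aware memory scheduling sketch (illustration only;
# not the thesis's VC-MAS algorithm). Requests to different banks
# can be serviced in the same round, raising parallel service ability.
from collections import deque

def schedule(requests, num_banks):
    """requests: list of (request_id, bank) in arrival order.
    Each round issues at most one request per bank, oldest first,
    so requests to distinct banks proceed in parallel."""
    queues = [deque() for _ in range(num_banks)]
    for rid, bank in requests:
        queues[bank].append(rid)
    rounds = []                                    # issued ids, per round
    while any(queues):
        rounds.append([q.popleft() for q in queues if q])
    return rounds

# Four requests spread over two banks complete in two rounds
# instead of four fully serialized ones.
print(schedule([(0, 0), (1, 0), (2, 1), (3, 1)], 2))  # prints: [[0, 2], [1, 3]]
```

A scheduler that instead drained one bank's queue before touching the next would need four rounds here; spreading issue across banks is the bandwidth-utilization effect the abstract attributes to VC-MAS.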
Keywords/Search Tags: Memory-Level Parallelism, multi-core processor, queueing theory, system performance analysis, cache Miss Handling Architecture, memory scheduling algorithm