
A Research On Microprocessor Performance Analytical Model

Posted on: 2008-01-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: K Ma
Full Text: PDF
GTID: 1118360212498582
Subject: Computer system architecture
Abstract/Summary:
With advances in IC technology and the growing diversity of applications, the complexity of microprocessor architecture design is ever increasing. How to analyze microprocessor performance efficiently within a limited design period and with limited resources is a question facing every microprocessor architect. Simulation-based performance analysis is time-consuming and offers little insight into performance bottlenecks. Architects have to simulate many kinds of workloads on different microprocessor configurations and weigh the tradeoffs among various factors to arrive at a high-performance or cost-effective design. This is a long, costly, and complex process.

Based on the architectural characteristics of superscalar microprocessors, this thesis proposes a simple microprocessor analytical model applicable to micro-benchmarks. The model is based on ideal-limit formulas. Combined with the Godson-Microbench, it can be used to analyze pipeline efficiency and to find the performance bottlenecks of microprocessors.

To evaluate microprocessor performance with macro-benchmarks, this thesis proposes MAMO, a more detailed Microprocessor Analytical MOdel. It comprises an instruction window model, a limited functional-unit model, a branch misprediction model, and instruction and data cache miss models. MAMO computes the actual performance of a microprocessor from the contribution to CPI (Cycles Per Instruction) of each part of the microprocessor, as computed by these models.

The instruction window model of MAMO models the out-of-order issue part of the microprocessor. It computes the ideal performance from statistics on the distribution of dynamic instruction dependencies. The limited functional-unit model uses the average operation latency and the Poisson distribution to model the execution part. It can be used to compute the effect of the latency and number of functional units on IPC (Instructions Per Cycle).
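The idea of building overall performance from per-component CPI contributions can be pictured with a small sketch. The abstract does not give MAMO's formulas, so the code below assumes the simplest possible combination, a plain sum of contributions; the class name, field names, and numbers are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class CpiContributions:
    """Hypothetical per-component CPI terms (names are illustrative, not from the thesis)."""
    ideal: float              # base CPI, e.g. from an instruction window model
    functional_units: float   # extra CPI from limited functional units
    branch_mispredict: float  # extra CPI from branch mispredictions
    icache_miss: float        # extra CPI from instruction cache misses
    dcache_miss: float        # extra CPI from data cache misses

    def total_cpi(self) -> float:
        # Assumption: the contributions combine additively.
        return sum(getattr(self, f.name) for f in fields(self))

    def ipc(self) -> float:
        # IPC is the reciprocal of CPI.
        return 1.0 / self.total_cpi()

    def largest_penalty(self) -> str:
        # Name of the largest non-ideal term, i.e. the candidate bottleneck.
        penalties = [f.name for f in fields(self) if f.name != "ideal"]
        return max(penalties, key=lambda name: getattr(self, name))


# Made-up numbers, not measurements from the thesis.
example = CpiContributions(
    ideal=0.5,
    functional_units=0.125,
    branch_mispredict=0.25,
    icache_miss=0.0625,
    dcache_miss=0.0625,
)
print(example.total_cpi(), example.ipc(), example.largest_penalty())
```

Under this additive assumption, the component with the largest term is the first place to look for a bottleneck, which is the kind of insight that a single simulated CPI number does not provide.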
The late-update branch predictor model computes the branch misprediction ratio more accurately. The cache model collects statistics on the data and instruction cache miss ratios and analyzes the cache miss penalty based on the IW characteristic computed by the instruction window model. We validate MAMO against a detailed base simulator; it yields an average error of 8.53% when predicting the CPI of the SPEC CPU 2000 Int benchmarks.

We analyze the performance bottlenecks of the Godson-2 processor using MAMO and further validate the results with the Sim-Godson simulator. The model can also be used to explore the microprocessor design space and to analyze the trend of ILP (Instruction Level Parallelism), the effective length of waiting queues, and a reasonable configuration of functional units.

Finally, we analyze the performance of Godson-2 and Alpha 21264 using the ideal-limit based model and propose architectural improvements such as a dynamic feedback map policy for functional units. These improvements yield a 13.8% increase in performance on the micro-benchmarks and a 28.8% increase in IPC on SPEC CPU 2000. As a complement to performance analysis based on cycle-accurate simulation, a microprocessor performance analytical model can be used to analyze performance bottlenecks and explore the design space more efficiently. This model will play an important role in the architectural optimization of the Godson series microprocessors.
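The abstract does not say how the average CPI prediction error is defined. One common reading, assumed here, is the mean of the per-benchmark absolute relative error between the model's CPI and the simulator's CPI. The function name and the sample values below are hypothetical.

```python
def mean_abs_percent_error(predicted, reference):
    """Mean absolute relative error, in percent, of predicted vs. reference values.

    Assumption: every benchmark is weighted equally and the reference
    (simulator) values are non-zero.
    """
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [abs(p - r) / r for p, r in zip(predicted, reference)]
    return 100.0 * sum(errors) / len(errors)


# Made-up CPI values for two benchmarks, not data from the thesis.
model_cpi = [1.1, 1.8]
simulator_cpi = [1.0, 2.0]
avg_error = mean_abs_percent_error(model_cpi, simulator_cpi)
print(f"{avg_error:.2f}%")
```

Taking the absolute value per benchmark keeps over- and under-predictions from cancelling out, so the average reflects typical accuracy, not net bias.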
Keywords/Search Tags: superscalar, microprocessor, analytical model, performance evaluation, workload, microarchitecture, MAMO