Performance Analysis And Branch Prediction Optimization Of A Processor Core

Posted on: 2017-04-14
Degree: Master
Type: Thesis
Country: China
Candidate: J L Wu
GTID: 2428330569998802
Subject: Electronic Science and Technology
Abstract/Summary:
As processor microarchitectures grow more complex, carrying out effective performance analysis and optimization within a limited development period has become an urgent problem. Using the register transfer level (RTL) code of a processor core as the performance model and adopting a performance analysis method based on hardware counters, this thesis analyzes what limits performance when running the SPEC CPU2000 benchmarks and optimizes the branch prediction structure according to the analysis results. The main contributions can be summarized as follows:

1. A performance analysis method based on hardware counters is proposed. Accuracy and speed both play a key role in performance analysis. The thesis uses the RTL code of the processor core as the analysis model and implements a specialized performance monitor unit (SPMU) placed outside the processor core. The SPMU uses hardware counters to collect the internal events needed for microarchitecture analysis and passes the collected results to the result analyzer. Both the SPMU and the RTL code of the processor core are emulated on an FPGA prototype system. The result is a fast and accurate performance analysis platform for the processor microarchitecture.

2. The factors that limit the performance of the processor core when executing the SPEC CPU2000 benchmarks are analyzed. The machine width of the processor core is four, but the average IPC (instructions per cycle) when executing the SPEC CPU2000 benchmarks is less than two. Focusing on the renaming and dispatching schemes, the thesis applies the proposed performance analysis method to find the causes of renaming and dispatching blocking. The analysis results show that the shortage of renaming registers is the major cause of heavy renaming blocking for floating point benchmarks; that optimizing the branch prediction structure can reduce the renaming blocking that occurs when the instruction queue is empty; and that the major cause of dispatching blocking is the shortage of entries in the dispatching queue.

3. The branch prediction structure of the processor core is optimized. The performance analysis shows that renaming blocking can be reduced by optimizing the branch prediction structure. In addition, more than 90% of the mispredictions during execution of the SPEC CPU2000 benchmarks are direction mispredictions. The thesis designs and implements a more sophisticated branch prediction algorithm, TAgged GEometric history length (TAGE) branch prediction, verifies its function, and evaluates its performance. Compared with the original branch predictor, the average direction misprediction rates for the SPEC CPU2000 integer and floating point benchmarks are reduced by 0.46% and 1.03% respectively, and the average performance of the processor core is improved by 1.09% and 1.31% respectively, at the cost of 0.24% additional hardware resources.
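
To make the idea behind TAGE concrete, the following is a minimal Python sketch of a TAGE-style direction predictor. It is not the design implemented in the thesis. The names `geometric_history_lengths`, `TinyTage`, and `count_mispredictions`, the table sizes, counter widths, and hash functions are all assumptions chosen for illustration. The sketch also omits parts of a full TAGE predictor, such as useful counters and the alternate prediction. It shows only the two core ideas: history lengths that form a geometric series, and prediction by the matching tagged table with the longest history.

```python
from dataclasses import dataclass


def geometric_history_lengths(min_len, max_len, num_tables):
    """History lengths forming a geometric series from min_len to max_len."""
    if num_tables == 1:
        return [min_len]
    ratio = (max_len / min_len) ** (1.0 / (num_tables - 1))
    return [int(min_len * ratio ** i + 0.5) for i in range(num_tables)]


def fold(value, bits):
    """XOR-fold a non-negative integer down to `bits` bits."""
    out, mask = 0, (1 << bits) - 1
    while value:
        out ^= value & mask
        value >>= bits
    return out


@dataclass
class Entry:
    tag: int = -1  # -1 marks an unused entry (real tags are non-negative)
    ctr: int = 0   # signed counter; ctr >= 0 predicts "taken"


class TinyTage:
    """Simplified TAGE-style predictor (illustrative sizes, no useful bits)."""

    def __init__(self, history_lengths, index_bits=6, tag_bits=8):
        self.lengths = list(history_lengths)
        self.index_bits, self.tag_bits = index_bits, tag_bits
        self.index_mask = (1 << index_bits) - 1
        self.base = [0] * (1 << index_bits)  # bimodal counters in [-2, 1]
        self.tables = [[Entry() for _ in range(1 << index_bits)]
                       for _ in self.lengths]
        self.history = 0  # global history, newest outcome in bit 0

    def _slot(self, pc, t):
        """Index and tag for table t, hashed from pc and truncated history."""
        hist = self.history & ((1 << self.lengths[t]) - 1)
        index = fold(pc, self.index_bits) ^ fold(hist, self.index_bits) ^ t
        tag = fold(pc ^ (hist << 1), self.tag_bits)
        return index & self.index_mask, tag

    def _lookup(self, pc):
        """Longest-history table whose tag matches, or (None, None)."""
        for t in reversed(range(len(self.tables))):
            index, tag = self._slot(pc, t)
            entry = self.tables[t][index]
            if entry.tag == tag:
                return t, entry
        return None, None

    def predict(self, pc):
        _, entry = self._lookup(pc)
        if entry is not None:
            return entry.ctr >= 0
        return self.base[pc & self.index_mask] >= 0

    def update(self, pc, taken):
        t, entry = self._lookup(pc)
        predicted = self.predict(pc)  # prediction made before training
        if entry is not None:  # train the provider component
            entry.ctr = min(3, entry.ctr + 1) if taken else max(-4, entry.ctr - 1)
        else:
            i = pc & self.index_mask
            self.base[i] = min(1, self.base[i] + 1) if taken else max(-2, self.base[i] - 1)
        if predicted != taken:  # on a misprediction, allocate a longer-history entry
            nxt = 0 if t is None else t + 1
            if nxt < len(self.tables):
                index, tag = self._slot(pc, nxt)
                self.tables[nxt][index] = Entry(tag=tag, ctr=0 if taken else -1)
        keep = (1 << self.lengths[-1]) - 1
        self.history = ((self.history << 1) | int(taken)) & keep


def count_mispredictions(predictor, pc, outcomes):
    """Feed a sequence of branch outcomes and count wrong predictions."""
    wrong = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            wrong += 1
        predictor.update(pc, taken)
    return wrong


if __name__ == "__main__":
    lengths = geometric_history_lengths(2, 16, 4)
    pattern = [True, True, False] * 100  # taken, taken, not-taken, repeated
    print(lengths, count_mispredictions(TinyTage(lengths), 0x40, pattern))
```

A simple two-bit counter on its own keeps mispredicting the not-taken outcome of a repeating taken, taken, not-taken branch. The tagged tables capture it because the last two outcomes determine the next one. The geometric spacing lets a few tables cover both short and long correlations.
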
Keywords/Search Tags: processor core, microarchitecture, performance analysis, branch prediction, TAGE