
GPU Texture Cache Pipeline Defect Diagnosis And Optimization Based On 3D Rendering

Posted on: 2020-04-08
Degree: Master
Type: Thesis
Country: China
Candidate: K Y Zhang
Full Text: PDF
GTID: 2428330602951898
Subject: Engineering
Abstract/Summary:
The GPU is designed to accelerate graphics rendering and is an important component of the modern PC. In the 3D rendering application scenario, a GPU has two main pipelines: the Shader Engine Pipeline and the Texture Cache Pipeline. The purpose of this thesis is to optimize the Texture Cache Pipeline of the Aruba chip in the 3D rendering scenario. The research is divided into two parts: defect diagnosis of the Texture Cache Pipeline, and pipeline optimization based on the diagnostic results.

Pipeline defect diagnosis proceeds in two steps. First, the system-level simulation model PPM is used to analyze GPU performance and the bottleneck module, which gives an initial location for the defect module; this step also selects the experimental objects for PVA diagnosis. Second, the module-level simulation model PVA is used for more detailed localization and analysis. In the PPM step, 41 typical application scenarios from 11 mainstream games are selected as experimental objects and simulated on three different generations of GPU models. The defect is located in TCC and the modules that follow it. In the PVA step, 22 key frames with a high TC Bottleneck are selected for analysis, and TCR is found to be the least efficient module in the pipeline. The two models used in the diagnosis belong to different levels of abstraction, and their conclusions corroborate each other, which increases the reliability of the diagnosis. Analysis of the diagnostic results shows that TCR is inefficient because TCRs communicate with each other by ID mapping. Because there is no unified instruction arbitration and allocation unit between TCRs, Instruction Stall occurs in many cases.

Based on this analysis, we propose the following optimization scheme: insert a new unit named TCD (Texture Cache Distributor, that is, a texture cache instruction distributor) between TCP and TCR. TCD accepts Miss instructions from TCP, arbitrates among them, and distributes each one to the TCR with the least amount of stored data, thus reducing the Stall rate of TCR as far as possible. We first validate the optimization scheme with PVA. The results show that, with TCD inserted, the average Stall rate of TCR decreases while the Stall rate of other blocks increases, so that the Stall rates across the Texture Cache Pipeline tend toward balance. We then validate the scheme with PPM. The performance of the Aruba chip, which has not yet been taped out, improves by 3.18% on average at its default frequency, which meets the expected optimization requirements.

The first innovation of this thesis is that the cache size is not increased, so the cost, size and yield of the cache circuit are not affected. The second innovation is that the scheme does not change the hit rate of the cache circuit and causes little disturbance to the original cache circuit and other modules.
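The difference between ID-mapped dispatch and least-occupancy dispatch can be illustrated with a toy queue model. The following is a minimal sketch only, not the PPM or PVA simulators used in the thesis. The function name `simulate`, the policy names, the queue capacity, the drain period and the request trace are all hypothetical values chosen for illustration, and the real TCD arbitration logic is not described in the abstract.

```python
def simulate(requests, num_tcr, capacity, drain_period, policy):
    """Count stalled cycles in a toy model of miss dispatch to TCR queues.

    requests     -- list of source IDs, one miss instruction each (hypothetical trace)
    num_tcr      -- number of TCR queues
    capacity     -- entries each TCR queue can hold (assumed, not from the thesis)
    drain_period -- every drain_period cycles each non-empty TCR retires one entry
    policy       -- "id_mapping" (fixed ID -> TCR) or "least_loaded" (TCD-like)
    """
    occupancy = [0] * num_tcr
    stalls = 0
    cycle = 0
    next_request = 0
    while next_request < len(requests):
        # Periodically retire one entry from every non-empty queue.
        if cycle % drain_period == 0:
            occupancy = [max(0, entries - 1) for entries in occupancy]

        if policy == "id_mapping":
            # Static mapping: the source ID alone decides the target TCR.
            target = requests[next_request] % num_tcr
        elif policy == "least_loaded":
            # Distributor-style choice: pick the TCR holding the least data.
            target = min(range(num_tcr), key=lambda i: occupancy[i])
        else:
            raise ValueError(f"unknown policy: {policy}")

        if occupancy[target] < capacity:
            occupancy[target] += 1
            next_request += 1
        else:
            # Target queue is full: the instruction waits for a later cycle.
            stalls += 1
        cycle += 1
    return stalls


# A deliberately skewed trace: every miss carries the same source ID.
trace = [0] * 8
static_stalls = simulate(trace, num_tcr=4, capacity=2, drain_period=4,
                         policy="id_mapping")
balanced_stalls = simulate(trace, num_tcr=4, capacity=2, drain_period=4,
                           policy="least_loaded")
print(static_stalls, balanced_stalls)
```

On this skewed trace the ID-mapped policy repeatedly hits one full queue while the other three sit idle, whereas the least-occupancy policy spreads the same instructions across all four queues. The numbers say nothing about real hardware; they only show why a shared arbitration point can lower the stall count when load is uneven.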
Keywords/Search Tags: GPU, 3D Rendering Pipeline, Texture Cache Pipeline, Bottleneck, Instruction Distribution