
Research On Resource And Performance Optimization Strategies Of GPU

Posted on: 2019-10-18  Degree: Master  Type: Thesis
Country: China  Candidate: W G Yang  Full Text: PDF
GTID: 2428330566984141  Subject: Software engineering
Abstract/Summary:
It is a great challenge to take full advantage of the GPU's computing resources and effectively improve performance, due to its complex architecture and demanding programming model. In terms of software, it is necessary to fully understand the parallel acceleration characteristics of GPUs, make effective use of the various computing resources, and fully explore the potential for performance improvement. In terms of the combination of hardware and software, it is necessary to make full use of specific hardware advantages to perform resource optimization and performance acceleration for specific software computing models. In terms of hardware, the architecture needs design optimization to improve resource scheduling strategies and reduce hardware overhead. In this thesis, we start from two angles, resource optimization and performance improvement, and analyze GPU optimization problems from three aspects: GPU application optimization, neural network optimization based on INT8 quantization, and GPU cache scheduling strategy optimization. The details are as follows.

(I) To improve algorithm performance, we analyze the parallelism of a traditional algorithm and optimize it based on the parallel acceleration characteristics of the GPU. We select the classic image thinning algorithm and improve its performance. Based on the parallelism analysis, we propose two acceleration strategies: (1) a sliding window (SW) is used to reduce unnecessary memory transfers; (2) a templates-to-lookup-table mapping (TPL2LUT) is used to solve the branch divergence problem. The experimental results show that the acceleration strategies effectively solve the redundant copy and conditional branching problems and achieve an average speedup of 2.17x.

(II) Neural network inference is accelerated based on hardware acceleration technology and a GPU INT8 quantization strategy. In hardware, the hardware acceleration technology of NVIDIA GPUs is used to accelerate matrix multiplication. In software, we investigate INT8 quantization algorithms and analyze the trade-off between accuracy loss and performance improvement. We then select a quantization algorithm and develop a GPU INT8 accelerator library. The experimental results show that the INT8 accelerator library achieves low precision loss, a clear acceleration effect, and a considerable compression rate.

(III) We analyze the cache scheduling strategies of existing GPU architectures and improve the scheduling mechanism to make full use of GPU computing resources and improve performance. GPU L1 data cache contention, caused by the huge number of concurrent threads, leads to insufficient cache utilization and poor performance, especially for cache-unfriendly applications. Decoupled L1D (D-L1D) is a preventive bypassing scheme that considers the data locality of memory access streams. However, our experiments and analyses show that D-L1D attains only limited performance gains due to its pre-defined locality threshold. To address this issue, we propose a novel bypassing scheme named Dynamic D-L1D (DD-L1D) that dynamically updates the locality threshold at runtime. Experimental results show that DD-L1D outperforms D-L1D in both resource optimization and performance improvement.
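To make the branch-divergence point in part (I) concrete, the sketch below shows how a templates-to-lookup-table scheme can be expressed as a CUDA kernel. It is a minimal illustration under assumed names (`thinning_pass`, `d_lut`), not the thesis's actual implementation: the 8-neighborhood of each foreground pixel is packed into one byte and used to index a 256-entry table precomputed from the thinning templates, so all threads in a warp follow the same instruction path.

```cuda
// Hypothetical sketch: one thinning pass driven by a precomputed lookup table.
// d_lut[256] is assumed to be filled on the host from the thinning templates
// (nonzero = delete pixel), replacing per-template branches in the kernel.
__global__ void thinning_pass(const unsigned char* in, unsigned char* out,
                              const unsigned char* d_lut, int w, int h)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    // Border pixels and background pixels are copied through unchanged.
    if (x == 0 || y == 0 || x == w - 1 || y == h - 1 || in[y * w + x] == 0) {
        out[y * w + x] = in[y * w + x];
        return;
    }

    // Pack the 8-neighborhood (clockwise from the top-left) into one byte.
    unsigned idx =
        ((in[(y - 1) * w + (x - 1)] & 1))      |
        ((in[(y - 1) * w +  x     ] & 1) << 1) |
        ((in[(y - 1) * w + (x + 1)] & 1) << 2) |
        ((in[ y      * w + (x + 1)] & 1) << 3) |
        ((in[(y + 1) * w + (x + 1)] & 1) << 4) |
        ((in[(y + 1) * w +  x     ] & 1) << 5) |
        ((in[(y + 1) * w + (x - 1)] & 1) << 6) |
        ((in[ y      * w + (x - 1)] & 1) << 7);

    // One table lookup replaces the template comparisons, so every thread
    // executes the same instructions regardless of its local pixel pattern.
    out[y * w + x] = d_lut[idx] ? 0 : in[y * w + x];
}
```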
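For the INT8 quantization in part (II), the abstract does not describe the accelerator library's interface, so the following is only a hedged sketch of a common symmetric (max-abs) quantization step: values are mapped from [-max_abs, +max_abs] onto [-127, 127] before the INT8 matrix multiplication, which the GPU then executes with hardware dot-product support. The kernel name and parameters are illustrative assumptions.

```cuda
#include <cstdint>

// Illustrative quantization kernel; "scale" is assumed to be
// 127.0f / max_abs(input), computed on the host beforehand.
__global__ void quantize_fp32_to_int8(const float* in, int8_t* out,
                                      float scale, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float q = rintf(in[i] * scale);        // round to nearest integer
    q = fminf(fmaxf(q, -127.0f), 127.0f);  // clamp to the INT8 range
    out[i] = static_cast<int8_t>(q);
    // After the INT8 GEMM (accumulated in INT32), results are rescaled
    // back to floating point by dividing by the product of the scales.
}
```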
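The DD-L1D scheme in part (III) is a hardware mechanism, so it cannot be reproduced from the abstract alone; the host-side C++ model below only illustrates the stated idea of a locality threshold that is updated at runtime instead of being fixed. The structure, field names, and the feedback rule are assumptions made for illustration, not the thesis's design.

```cpp
// Toy model of runtime threshold adaptation (illustrative only).
struct DynamicBypassController {
    float threshold = 0.5f;          // initial locality threshold (assumed)

    // Called periodically with counters sampled from the L1 data cache.
    void update(unsigned hits, unsigned accesses) {
        if (accesses == 0) return;
        float hit_rate = static_cast<float>(hits) / accesses;
        // Thrashing cache: raise the threshold so more requests bypass L1D.
        if (hit_rate < 0.2f && threshold < 1.0f) threshold += 0.05f;
        // Healthy hit rate: lower the threshold so more requests are cached.
        else if (hit_rate > 0.6f && threshold > 0.0f) threshold -= 0.05f;
    }

    // A request whose estimated locality falls below the threshold bypasses L1D.
    bool should_bypass(float locality_score) const {
        return locality_score < threshold;
    }
};
```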
Keywords/Search Tags:CUDA, Resource Optimization, Performance Optimization, INT8 Quantization, GPU Cache Scheduling Strategy