
Research On Resource And Performance Optimization Strategies Of GPU

Posted on: 2019-10-18  Degree: Master  Type: Thesis
Country: China  Candidate: W G Yang  Full Text: PDF
GTID: 2428330566984141  Subject: Software engineering
Abstract/Summary:
It is a great challenge to take full advantage of the GPU's computing resources and effectively improve performance, due to its complex architecture and demanding programming model. In terms of software, it is necessary to fully understand the parallel acceleration characteristics of GPUs, make effective use of the various computing resources, and fully explore the potential for performance improvement. In terms of the combination of hardware and software, it is necessary to make full use of specific hardware advantages to perform resource optimization and performance acceleration for specific software computing models. In terms of hardware, the architecture needs design optimization to improve resource scheduling strategies and reduce hardware overhead. In this thesis, we start from two angles, resource optimization and performance improvement, and analyze GPU optimization problems from three aspects: GPU application optimization, neural network optimization based on INT8 quantization, and GPU cache scheduling strategy optimization. The details are as follows.

(I) To improve algorithm performance, we analyze the parallelism of a traditional algorithm and optimize it based on the parallel acceleration characteristics of the GPU. We select the classic image thinning algorithm and improve its performance. Based on the parallelism analysis, we propose two acceleration strategies: (1) a sliding window (SW) is used to reduce unnecessary memory transfers; (2) a templates-to-lookup-table mapping (TPL2LUT) is used to solve the branch divergence problem. The experimental results show that the acceleration strategies effectively solve the redundant copy and conditional branching problems and achieve an average speedup of 2.17x.

(II) Neural network inference is accelerated based on hardware acceleration technology and a GPU INT8 quantization strategy. In hardware, the hardware acceleration technology of NVIDIA GPUs is used to accelerate matrix multiplication. In software, we investigate INT8 quantization algorithms and analyze the trade-off between accuracy loss and performance improvement. We then select a quantization algorithm and develop a GPU INT8 accelerator library. The experimental results show that the INT8 accelerator library achieves low precision loss, a clear acceleration effect, and a considerable compression rate.

(III) We analyze the cache scheduling strategies of existing GPU architectures and improve the scheduling mechanism to make full use of GPU computing resources and improve performance. GPU L1 data cache contention, caused by the huge number of concurrent threads, leads to insufficient cache utilization and poor performance, especially for cache-unfriendly applications. Decoupled L1D (D-L1D) is a preventive bypassing scheme that considers the data locality of memory access streams. However, our experiments and analyses show that D-L1D attains only limited performance gains due to its pre-defined locality threshold. To address this issue, we propose a novel bypassing scheme named Dynamic D-L1D (DD-L1D) that dynamically updates the locality threshold at runtime. Experimental results show that DD-L1D outperforms D-L1D in both resource optimization and performance improvement.
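To make the branch-divergence point in part (I) concrete, the sketch below shows how a templates-to-lookup-table scheme can be expressed as a CUDA kernel. It is a minimal illustration under assumed names (`thinning_pass`, `d_lut`), not the thesis's actual implementation: the 8-neighborhood of each foreground pixel is packed into one byte and used to index a 256-entry table precomputed from the thinning templates, so all threads in a warp follow the same instruction path.

```cuda
// Hypothetical sketch: one thinning pass driven by a precomputed lookup table.
// d_lut[256] is assumed to be filled on the host from the thinning templates
// (nonzero = delete pixel), replacing per-template branches in the kernel.
__global__ void thinning_pass(const unsigned char* in, unsigned char* out,
                              const unsigned char* d_lut, int w, int h)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    // Border pixels and background pixels are copied through unchanged.
    if (x == 0 || y == 0 || x == w - 1 || y == h - 1 || in[y * w + x] == 0) {
        out[y * w + x] = in[y * w + x];
        return;
    }

    // Pack the 8-neighborhood (clockwise from the top-left) into one byte.
    unsigned idx =
        ((in[(y - 1) * w + (x - 1)] & 1))      |
        ((in[(y - 1) * w +  x     ] & 1) << 1) |
        ((in[(y - 1) * w + (x + 1)] & 1) << 2) |
        ((in[ y      * w + (x + 1)] & 1) << 3) |
        ((in[(y + 1) * w + (x + 1)] & 1) << 4) |
        ((in[(y + 1) * w +  x     ] & 1) << 5) |
        ((in[(y + 1) * w + (x - 1)] & 1) << 6) |
        ((in[ y      * w + (x - 1)] & 1) << 7);

    // One table lookup replaces the template comparisons, so every thread
    // executes the same instructions regardless of its local pixel pattern.
    out[y * w + x] = d_lut[idx] ? 0 : in[y * w + x];
}
```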
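For the INT8 quantization in part (II), the abstract does not describe the accelerator library's interface, so the following is only a hedged sketch of a common symmetric (max-abs) quantization step: values are mapped from [-max_abs, +max_abs] onto [-127, 127] before the INT8 matrix multiplication, which the GPU then executes with hardware dot-product support. The kernel name and parameters are illustrative assumptions.

```cuda
#include <cstdint>

// Illustrative quantization kernel; "scale" is assumed to be
// 127.0f / max_abs(input), computed on the host beforehand.
__global__ void quantize_fp32_to_int8(const float* in, int8_t* out,
                                      float scale, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float q = rintf(in[i] * scale);        // round to nearest integer
    q = fminf(fmaxf(q, -127.0f), 127.0f);  // clamp to the INT8 range
    out[i] = static_cast<int8_t>(q);
    // After the INT8 GEMM (accumulated in INT32), results are rescaled
    // back to floating point by dividing by the product of the scales.
}
```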
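The DD-L1D scheme in part (III) is a hardware mechanism, so it cannot be reproduced from the abstract alone; the host-side C++ model below only illustrates the stated idea of a locality threshold that is updated at runtime instead of being fixed. The structure, field names, and the feedback rule are assumptions made for illustration, not the thesis's design.

```cpp
// Toy model of runtime threshold adaptation (illustrative only).
struct DynamicBypassController {
    float threshold = 0.5f;          // initial locality threshold (assumed)

    // Called periodically with counters sampled from the L1 data cache.
    void update(unsigned hits, unsigned accesses) {
        if (accesses == 0) return;
        float hit_rate = static_cast<float>(hits) / accesses;
        // Thrashing cache: raise the threshold so more requests bypass L1D.
        if (hit_rate < 0.2f && threshold < 1.0f) threshold += 0.05f;
        // Healthy hit rate: lower the threshold so more requests are cached.
        else if (hit_rate > 0.6f && threshold > 0.0f) threshold -= 0.05f;
    }

    // A request whose estimated locality falls below the threshold bypasses L1D.
    bool should_bypass(float locality_score) const {
        return locality_score < threshold;
    }
};
```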
Keywords/Search Tags:CUDA, Resource Optimization, Performance Optimization, INT8 Quantization, GPU Cache Scheduling Strategy