
GPU Parallelism Lifting Method Based On Approximate Computing

Posted on: 2020-10-17 | Degree: Master | Type: Thesis
Country: China | Candidate: Y M Hao | Full Text: PDF
GTID: 2428330572984272 | Subject: Computer Science and Technology
Abstract/Summary:
With the widespread adoption of machine learning and multimedia processing applications, demand for computing has grown sharply. While demand surges, CPU performance gains are slowing in the post-Moore's-law era. The rise of the GPU relieves the tension between increasing demand and slow hardware development: its large register file lets the GPU reduce context-switching cost, hide memory latency, and raise thread-level parallelism. Although the compute capability of heterogeneous platforms such as GPGPUs grows rapidly, the processing demands of big data still outstrip it; deep learning frameworks, for example, often need days or even weeks to train. Yet the GPU register-file management strategy has not changed over the last decade while computing demand keeps climbing, putting tremendous pressure on register-file management.

Machine learning and multimedia processing consist essentially of large volumes of floating-point arithmetic. We found that half-precision floating-point arithmetic is accurate enough to meet the quality-of-result requirements of many of these applications. Approximate computing, which exploits their error-tolerant nature, brings new challenges and opportunities for extracting performance from current architectures. We therefore propose a transparent, tractable, and portable design framework for register-compression-driven approximate acceleration. The framework first analyzes register lifetimes and the branch call graph from GPU assembly files, then selectively packs the contents of two registers into one. Applications thus use fewer registers and achieve higher occupancy. To reduce the framework's overhead, we impose a set of constraints at compile time and prune design points that are not worthwhile. We implemented the framework in the GPGPU-Sim simulator. Experimental results show that, compared with the maxrregcount/launch_bounds technique, our framework achieves a performance improvement of up to 1.13x while maintaining a low output error (geometric mean 4.2%).
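The core idea, packing two half-precision values into the space of one 32-bit register, can be illustrated with a small host-side sketch. The thesis performs this transformation at the GPU assembly level; the snippet below is only an illustration of the underlying bit-level trick and of the rounding error it introduces. The helper names `pack_half2` and `unpack_half2` are hypothetical, chosen to mirror CUDA's `__half2` pairing.

```python
import numpy as np

def pack_half2(a, b):
    # Round two float32 values to float16 and store both halves
    # in a single 32-bit word, as a compressed register would.
    halves = np.array([a, b], dtype=np.float16)
    return halves.view(np.uint32)[0]

def unpack_half2(word):
    # Recover the two float16 halves from the packed 32-bit word.
    halves = np.array([word], dtype=np.uint32).view(np.float16)
    return float(halves[0]), float(halves[1])

# Round-trip error: values not exactly representable in float16
# come back slightly perturbed -- this is the "approximate" part.
x, y = 3.14159, 2.71828
xa, ya = unpack_half2(pack_half2(x, y))
rel_err = max(abs(xa - x) / abs(x), abs(ya - y) / abs(y))
```

With float16's 11-bit significand, `rel_err` stays below about 5e-4 here, well within the error tolerance the thesis reports for its benchmarks. Halving a kernel's live-register footprint this way can directly raise occupancy: on an SM with, say, 65,536 registers, a kernel needing 64 registers per thread fits 1,024 threads, while 32 registers per thread fits 2,048.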
Keywords/Search Tags:GPU, Approximate Computing, Register