
GPU Parallelism Lifting Method Based On Approximate Computing

Posted on: 2020-10-17 | Degree: Master | Type: Thesis
Country: China | Candidate: Y M Hao | Full Text: PDF
GTID: 2428330572984272 | Subject: Computer Science and Technology
Abstract/Summary:
With the widespread adoption of machine learning and multimedia processing applications, demand for computing has grown sharply. While demand surges, CPU performance gains are slowing in the post-Moore's-law era. The rise of the GPU relieves the tension between increasing demand and slow hardware development: its large register file lets the GPU reduce context-switching cost, hide memory latency, and raise thread-level parallelism. Although the compute capability of heterogeneous platforms such as GPGPUs grows rapidly, the processing demands of big data still outstrip it; deep learning frameworks, for example, often need days or even weeks to train. Yet the GPU register-file management strategy has not changed over the last decade while computing demand keeps climbing, putting tremendous pressure on register-file management.

Machine learning and multimedia processing consist essentially of large volumes of floating-point arithmetic. We found that half-precision floating-point arithmetic is accurate enough to meet the quality-of-result requirements of many of these applications. Approximate computing, which exploits their error-tolerant nature, brings new challenges and opportunities for extracting performance from current architectures. We therefore propose a transparent, tractable, and portable design framework for register-compression-driven approximate acceleration. The framework first analyzes register lifetimes and the branch call graph from GPU assembly files, then selectively packs the contents of two registers into one. Applications thus use fewer registers and achieve higher occupancy. To reduce the framework's overhead, we impose a set of constraints at compile time and prune design points that are not worthwhile. We implemented the framework in the GPGPU-Sim simulator. Experimental results show that, compared with the maxrregcount/launch_bounds technique, our framework achieves a performance improvement of up to 1.13x while maintaining a low output error (geometric mean 4.2%).
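The core idea, packing two half-precision values into the space of one 32-bit register, can be illustrated with a small host-side sketch. The thesis performs this transformation at the GPU assembly level; the snippet below is only an illustration of the underlying bit-level trick and of the rounding error it introduces. The helper names `pack_half2` and `unpack_half2` are hypothetical, chosen to mirror CUDA's `__half2` pairing.

```python
import numpy as np

def pack_half2(a, b):
    # Round two float32 values to float16 and store both halves
    # in a single 32-bit word, as a compressed register would.
    halves = np.array([a, b], dtype=np.float16)
    return halves.view(np.uint32)[0]

def unpack_half2(word):
    # Recover the two float16 halves from the packed 32-bit word.
    halves = np.array([word], dtype=np.uint32).view(np.float16)
    return float(halves[0]), float(halves[1])

# Round-trip error: values not exactly representable in float16
# come back slightly perturbed -- this is the "approximate" part.
x, y = 3.14159, 2.71828
xa, ya = unpack_half2(pack_half2(x, y))
rel_err = max(abs(xa - x) / abs(x), abs(ya - y) / abs(y))
```

With float16's 11-bit significand, `rel_err` stays below about 5e-4 here, well within the error tolerance the thesis reports for its benchmarks. Halving a kernel's live-register footprint this way can directly raise occupancy: on an SM with, say, 65,536 registers, a kernel needing 64 registers per thread fits 1,024 threads, while 32 registers per thread fits 2,048.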
Keywords/Search Tags:GPU, Approximate Computing, Register