Design And Implementation Of Parallel Algorithms For Key Modules In HEVC Based On GPU

Posted on: 2017-08-03
Degree: Master
Type: Thesis
Country: China
Candidate: W L Zhang
GTID: 2348330488459846
Subject: Information and Communication Engineering
Abstract/Summary:
With the spread of high-definition (HD) and ultra-high-definition (UHD) video, the growing diversity and resolution of video applications pose a great challenge to existing video coding standards. To meet these requirements, the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) formed the Joint Collaborative Team on Video Coding (JCT-VC) and, in 2013, released a new-generation video coding standard known as High Efficiency Video Coding (HEVC). Compared with H.264/AVC, HEVC achieves about a 50% bit rate reduction at the same visual quality. At the same time, its very high computational complexity hinders its wide adoption in video applications, so it is necessary to improve the encoding speed of HEVC.

GPUs are widely used for general-purpose parallel computing because of their large number of computing units. NVIDIA released the Compute Unified Device Architecture (CUDA) in 2007, and its C-like language makes CUDA a good platform for large-scale parallel computing. Parallelizing video encoding algorithms to fully exploit the many-core computing ability of the GPU has become an effective way to accelerate video encoding.

In this thesis, we design parallel algorithms for the key modules of inter prediction and the loop filter in HEVC and implement them with CUDA. An efficient parallel algorithm is designed for fractional-pixel interpolation. A ladder-like parallel algorithm is proposed for integer- and fractional-pixel motion estimation. For DCT and IDCT, a multi-level parallel scheme is designed based on the butterfly algorithm. For quantization and inverse quantization, a separate parallel algorithm is designed for each. A fully parallel algorithm is designed for the deblocking filter. For edge offset (EO) and band offset (BO) in sample adaptive offset (SAO), we design parallel algorithms for sample classification, statistics collection, and offset computation, respectively. We also design a diagonal parallel scheme for SAO merging and a fully parallel algorithm for SAO filtering.

The designed parallel algorithms are implemented with CUDA on a CPU+GPU platform. Experimental results show that, for 1080p sequences, the parallel method speeds up the whole inter-loop by more than 20 times compared with the original serial algorithm at the same visual quality.
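To illustrate why the SAO edge-offset (EO) steps named in the abstract lend themselves to parallel execution, the following is a minimal serial reference sketch in Python, not the thesis's CUDA implementation. It follows the standard HEVC EO rule, in which each sample is compared with its two neighbours along the chosen direction. The function names, the list-of-rows picture layout, and the choice to leave border samples unclassified are assumptions made for this example.

```python
# Neighbour offsets (dx, dy) for the four HEVC edge-offset (EO) classes:
# 0 = horizontal, 1 = vertical, 2 = 135-degree diagonal, 3 = 45-degree diagonal.
EO_NEIGHBOURS = {
    0: ((-1, 0), (1, 0)),
    1: ((0, -1), (0, 1)),
    2: ((-1, -1), (1, 1)),
    3: ((1, -1), (-1, 1)),
}

# Maps 2 + sign(c - a) + sign(c - b) to the EO category:
# 1 = local minimum, 2 and 3 = edge, 4 = local maximum, 0 = none.
EDGE_IDX_TO_CATEGORY = (1, 2, 0, 3, 4)


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def classify_sample(recon, x, y, eo_class):
    """Category of one sample; it depends only on the sample and two neighbours."""
    height, width = len(recon), len(recon[0])
    (ax, ay), (bx, by) = EO_NEIGHBOURS[eo_class]
    for nx, ny in ((x + ax, y + ay), (x + bx, y + by)):
        if not (0 <= nx < width and 0 <= ny < height):
            return 0  # assumption: border samples are left unclassified
    c = recon[y][x]
    a = recon[y + ay][x + ax]
    b = recon[y + by][x + bx]
    return EDGE_IDX_TO_CATEGORY[2 + sign(c - a) + sign(c - b)]


def collect_statistics(orig, recon, eo_class):
    """Per-category sum of (orig - recon) and per-category sample count."""
    sums = [0] * 5
    counts = [0] * 5
    for y in range(len(recon)):
        for x in range(len(recon[0])):
            cat = classify_sample(recon, x, y, eo_class)
            sums[cat] += orig[y][x] - recon[y][x]
            counts[cat] += 1
    return sums, counts
```

Because `classify_sample` reads only the reconstructed picture and writes nothing shared, every sample could be handled by its own GPU thread. The accumulation in `collect_statistics` is the part that would need a parallel reduction or atomic additions, and the offset for each category would then be derived from its sum and count.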
Keywords/Search Tags: HEVC, inter prediction, loop-filter, parallel algorithm, GPU, CUDA