Parallel Optimization Of Loop Filtering In HEVC/H.265 on CUDA

Posted on: 2015-01-12
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Mei
GTID: 2308330452957230
Subject: Computer technology
Abstract/Summary:
The parallel optimization of the High Efficiency Video Coding (HEVC/H.265) codec algorithm is an inevitable trend because of its huge time overhead. However, existing parallel optimizations of the loop filter are not efficient, mainly because their degree of parallelism is low and most research targets multi-core platforms. Furthermore, parallel studies rarely address the many-core Graphics Processing Unit (GPU), since a large number of conditional branches, load imbalance, and data processing dependencies interfere with parallelization on it.

Based on the massively parallel architecture of the GPU, the loop filter in HEVC is implemented according to the programming features of Compute Unified Device Architecture (CUDA) and the algorithmic characteristics of the filter. To make the loop filter more parallel-friendly on the GPU, the proposed parallel optimization mechanism is implemented on CUDA and employs multiple optimization schemes to improve filter efficiency. First, to eliminate the abundant conditional branch operations, an instruction stream normalization mechanism based on feature vectors is introduced, which adapts the filter algorithm to the CUDA computing architecture and ensures its parallel efficiency. Second, a parallel mechanism based on vertical and horizontal acceleration correction processes vertical and horizontal edges in one pass, improving concurrency and the speedup factor with negligible loss in video quality. Finally, the principle of divide and conquer is applied in an atomic splitting and merging scheme, and further memory and instruction optimizations are employed to maximize performance.

The proposed loop filtering parallel mechanism is evaluated on different platforms with different equipment for video sequences with resolutions of 1080P, 1600P and 2160P. The speedup factor reaches 8 to 16 compared with the multi-core parallel version and 21 to 37 compared with the serial implementation in HM. The proposed mechanism is stable across the different platforms when handling video sequences of diverse resolutions; therefore, it is an efficient parallel optimization mechanism.
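
The idea behind the first optimization, replacing conditional branches with a uniform instruction stream driven by a per-edge feature value, can be illustrated with a minimal sketch. This is not the thesis's implementation and not the HEVC deblocking filter: the `EdgeSample` struct, the three-level `feature` code (0 = skip, 1 = weak, 2 = strong), and the `kWeight` table are hypothetical, chosen only to show the pattern. The code is plain C++ so it can be run on a host; in a CUDA kernel, each loop iteration in `filter_all` would presumably map to one thread.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical edge record: one pixel on each side of a block edge plus a
// precomputed feature code (0 = no filtering, 1 = weak, 2 = strong).
struct EdgeSample {
    int p0;
    int q0;
    int feature;
};

// Toy weights indexed by the feature code (assumed values, not HEVC's).
constexpr int kWeight[3] = {0, 2, 4};

inline int clamp_int(int v, int lo, int hi) {
    return std::max(lo, std::min(v, hi));
}

// Reference version with data-dependent branches: threads in one warp that
// hold different feature codes would diverge here.
inline void filter_branchy(EdgeSample& e, int tc) {
    if (e.feature == 0) return;
    int delta;
    if (e.feature == 1) {
        delta = (e.q0 - e.p0) * 2 / 8;
    } else {
        delta = (e.q0 - e.p0) * 4 / 8;
    }
    delta = clamp_int(delta, -tc, tc);
    e.p0 = clamp_int(e.p0 + delta, 0, 255);
    e.q0 = clamp_int(e.q0 - delta, 0, 255);
}

// Normalized version: every edge runs the same arithmetic; the feature code
// only selects a weight, and weight 0 makes the update a no-op.
inline void filter_normalized(EdgeSample& e, int tc) {
    assert(e.feature >= 0 && e.feature <= 2);
    const int raw = (e.q0 - e.p0) * kWeight[e.feature] / 8;
    const int delta = clamp_int(raw, -tc, tc);
    e.p0 = clamp_int(e.p0 + delta, 0, 255);
    e.q0 = clamp_int(e.q0 - delta, 0, 255);
}

// Host-side stand-in for a kernel launch: one iteration per edge.
inline void filter_all(std::vector<EdgeSample>& edges, int tc) {
    for (EdgeSample& e : edges) {
        filter_normalized(e, tc);
    }
}
```

For pixel values in the range 0 to 255, both functions produce the same result; the normalized form trades a few redundant arithmetic operations on skipped edges for the absence of divergent control flow, which is the trade-off the abstract attributes to instruction stream normalization.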
Keywords/Search Tags: in-loop filtering, deblocking filtering, GPU, CUDA, HEVC/H.265