Parallel Optimization Of Loop Filtering In HEVC/H.265 on CUDA

Posted on:2015-01-12

Degree:Master

Type:Thesis

Country:China

Candidate:H Y Mei

Full Text:PDF

GTID:2308330452957230

Subject:Computer technology

Abstract/Summary:

Parallel optimization of the High Efficiency Video Coding (HEVC/H.265) codec is an inevitable trend because of the codec's large time overhead. However, existing parallel optimizations of the loop filter are not efficient, mainly because their degree of parallelism is low and most of the research targets multi-core platforms. Furthermore, parallel studies rarely address the many-core Graphics Processing Unit (GPU), since a large number of conditional branches, load imbalance, and data processing dependencies interfere with parallelization on it.

Based on the massively parallel architecture of the GPU, the loop filter in HEVC is implemented according to the programming features of Compute Unified Device Architecture (CUDA) and the characteristics of the algorithm. To make the loop filter more parallel-friendly on the GPU, the proposed parallel optimization mechanism is implemented on CUDA and employs multiple optimization schemes to improve filter efficiency. First, to eliminate the abundant conditional branch operations, an instruction stream normalization mechanism based on feature vectors is introduced, which adapts the filter algorithm to the CUDA computing architecture and ensures its parallel efficiency. Second, a parallel mechanism based on vertical and horizontal acceleration correction processes vertical and horizontal edges in one pass, which improves concurrency and the speedup factor while producing negligible loss of video quality. Finally, the principle of divide and conquer is applied in an atomic splitting and merging scheme, and further memory and instruction optimizations are employed to maximize performance.

The proposed parallel loop filtering mechanism is evaluated on different platforms with different equipment for video sequences at resolutions of 1080P, 1600P and 2160P. The speedup factor reaches 8 to 16 compared with the multi-core parallel version and 21 to 37 compared with the serial implementation in HM. The proposed mechanism is very stable across different platforms and across video sequences of diverse resolutions; therefore, it is an efficient parallel optimization mechanism.
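
The abstract does not spell out how the feature-vector-based instruction stream normalization works, so the following is only a minimal C++ sketch of the general idea of replacing per-edge branches with a table lookup. It is not the thesis's implementation. The names `FilterMode`, `EdgeSamples`, `clip3` and `filter_p0` are hypothetical, the samples are assumed to be 8-bit, and the formulas are simplified versions of the HEVC luma deblocking equations for the p0 sample only (the normal filter's additional on/off decision is omitted). On a GPU, each thread would presumably run the same instruction sequence for one line of samples across an edge.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical per-edge filter decision, standing in for one field of the
// "feature vector" mentioned in the abstract (the layout is an assumption).
enum FilterMode : int { kNone = 0, kNormal = 1, kStrong = 2 };

// Samples on both sides of one block edge: p2 p1 p0 | q0 q1.
struct EdgeSamples { int p2, p1, p0, q0, q1; };

// Clamp v into [lo, hi].
inline int clip3(int lo, int hi, int v) {
    return std::max(lo, std::min(hi, v));
}

// Returns the filtered p0 sample. Every candidate is computed for every
// edge and the result is picked by table index, so all edges follow the
// same instruction sequence regardless of the filter decision.
inline int filter_p0(const EdgeSamples& s, FilterMode mode, int tc) {
    assert(mode >= kNone && mode <= kStrong);

    // Candidate 0: edge left unfiltered.
    const int none = s.p0;

    // Candidate 1: normal filter, offset clipped to [-tc, tc].
    // Assumes arithmetic right shift for negative values.
    const int delta = (9 * (s.q0 - s.p0) - 3 * (s.q1 - s.p1) + 8) >> 4;
    const int normal = s.p0 + clip3(-tc, tc, delta);

    // Candidate 2: strong filter, result clipped to p0 +/- 2*tc.
    const int raw = (s.p2 + 2 * s.p1 + 2 * s.p0 + 2 * s.q0 + s.q1 + 4) >> 3;
    const int strong = clip3(s.p0 - 2 * tc, s.p0 + 2 * tc, raw);

    // Table lookup replaces the if/else chain on the filter mode.
    const int candidates[3] = {none, normal, strong};
    return clip3(0, 255, candidates[mode]);  // 8-bit sample range assumed
}
```

The trade-off in this kind of sketch is that every edge pays for all three computations. On SIMT hardware that can still be cheaper than divergent branches within a warp, which is the problem the abstract says the normalization mechanism addresses.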

Keywords/Search Tags:

in-loop filtering, deblocking filtering, GPU, CUDA, HEVC/H.265

Related items

1	Optimization Of Loop-filtering Algorithms On HEVC Video Coding Standard
2	Efficient Video Coding Loop Filtering Technology Optimization
3	Research On Algorithm Of HEVC Loop Deblocking Filter
4	The Research On Compressed Noise Modeling And Deblocking Algorithm
5	Research On Bit Allocation And Loop Filtering Optimization For High Efficiency Video Coding
6	Research On In-loop Deblocking Filter For High Efficiency Video Coding
7	Research Of Image Registration And Filtering Method Using CUDA
8	Improvement Of The Motion Compensation Temporal Filtering
9	The Research On Image Filtering Technology Based On CUDA
10	Design And Implementation Of Parallel Algorithms For Key Modules In HEVC Based On GPU