The Optimization And Implementation Of X.264 Video Encoder On The Platform Of Graphics Processing Unit

Posted on:2016-03-30

Degree:Master

Type:Thesis

Country:China

Candidate:D Jiang

Full Text:PDF

GTID:2428330473464980

Subject:Computer application technology

Abstract/Summary:

Graphics processing units (GPUs) offer very high computational performance at relatively low cost. GPU hardware has been improving at a pace faster than Moore's law, and general-purpose computing on graphics processing units (GPGPU) has advanced steadily. Industry, represented by NVIDIA Corporation, has produced a series of GPUs with strong floating-point capability and a high degree of parallelism. The arrival of programmable GPUs has made general-purpose GPU computing a hot research topic.

H.264/AVC is a video coding standard jointly developed by ISO/IEC and ITU-T. It achieves an excellent compression ratio and good network adaptability. However, it adopts new coding tools such as variable-block-size motion estimation/compensation and multiple intra-prediction modes, which make the encoder computationally complex. Unlike the H.264/AVC verification model, X.264 is an open-source encoder compatible with the H.264/AVC standard; it achieves desirable performance and has relatively higher practical value. Motivated by the computational demands of video encoding and the computational capability of GPUs, this thesis studies the optimization and implementation of the X.264 encoder on the GPU platform, fully exploiting the GPU's floating-point capability and parallelism. The main work and contributions are summarized as follows.

First, the structure of the X.264 video encoder is analyzed, in particular its drawbacks in function hierarchy, data structure hierarchy, and operation types. The potential for parallelizing the X.264 encoder and for porting it to another platform is also investigated. The Intel VTune performance analyzer is used to profile X.264 encoding in practice. From statistics over the test results, the time consumed by the main functions and by each functional module of the X.264 encoder is obtained. This lays a solid foundation for selecting the computation-intensive modules with parallelization potential in the X.264 encoder, which are then targeted for GPU optimization.

Second, after an analysis of the motion estimation algorithms adopted by X.264, a parallel optimization strategy is proposed for the SAD (sum of absolute differences) computation and the comparison of SAD values among different blocks in the motion estimation module. The proposed parallel processing is implemented on the GPU platform with Compute Unified Device Architecture (CUDA). Motion estimation is known to be the most computation-intensive step of an H.264/AVC encoder, accounting for more than 60% of the computation time. However, motion estimation in the X.264 encoder is block-based and highly parallel, which makes it suitable for GPU implementation. The matching criterion, i.e., the SAD computation and comparison, is implemented on the GPU in single-instruction, multiple-thread (SIMT) fashion, which accelerates it. Experimental results on several typical video sequences such as Foreman show that the proposed GPU implementation improves efficiency by a factor of 6 to 8 by exploiting the parallelism of full-search motion estimation. Moreover, the higher the spatial resolution and the larger the search range, the more significant the acceleration. Compared with the GPU optimization of the motion estimation module in the H.264/AVC reference software, the acceleration ratio is almost the same, but the time consumption is reduced.
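To make the SAD matching criterion and the full-search step concrete, here is a minimal CPU sketch in Python. It is not the thesis's CUDA code and not X.264's API: the function names `sad` and `full_search`, the list-of-rows frame layout, and the tie-breaking rule are all assumptions chosen for illustration. The sketch separates the two stages the abstract names: one SAD per candidate displacement, then a comparison that picks the minimum.

```python
def sad(cur, ref, bx, by, rx, ry, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the n x n block of `ref` at (rx, ry).
    Frames are lists of rows of integer pixel values (an assumed layout)."""
    total = 0
    for j in range(n):
        cur_row = cur[by + j]
        ref_row = ref[ry + j]
        for i in range(n):
            total += abs(cur_row[bx + i] - ref_row[rx + i])
    return total


def full_search(cur, ref, bx, by, n, search_range):
    """Exhaustively test every displacement within +/- search_range and
    return (best_sad, (dx, dy)) for the block of `cur` at (bx, by)."""
    height, width = len(ref), len(ref[0])

    # Stage 1: one SAD per candidate displacement. Each entry depends only
    # on its own (dx, dy), so the candidates can be evaluated independently.
    candidates = []
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            # Skip candidates whose reference block falls outside the frame.
            if 0 <= rx <= width - n and 0 <= ry <= height - n:
                candidates.append((sad(cur, ref, bx, by, rx, ry, n), (dx, dy)))

    # Stage 2: comparison. Tuples compare by SAD first; ties fall back to
    # the smaller (dx, dy) tuple, an arbitrary choice made for this sketch.
    return min(candidates)
```

Because no candidate's SAD depends on any other candidate's result, stage 1 is the part that could plausibly be assigned to one GPU thread per displacement, with stage 2 becoming a minimum reduction. How the thesis actually maps work to threads and blocks is not stated in the abstract.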

Keywords/Search Tags:

H.264/AVC, X.264, Graphics Processing Unit (GPU), Compute Unified Device Architecture (CUDA), motion estimation

Related items

1	An Efficient Lookup Table Of FFT Parallelism Using CUDA On GPUs
2	SAR Imaging Algorithm And GPU Acceleration Research Based On Backprojection
3	The Research Of Parallel FastSLAM Algorithm Based On CUDA
4	2D-3D Medical Image Registration Technology Research Based On CUDA
5	Research On Data Stream Processing Methods Based On GPU
6	Research Of 3D Cone-beam CT Image Reconstruction Accelerating Technology
7	Research On High Performance Parallel Algorithms Based On GPU
8	Parallel Continuous Ant Colony Optimization Algorithm Based On GPU And Its Application Research
9	Study On Parallel Processing Technologies Of Photogrammetry Data Based On GPU
10	Research On High-Speed Implementation Methods Of Block Ciphers Via GPU And Bitsliced