
An Efficient Implementation Of H.264/AVC Encoder Based On GPU

Posted on: 2013-02-24
Degree: Master
Type: Thesis
Country: China
Candidate: Z W Li
GTID: 2218330371457440
Subject: Computer software and theory
Abstract/Summary:
Among international video coding standards, H.264/AVC has attracted attention from both academia and industry for its high image quality and high compression ratio. However, H.264 encoding is computationally intensive. Existing serial encoders running on general-purpose processors cannot meet the demands of real-time encoding of full HD video, while dedicated hardware encoders are also unsatisfactory because of their inflexibility, long development time, and high cost. An efficient implementation of H.264 encoding is therefore needed.
Because H.264 encoding on a pure CPU platform is extremely time-consuming, this thesis analyzes the traditional H.264 encoding framework to identify the modules that can be parallelized, and proposes a GPU-based parallel encoder model, PEM-BCUDA. Preliminary analysis shows that motion estimation and motion compensation, intra prediction, transform and quantization, and loop filtering account for a large share of the computation in the encoding framework and contain exploitable parallelism. Each of these processing units is therefore analyzed separately: its parallel model is extracted and mapped to the CUDA platform, and the resulting acceleration on the new platform is evaluated. System integration testing is then carried out to adapt the system to the CUDA platform and to maximize the speedup of the encoding process as a whole.
Finally, the thesis presents a detailed evaluation of the proposed parallel encoder modules and of the integrated system, using Nvidia's GT240 and GTX260+ as hardware and 720p YUV image sequences as video input. The speedup of individual modules is significant; for example, the loop filter achieves a speedup of more than 60 times. Because the transfer capacity of the PCI-E bus is limited, system integration testing was also carried out. Experimental data show that when the modules are integrated into the parallel encoder model PEM-BCUDA, the acceleration is significantly reduced, and in the same environment the overall speedup is only about 2.5. For CIF or QCIF resolution input, the speedup can even fall below 1. This indicates that the GPU acceleration can be completely cancelled out by data transfer over the PCI-E bus. PCI-E data transfer capability thus becomes the bottleneck of the whole system, and overcoming this bottleneck is the focus of future work.
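The gap between per-module speedup and whole-encoder speedup can be illustrated with a simple timing model. The sketch below is not taken from the thesis: the function `end_to_end_speedup`, its parameters, and all numbers are hypothetical and chosen only to show the trend, namely that a fast kernel can still yield an overall speedup near or below 1 once host-to-device transfer time is included.

```python
def end_to_end_speedup(cpu_time, accelerated_fraction, kernel_speedup, transfer_time):
    """Overall speedup when part of the encoder is offloaded to the GPU.

    cpu_time: time of the serial CPU encoder for one frame (any unit)
    accelerated_fraction: share of cpu_time spent in the offloaded modules
    kernel_speedup: speedup of those modules on the GPU, excluding transfers
    transfer_time: host<->device copy time per frame (same unit as cpu_time)
    """
    # Work that stays on the CPU is unchanged.
    remaining_cpu = cpu_time * (1.0 - accelerated_fraction)
    # Offloaded work shrinks by the kernel speedup.
    gpu_compute = cpu_time * accelerated_fraction / kernel_speedup
    # Transfer time is pure overhead added by offloading.
    return cpu_time / (remaining_cpu + gpu_compute + transfer_time)


# Hypothetical numbers (assumptions, not measurements from the thesis).
large_frame = end_to_end_speedup(cpu_time=100.0, accelerated_fraction=0.9,
                                 kernel_speedup=60.0, transfer_time=20.0)
small_frame = end_to_end_speedup(cpu_time=10.0, accelerated_fraction=0.9,
                                 kernel_speedup=60.0, transfer_time=12.0)
print(f"large frame: {large_frame:.2f}x, small frame: {small_frame:.2f}x")
```

In this toy model the same 60x kernel gives an overall speedup above 1 when the compute time per frame is large relative to the transfer time, and below 1 when the transfer time is comparable to the original CPU time.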
Keywords: GPU, H.264 Encoder, Parallel Computing