Research And Implementation Of H.264 Parallel Encoder Based On CUDA

Posted on: 2011-01-06
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Su
Full Text: PDF
GTID: 2178330338490135
Subject: Computer Science and Technology
Abstract/Summary:
As the most widely used video coding standard, H.264/AVC has attracted attention from both academia and industry for its high image quality and high compression ratio. However, H.264 encoding is computationally intensive. Existing serial encoders running on general-purpose processors cannot meet the needs of real-time encoding of full HD video, while dedicated hardware encoders are also unsatisfactory because of their inflexibility, long development time, and high cost. An efficient implementation of H.264 encoding is therefore needed. Graphics processing units (GPUs) have advanced rapidly in compute capability and bandwidth, and using GPUs to accelerate applications has become an active research topic. CUDA and OpenCL have improved the flexibility and programmability of GPUs. Accordingly, this thesis focuses on the research and implementation of an H.264 parallel encoder based on CUDA.

Based on the streaming H.264 encoder, this thesis proposes a parallel encoder that better fits the characteristics of the CUDA framework. Unlike other GPU-based H.264 encoders, this work does not focus on a single component of the encoder but maps the entire H.264 encoder onto the CUDA architecture. We design a parallel computing model and a storage model for each module of the CUDA-based H.264 encoder, and we optimize the encoder in several respects.

Finally, we evaluate the performance of the proposed encoder using 1080p video as input. Experimental results show that our encoder achieves a significant speedup over the reference encoder: running on a GeForce GTX260, it reaches a speedup of up to 18. Our encoder also outperforms other GPU-based encoders.
To evaluate each component's contribution to the entire encoder and to identify bottlenecks, we also measure the speedup of the individual parts of the encoder. The results show that inter coding achieves the best performance, with a speedup of about 25, while intra coding achieves the worst, with a speedup of about 4. Data transfer between the CPU and GPU accounts for 25% of the total GPU time and is one of the bottlenecks. The parallel models proposed for each component and the performance analysis offer insights for other GPU-based applications.
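As a rough illustration of how per-component speedups combine into a whole-encoder figure, the sketch below applies the usual Amdahl-style weighting. It is not taken from the thesis: the function name, the "other" component, its speedup of 10, and the time fractions are all hypothetical, since the abstract does not report how encoding time is split between components. Only the inter (about 25) and intra (about 4) speedups come from the abstract.

```python
def overall_speedup(fractions, speedups):
    """Combine per-component speedups into a whole-program speedup.

    fractions: share of the baseline (serial) run time spent in each
               component; the shares must sum to 1.
    speedups:  speedup of each component in the accelerated version.
    """
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    # Accelerated time, expressed as a fraction of the baseline time.
    accelerated_time = sum(fractions[name] / speedups[name] for name in fractions)
    return 1.0 / accelerated_time


# Hypothetical time split, NOT reported in the abstract.
fractions = {"inter": 0.8, "intra": 0.1, "other": 0.1}
# 25 and 4 are the abstract's figures; 10 for "other" is an assumption.
speedups = {"inter": 25.0, "intra": 4.0, "other": 10.0}

result = overall_speedup(fractions, speedups)
print(f"overall speedup: {result:.1f}")
```

Under these assumed fractions the combined speedup comes out near 15, well below the inter-coding figure of 25. This shows why a slowly accelerated component such as intra coding can limit the whole encoder even when it accounts for a small share of the baseline time.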
Keywords/Search Tags:H.264 encoder, CUDA, GPU, parallel encoder