Parallel Implementation Of H.264/AVC Video Codec On The CUDA Platform

Posted on: 2012-05-19
Degree: Master
Type: Thesis
Country: China
Candidate: X L Hu
Full Text: PDF
GTID: 2178330335460449
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, communication, storage and processor technologies have developed rapidly. Communication has expanded to include video and audio, and people increasingly demand high image quality. The H.264 video coding standard is a next-generation international video coding standard that adopts a series of advanced technologies, including intra and inter prediction, integer transform and quantization, entropy coding and the loop filter. H.264 ensures accurate prediction and improves decoded image quality, but it also increases time complexity.

The appearance of multi-core CPUs and GPUs means that mainstream processor chips have entered a parallel era. Parallelism will continue to expand, which requires application software that can scale transparently with the available parallelism. CUDA came into being against this background. CUDA is a parallel programming model and software environment dedicated to problems that can be expressed as data-parallel computations, that is, programs that execute the same operations on many data elements in parallel and have a high computational density. How to use CUDA multi-threaded programming to reduce the time complexity of H.264 and improve coding efficiency is therefore the focus of this thesis.

The loop-filter algorithm of JM16.0, the H.264 reference software, is studied in this thesis. A study of the edge filter strength calculation and of the filtering order (vertical edges first, then horizontal edges) shows that the loop filter is computationally heavy and regular, and therefore meets the conditions for a CUDA implementation. On this basis, this thesis proposes a parallel design and implementation of two steps: the strength calculation and the luma filter. During the design process, it is important to define the work done by each thread and to organize the threads according to the algorithm, so that the design fits realistic image formats and demonstrates the advantage of parallelism on CUDA.

Inter prediction is another key technology of H.264. The motion estimation algorithms used in the inter prediction of JM86, such as full search, fractional-pixel search and fast search, are studied and compared in terms of matching result and time complexity. The full search algorithm finds the best matches, but it requires the most computation and is highly regular. To reduce its time complexity, a parallel design of full search on CUDA is implemented in this thesis. The design focuses on achieving maximum parallelism in full search: the SAD calculation for the macroblocks of one frame is ported to the GPU's CUDA platform and computed in parallel, which reduces the load on the CPU and frees it to handle other tasks.

The Linux operating system and a GTX260+ graphics card are chosen as the implementation environment in order to obtain more precise encoding-time measurements and high parallelism. The two parallel algorithms were implemented in this environment and then tested with test cases of different sizes. The results indicate that the loop-filter and inter prediction algorithms are optimized through the parallel design in this thesis.
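As an illustration of the full-search SAD computation described in the abstract, the following is a minimal CPU reference sketch in C++. It is not the thesis's CUDA code. The names `Frame`, `Match`, `blockSad` and `fullSearch`, the square block size, and the rule of skipping candidates that fall outside the reference frame are all assumptions made for this example. Every (block, displacement) pair is independent of the others, which is the regularity that would let each pair be assigned to its own GPU thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <vector>

// Hypothetical luma-only frame, stored row by row.
struct Frame {
    int width;
    int height;
    std::vector<std::uint8_t> pix;
    std::uint8_t at(int x, int y) const {
        return pix[static_cast<std::size_t>(y) * width + x];
    }
};

// Best displacement found for one block, with its SAD cost.
struct Match {
    int dx;
    int dy;
    std::uint32_t sad;
};

// Sum of absolute differences between the block at (bx, by) in `cur`
// and the block displaced by (dx, dy) in `ref`. The caller must ensure
// both blocks lie inside their frames.
std::uint32_t blockSad(const Frame& cur, const Frame& ref,
                       int bx, int by, int dx, int dy, int bs) {
    std::uint32_t sum = 0;
    for (int y = 0; y < bs; ++y) {
        for (int x = 0; x < bs; ++x) {
            int a = cur.at(bx + x, by + y);
            int b = ref.at(bx + dx + x, by + dy + y);
            sum += static_cast<std::uint32_t>(std::abs(a - b));
        }
    }
    return sum;
}

// Full search over a (2*range+1)^2 window. Each candidate's SAD is
// independent of the others, so on a GPU every candidate could be
// evaluated by a separate thread and the minimum found by a reduction.
// Assumption for this sketch: out-of-frame candidates are skipped.
Match fullSearch(const Frame& cur, const Frame& ref,
                 int bx, int by, int bs, int range) {
    assert(cur.width == ref.width && cur.height == ref.height);
    Match best{0, 0, std::numeric_limits<std::uint32_t>::max()};
    for (int dy = -range; dy <= range; ++dy) {
        for (int dx = -range; dx <= range; ++dx) {
            int rx = bx + dx;
            int ry = by + dy;
            if (rx < 0 || ry < 0 ||
                rx + bs > ref.width || ry + bs > ref.height) {
                continue;
            }
            std::uint32_t s = blockSad(cur, ref, bx, by, dx, dy, bs);
            if (s < best.sad) {
                best = {dx, dy, s};  // ties keep the first candidate in scan order
            }
        }
    }
    return best;
}
```

The sketch uses a plain SAD cost and keeps the first minimum in scan order; a real encoder typically adds a motion-vector rate term to the cost, which is omitted here.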
Keywords/Search Tags: H.264, CUDA, inter prediction, loop-filter