
An Improved Algorithm And Its Parallel Optimization On GPUs For Motion Estimation Of Advanced Video Coding

Posted on: 2017-01-13
Degree: Master
Type: Thesis
Country: China
Candidate: W Z Zhu
GTID: 2348330503489870
Subject: Computer system architecture
Abstract/Summary:
Motion estimation is one of the key components of the H.264 Advanced Video Coding (AVC) standard. With more effective inter-prediction techniques, H.264 greatly improves the video compression ratio, but it also introduces more computational complexity, which is a major challenge for real-time video coding. In recent years, GPUs (Graphics Processing Units) have become increasingly powerful at parallel processing, and CUDA (Compute Unified Device Architecture), which runs on GPUs, is a suitable programming model for parallelizing motion estimation. It is therefore of great significance for real-time video applications, such as live network broadcasting, to explore the parallelism of motion estimation in depth in order to accelerate video coding.

After optimizing the motion search algorithm and analyzing the parallelism of motion estimation, a CUDA-based parallel optimization solution for H.264 motion estimation is proposed. First, a novel motion search algorithm, TLS (Two Level Search), is proposed, which is adapted to the CUDA programming model. The first-level search is a global, coarse-grained search that finds the best motion vector in the search window using a step size of 4. The second-level search is a local, fine-grained search that finds the final best motion vector within a 5×5 square around the position given by the first-level best motion vector. This algorithm greatly reduces the number of search points and, with the GPU's strong parallel computing capability, can quickly obtain the best motion vector. Second, an asynchronous processing model is proposed that coordinates residual coding on the CPU with motion estimation on the GPU. A frame is partitioned into N parts that have no data dependencies on one another.
While the CPU performs residual coding of part n-1, the GPU can perform motion estimation of part n, so once residual coding of part n-1 is finished, the CPU can proceed directly to the next part without delay.

Experimental results show that, with the proposed optimizations applied, motion estimation using TLS is up to 40 times faster than in the x264 encoder using ESA (Exhaustive Search Algorithm), and overall encoding is up to 30 times faster, with a PSNR (Peak Signal-to-Noise Ratio) error within 1.2 dB. In conclusion, the proposed solution can greatly accelerate video coding with an acceptable loss in video quality.
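The two-level search described in the abstract can be illustrated with a small sequential sketch. This is not the thesis's CUDA implementation, only a reading of the two steps under stated assumptions: the matching cost is taken to be the sum of absolute differences (SAD), which the abstract does not specify, and the names `make_sad_cost`, `two_level_search`, `search_range` and `refine_radius`, as well as the default window of ±16, are hypothetical.

```python
from typing import Callable, Tuple

MV = Tuple[int, int]  # motion vector as (dx, dy)


def make_sad_cost(cur, ref, bx, by, bs) -> Callable[[MV], float]:
    """Build a cost function for the bs x bs block at (bx, by).

    Assumption: cost is the sum of absolute differences (SAD). Candidates
    whose reference block would fall outside the frame cost infinity.
    """
    height, width = len(ref), len(ref[0])

    def cost(mv: MV) -> float:
        dx, dy = mv
        x0, y0 = bx + dx, by + dy
        if x0 < 0 or y0 < 0 or x0 + bs > width or y0 + bs > height:
            return float("inf")
        return sum(abs(cur[by + y][bx + x] - ref[y0 + y][x0 + x])
                   for y in range(bs) for x in range(bs))

    return cost


def two_level_search(cost: Callable[[MV], float],
                     search_range: int = 16,   # assumed window of +/-16
                     step: int = 4,            # step size 4, as in the abstract
                     refine_radius: int = 2):  # radius 2 gives a 5x5 square
    # Level 1: coarse search over the whole window on a grid with spacing `step`.
    coarse = [(dx, dy)
              for dy in range(-search_range, search_range + 1, step)
              for dx in range(-search_range, search_range + 1, step)]
    cx, cy = min(coarse, key=cost)

    # Level 2: fine search over the 5x5 square centred on the coarse winner.
    fine = [(cx + dx, cy + dy)
            for dy in range(-refine_radius, refine_radius + 1)
            for dx in range(-refine_radius, refine_radius + 1)]
    best = min(fine, key=cost)
    return best, cost(best)
```

With the assumed ±16 window, the coarse pass evaluates 9 × 9 = 81 candidates and the fine pass 25, against 33 × 33 = 1089 for an exhaustive search of the same window. Each candidate's cost is independent of the others, so both passes are the kind of work that could be spread across GPU threads; this sequential sketch does not attempt that. The search is also approximate: if the coarse pass selects a grid point more than two pixels from the true optimum, the fine pass cannot recover it.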
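The asynchronous processing model in the abstract, where motion estimation of part n overlaps with residual coding of part n-1, can be sketched as a two-stage pipeline. The sketch below uses a worker thread as a stand-in for the GPU and a bounded queue as the handoff; the abstract does not say how the thesis implements the handoff, so the function `pipeline`, its parameters `motion_estimate`, `residual_code` and `depth`, and the use of Python threads are all assumptions made for illustration.

```python
import queue
import threading


def pipeline(parts, motion_estimate, residual_code, depth=1):
    """Overlap motion estimation of part n with residual coding of part n-1.

    `motion_estimate(part)` stands in for the GPU stage and
    `residual_code(part, mvs)` for the CPU stage; both are hypothetical
    callables supplied by the caller.
    """
    handoff = queue.Queue(maxsize=depth)  # bounded buffer between the stages
    done = object()                       # sentinel marking the end of the frame

    def me_worker():
        try:
            for index, part in enumerate(parts):
                # Parts have no data dependencies, so they are processed in order
                # without waiting for residual coding of earlier parts.
                handoff.put((index, motion_estimate(part)))
        finally:
            handoff.put(done)  # always release the consumer, even on error

    worker = threading.Thread(target=me_worker, daemon=True)
    worker.start()

    coded = []
    while True:
        item = handoff.get()
        if item is done:
            break
        index, mvs = item
        # While this part is being residual-coded, the worker is already
        # estimating motion for the next part.
        coded.append(residual_code(parts[index], mvs))
    worker.join()
    return coded
```

A usage example with toy stages: `pipeline([[1, 2], [3, 4]], lambda p: sum(p), lambda p, mv: (len(p), mv))` returns `[(2, 3), (2, 7)]`. In this sketch the bounded queue lets the estimation stage run at most a small number of parts ahead of residual coding, which keeps the second stage supplied with work as long as estimation is the faster of the two.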
Keywords/Search Tags: Advanced Video Coding, Motion Estimation, GPU, Parallel Optimization