
Video Conference Decoder Based On CPU-GPU Parallel Architecture

Posted on: 2019-03-17 | Degree: Master | Type: Thesis
Country: China | Candidate: L Lei
GTID: 2348330569988494 | Subject: Electronic and communication engineering
Abstract/Summary:
In recent years, real-time decoding of multiple HD video streams in video conferencing has become an active research topic. In software video conference systems, traditional CPU decoding algorithms struggle to meet the real-time requirements of video conferencing. Because hardware resources are limited, a traditional CPU decoder runs short of memory and processing resources when many high-definition (1080p) conferences take place at the same time. Based on an analysis of existing CPU decoding algorithms, this thesis focuses on using the GPU to accelerate H.264 decoding, reduce system resource usage and support more streams, with the aim of designing a faster and better video conference decoder. The main work of this thesis is as follows.

First, this thesis studies the time-consuming loop filtering (deblocking) module of the H.264 standard. It analyzes the processing order of, and the dependencies between, pixels during loop filtering, and calculates the degree of parallelism and the possibility of conflicts between filtering threads. It then proposes a parallel deblocking filter algorithm for GPU decoding, adapted to GPU execution by changing some of the filter strength values and computing intermediate variables in advance. Experiments show that this algorithm effectively improves loop filtering speed with almost no loss of video frame quality.

Second, building on the 2D-wave decoding algorithm used on multi-core computers and the 2D-wave-GPU algorithm used on GPGPUs, this thesis addresses two problems of the latter: the long data exchange time between the CPU and GPU, and the use of only one thread per macroblock. A CUDA-based pipeline structure is proposed to hide the time-consuming data transfer between the CPU and GPU. By having the CPU and GPU work together, it reduces the CPU resources needed for decoding and allows the CPU and GPU to process concurrently. In addition, multiple threads are used to decode a single macroblock in the various decoding modules. Experiments show that this algorithm decodes significantly faster than the original 2D-wave-GPU algorithm and also has some advantages over FFmpeg. Its structure also provides important guidance for the design of the 3D-wave-GPU algorithm.

Third, building on the 3D-wave decoding algorithm used on multi-core computers and the improved 2D-wave-GPU decoding algorithm, this thesis analyzes intra-frame and inter-frame macroblock dependencies and proposes parallel decoding of multiple frames of a video sequence along the time dimension. The resulting 3D-wave-GPU decoding algorithm exploits the fact that, during decoding, a macroblock depends only on its adjacent macroblocks and on the reference region indicated by its motion vector. By avoiding conflicts between decoding threads, it decodes several frames of a video sequence in parallel on the GPU. It uses the GPU's high concurrency to quickly process the modules of the H.264 decoding algorithm that have high parallel granularity, and maps the macroblocks of the decoding process to thread blocks (Blocks) in the CUDA programming model. Experiments show that the decoder effectively improves the throughput of conference video and also decodes faster than the serial algorithm.
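The following sketch illustrates why deblocking lends itself to a thread-per-line GPU mapping. It implements the standard H.264 luma edge filter for boundary strength bS < 4 on 8-bit samples. It is not the modified filter proposed in this thesis. The α, β and tC0 thresholds are passed in directly instead of being looked up from the QP-indexed tables, and the function names are hypothetical. Each line of samples across one edge reads and writes only its own six samples. Edges that are only four samples apart do read samples that the previous edge may have written, so those still have to be filtered in order.

```python
def clip3(lo, hi, v):
    """Clamp v to [lo, hi] (Clip3 in the H.264 specification)."""
    return max(lo, min(hi, v))


def filter_line(p, q, alpha, beta, tc0):
    """Filter one line of 8-bit luma samples across an edge with bS < 4.

    p = [p0, p1, p2] and q = [q0, q1, q2], nearest to the edge first.
    Returns new (p, q) lists; the inputs are not modified.
    """
    p0, p1, p2 = p
    q0, q1, q2 = q
    # Skip lines where the step looks like a real image edge.
    if not (abs(p0 - q0) < alpha and abs(p1 - p0) < beta and abs(q1 - q0) < beta):
        return list(p), list(q)
    ap, aq = abs(p2 - p0), abs(q2 - q0)
    tc = tc0 + (ap < beta) + (aq < beta)  # bools add as 0/1
    delta = clip3(-tc, tc, (((q0 - p0) << 2) + (p1 - q1) + 4) >> 3)
    new_p = [clip3(0, 255, p0 + delta), p1, p2]
    new_q = [clip3(0, 255, q0 - delta), q1, q2]
    avg = (p0 + q0 + 1) >> 1
    if ap < beta:  # also smooth p1
        new_p[1] = p1 + clip3(-tc0, tc0, (p2 + avg - (p1 << 1)) >> 1)
    if aq < beta:  # also smooth q1
        new_q[1] = q1 + clip3(-tc0, tc0, (q2 + avg - (q1 << 1)) >> 1)
    return new_p, new_q


def filter_vertical_edge(rows, x, alpha, beta, tc0):
    """Filter the vertical edge between columns x-1 and x in every row.

    Each row reads and writes only its own six samples, so the rows are
    independent and could each be handled by one GPU thread.
    """
    out = [list(r) for r in rows]
    for r in out:  # on a GPU: one thread per row
        new_p, new_q = filter_line([r[x - 1], r[x - 2], r[x - 3]],
                                   [r[x], r[x + 1], r[x + 2]], alpha, beta, tc0)
        r[x - 1], r[x - 2], r[x - 3] = new_p
        r[x], r[x + 1], r[x + 2] = new_q
    return out
```

Take the row 60, 60, 60 | 70, 70, 70 with α = 20, β = 5 and tC0 = 2. It is smoothed to 60, 62, 64, 66, 68, 70. A step such as 10 | 200 exceeds α and is left untouched as a real edge.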
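A simple flow-shop model gives a rough estimate of how much a CPU-GPU pipeline can hide transfer time. The sketch below relies on assumptions that do not come from the thesis:

- Each frame passes through a few stages, for example CPU entropy decoding, a host-to-device copy and GPU reconstruction.
- Each stage owns its own resource (a CPU thread, the GPU copy engine or the GPU compute engine), as is possible with CUDA streams and `cudaMemcpyAsync` on pinned host memory.
- The per-stage times are hypothetical.

```python
def pipeline_makespan(stage_ms, n_frames):
    """Finish time of the last frame when stages of different frames overlap.

    stage_ms[s] is the time stage s takes for one frame. Stage s of frame i
    starts once stage s - 1 of frame i has finished and stage s of frame
    i - 1 has released its resource.
    """
    prev = [0.0] * len(stage_ms)  # finish times of the previous frame
    for _ in range(n_frames):
        cur = []
        for s, d in enumerate(stage_ms):
            start = max(prev[s], cur[s - 1] if s > 0 else 0.0)
            cur.append(start + d)
        prev = cur
    return prev[-1]


def serial_time(stage_ms, n_frames):
    """Total time when every stage of a frame waits for the previous frame."""
    return sum(stage_ms) * n_frames
```

With hypothetical stage times of 4, 3 and 5 ms, four frames take 48 ms serially but 27 ms pipelined. The pipelined figure equals 12 + 3 × 5. After the first frame, throughput is limited by the slowest stage rather than by the sum of all stages.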
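The 2D-wave and 3D-wave orders can be compared by computing the earliest time step at which each macroblock may start, assuming every macroblock takes one step. The sketch below rests on four assumptions:

- Intra-frame dependencies follow the usual H.264 pattern: left, top-left, top and top-right neighbours.
- Each frame references only the previous frame, as in an IPPP structure.
- Motion vectors stay within a hypothetical `mv_range` macroblocks of the current position.
- Thread-conflict handling and entropy decoding, both described in the thesis, are not modelled.

```python
def wavefront_schedule(width, height, frames=1, mv_range=None):
    """Return {(frame, x, y): earliest time step} for every macroblock.

    Intra-frame (2D-wave): MB (x, y) waits for its left, top-left, top and
    top-right neighbours. Inter-frame (3D-wave, only if mv_range is given):
    MB (x, y) of frame f also waits for MB (x + mv_range, y + mv_range),
    clipped to the frame, of frame f - 1. Under the 2D-wave dependencies,
    that MB finishes only after every MB of frame f - 1 within mv_range of
    (x, y), so the whole reference window is available.
    With mv_range=None, frames are decoded strictly one after another.
    """
    neighbours = ((-1, 0), (-1, -1), (0, -1), (1, -1))
    t = {}
    frame_start = 0
    for f in range(frames):
        for y in range(height):
            for x in range(width):
                # Sequential mode: nothing in frame f starts before frame f-1 ends.
                ready = frame_start if mv_range is None else 0
                for dx, dy in neighbours:
                    nx, ny = x + dx, y + dy
                    if 0 <= nx < width and 0 <= ny < height:
                        ready = max(ready, t[(f, nx, ny)] + 1)
                if mv_range is not None and f > 0:
                    rx = min(x + mv_range, width - 1)
                    ry = min(y + mv_range, height - 1)
                    ready = max(ready, t[(f - 1, rx, ry)] + 1)
                t[(f, x, y)] = ready
        frame_start = max(t.values()) + 1
    return t


def total_steps(schedule):
    """Number of time steps needed to decode every macroblock."""
    return max(schedule.values()) + 1
```

For a 1080p frame (120 × 68 = 8160 macroblocks), the 2D-wave needs 254 steps. Consider a small 4 × 3 example with two frames and `mv_range = 1`. Here the 3D-wave order finishes in 12 steps instead of 16, because the second frame starts as soon as its reference window in the first frame has been decoded.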
Keywords/Search Tags: H.264 decoding, CUDA, GPU, video conference, deblocking filter, 3D-wave, 2D-wave