Deep Learning-based Video Coding Acceleration And Intelligent Bit Allocation

Posted on: 2021-03-04 | Degree: Master | Type: Thesis
Country: China | Candidate: J Shi
GTID: 2428330602498962 | Subject: Information and Communication Engineering
Abstract/Summary:
With the continuous development of information and multimedia technology, high-definition (HD) video has been widely adopted. The large amount of video data on social media platforms requires highly effective video coding technology. To this end, High Efficiency Video Coding (HEVC) was developed by the Joint Collaborative Team on Video Coding (JCT-VC) in 2013. Compared with the previous coding standard, H.264/AVC, HEVC can reduce the bit rate by about 50% at equivalent perceptual video quality. However, the coding complexity of HEVC is much higher, which makes it impractical for real-time applications and mobile terminals. In addition, rapid progress in computer vision and data understanding has enabled many intelligent applications, such as surveillance video analysis and medical image understanding. Many videos must be compressed and transmitted before such applications can process them, so improving coding efficiency in support of these applications is necessary.

This thesis focuses on two core problems: video coding acceleration and intelligent bit allocation.

The high complexity of HEVC intra coding arises from the flexible size of the coding unit (CU) and from the up to 35 intra-prediction modes. We propose the learned fast HEVC intra coding (LFHI) framework, which accelerates the coding procedure through several complementary components. First, we design an effective and efficient asymmetric-kernel CNN (AK-CNN), which predicts the coding pattern of HEVC precisely and with low complexity. Then, we introduce the new concept of the minimal number of RDO candidates (MNRC) to solve the intra-mode selection problem, which allows us to reduce the complexity more safely. Next, we design an evolution-optimized threshold decision (EOTD) scheme to explore configurable complexity-efficiency trade-offs. Finally, to adapt to the varying quantization parameters (QPs) in HEVC, we propose an interpolation-based prediction scheme, with which LFHI generalizes to different QPs. Compared with the original HM, our approach reduces the complexity of HEVC intra coding by 75.2% with only a 2.09% BD-BR increase, outperforming existing fast algorithms.

It is also necessary to optimize the bit allocation scheme of the video coding framework with respect to semantic distortion metrics. We propose to use deep reinforcement learning (DRL) to solve this problem. First, we formulate the bit allocation task as a Markov Decision Process (MDP). Then, we introduce DRL to provide a better bit allocation scheme for different vision tasks, such as classification, detection and segmentation. After coding, we analyze the semantic distortions of the reconstructed frames and use them as feedback to update the agent. We also use Grad-CAM and Mask R-CNN to extract importance maps from the images and videos, which help the agents make better decisions. Compared with the original HM, our approach reduces the bit rate by 43.1% to 73.2% at equivalent semantic distortion.
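The MDP view of bit allocation mentioned in the abstract can be illustrated with a toy environment. The following is a minimal sketch and not the thesis's actual design: the abstract does not specify the state, action, or reward definitions, so the QP action set, the rate and distortion models, the weight `lam`, and all names (`BlockInfo`, `BitAllocationEnv`, `importance_policy`) are hypothetical. It only shows the general shape: one step assigns a QP to one block, and the reward penalizes both semantic distortion and bits.

```python
from dataclasses import dataclass

# Hypothetical action set: candidate QPs the agent may assign to a block.
QP_ACTIONS = (22, 27, 32, 37)


@dataclass
class BlockInfo:
    # Assumed to come from an importance map, scaled to [0, 1].
    importance: float


def toy_bits(qp):
    # Toy rate model (assumption): bits halve every 6 QP steps.
    return 1000.0 * 2.0 ** (-(qp - 22) / 6.0)


def toy_semantic_distortion(qp, importance):
    # Toy distortion model (assumption): coarser quantization
    # hurts important blocks more than unimportant ones.
    return importance * (qp - 22) / 15.0


class BitAllocationEnv:
    """One episode is one frame; one step chooses a QP for one block.

    Assumes a non-empty list of blocks.
    """

    def __init__(self, blocks, lam=0.001):
        self.blocks = blocks
        self.lam = lam  # hypothetical weight trading bits against distortion
        self.t = 0

    def reset(self):
        self.t = 0
        return self._state()

    def _state(self):
        # State: (block index, importance of that block), or None at the end.
        if self.t >= len(self.blocks):
            return None
        return (self.t, self.blocks[self.t].importance)

    def step(self, action_index):
        qp = QP_ACTIONS[action_index]
        block = self.blocks[self.t]
        cost = (toy_semantic_distortion(qp, block.importance)
                + self.lam * toy_bits(qp))
        self.t += 1
        done = self.t == len(self.blocks)
        # Reward is the negative cost, so lower distortion and fewer bits
        # both increase the return.
        return self._state(), -cost, done


def importance_policy(state):
    # Hand-written stand-in for a learned agent: spend bits on
    # important blocks (lowest QP), save bits elsewhere (highest QP).
    _, importance = state
    return 0 if importance > 0.5 else len(QP_ACTIONS) - 1


def run_episode(env, policy):
    state = env.reset()
    total = 0.0
    done = False
    while not done:
        state, reward, done = env.step(policy(state))
        total += reward
    return total
```

In a learned setting, `importance_policy` would be replaced by an agent trained on the episode returns, and the toy rate and distortion functions by measurements from the encoder and the downstream vision task.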
Keywords/Search Tags: video coding, deep learning, convolutional neural network (CNN), fast algorithm, bit allocation