Deep Learning-Based Inter-Frame Prediction For Video Coding

Posted on: 2022-02-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J P Lin
GTID: 1488306323462854
Subject: Information and Communication Engineering
Abstract/Summary:
With the rise of multimedia applications and the growing demand for higher-quality video services, the amount of video data in the world has been increasing explosively. Over the past year and a half, the COVID-19 outbreak has forced people to move most of their communication onto the Internet, further accelerating the growth of video traffic. Efficiently compressing this fast-growing video data has become the most severe challenge for multimedia computing, transmission, and storage. One of the key solutions is video coding technology, which compresses video to a smaller size. Existing video coding standards (including High Efficiency Video Coding, H.265/HEVC, and the new-generation Versatile Video Coding, H.266/VVC) use the block-based hybrid coding framework, in which most modules are manually designed and independently optimized. After more than 30 years of incremental development, the framework has gradually become saturated: it is difficult for any single traditional technique to achieve an obvious coding gain under the constraint of computational complexity.

In recent years, deep learning has been successfully applied in low-level vision fields such as video frame interpolation and image/video super-resolution. For example, image/video super-resolution methods based on convolutional neural networks have far exceeded traditional interpolation-based methods in reconstruction quality. This dissertation studies how to apply these advanced technologies to video coding in order to further improve the performance of video inter-frame coding.

Since natural video is highly correlated in the temporal dimension, the inter-frame coding technology that exploits this correlation is a key factor in determining the compression performance of the entire video coding system. However, inter-frame coding in the traditional framework has three shortcomings. First, the traditional framework uses block-based translational or affine motion models for motion estimation and compensation. Although these models are computationally efficient, they cannot effectively characterize the complex motion in natural video. Second, all inter-frame coding blocks in the traditional framework are coded only at the original resolution. When the available bit rate is too low to represent the original signal effectively, severe coding distortion results, and this distortion becomes more and more severe as the video resolution increases. Finally, since the traditional framework is a complex, manually designed coding system, a deep learning-based inter-frame coding module designed for it cannot directly optimize the rate-distortion objective function end-to-end. Instead, the neural network can only be trained with heuristic methods that introduce the rate and distortion constraints, which usually leads to suboptimal coding performance.

The three research topics of this dissertation address these three shortcomings of inter-frame coding in the traditional framework. The main technological innovations and contributions are as follows:

(1) To address the problem that the motion model in the traditional framework is too simple, this dissertation proposes a temporal-domain extrapolation technique based on deep learning and multiple reference frames to compensate for the complex high-order motion in video. Specifically, considering that the motion between frames may have different magnitudes, this dissertation designs a multi-scale extrapolation network for progressive prediction and constructs a suitable data set for network training. In addition, because motion has a certain degree of randomness, the extrapolated frame cannot always be aligned with the current frame. This dissertation therefore puts the extrapolated frame into the HEVC reference frame list as an additional reference frame. Experimental results show that the proposed temporal extrapolation technique can significantly improve video coding performance.

(2) To address the problem that all inter-frame coding blocks in the traditional framework are coded only at the original resolution, this dissertation proposes an inter-frame coding technology based on deep learning and block-level down/up-sampling to achieve locally adaptive-resolution inter-frame coding. Specifically, for each coding block in HEVC P or B frames, this dissertation provides the flexibility of selecting low-resolution or high-resolution coding, and of selecting up-sampling based on traditional interpolation filters or up-sampling based on a convolutional neural network. In addition, this dissertation studies how to use reference frames to improve the up-sampling ability of convolutional networks, and how to use compressed video sequences to train the up-sampling networks. Experimental results show that the proposed method can provide a coding gain of more than 5% for high-definition and ultra-high-definition sequences encoded at low bit rates.

(3) To address the problem that a deep learning-based inter-frame coding module in the traditional framework cannot be optimized end-to-end, this dissertation proposes an end-to-end learned video coding technology based on deep learning and multiple-reference prediction. Specifically, this dissertation introduces two new sets of modules based on multiple-reference prediction, in the motion vector domain and in the pixel domain respectively, to improve the prediction accuracy and enhance the reconstruction quality. The motion vector domain includes the modules of motion vector prediction and motion vector refinement. The pixel domain includes the modules of motion compensation and residual refinement. In addition, to solve the problem that the complex multi-module framework is difficult to optimize, this dissertation designs a progressive training strategy that directly optimizes the entire network end-to-end under the rate-distortion objective function. Experimental results show that the proposed method greatly exceeds previous end-to-end learned video coding methods in terms of coding performance.
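
As an illustration of the per-block choice described in contribution (2), the sketch below picks a coding mode by minimizing a Lagrangian rate-distortion cost J = D + λR, the usual criterion for mode decision in hybrid encoders. The abstract does not state the exact decision rule, so this criterion is an assumption here, and the names `Candidate`, `rd_cost`, and `choose_mode`, the mode labels, and all distortion and rate figures are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One way of coding a block (hypothetical structure)."""
    mode: str          # label of the coding mode
    distortion: float  # e.g. sum of squared errors against the original block
    rate_bits: float   # bits spent to code the block in this mode


def rd_cost(c: Candidate, lam: float) -> float:
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return c.distortion + lam * c.rate_bits


def choose_mode(candidates: list[Candidate], lam: float) -> Candidate:
    """Return the candidate with the smallest rate-distortion cost."""
    return min(candidates, key=lambda c: rd_cost(c, lam))


# Made-up numbers: low-resolution coding spends fewer bits but distorts more,
# and CNN-based up-sampling is assumed to recover more detail than a filter.
candidates = [
    Candidate("full_res", distortion=100.0, rate_bits=200.0),
    Candidate("low_res_filter_up", distortion=260.0, rate_bits=80.0),
    Candidate("low_res_cnn_up", distortion=180.0, rate_bits=80.0),
]

if __name__ == "__main__":
    for lam in (0.5, 2.0):  # a larger lambda corresponds to a lower bit rate
        best = choose_mode(candidates, lam)
        print(lam, best.mode, rd_cost(best, lam))
```

With these illustrative figures, a small λ (bits are cheap) favors full-resolution coding, while a large λ (bits are expensive, as at low bit rates) favors low-resolution coding with the better up-sampler. This matches the abstract's observation that the gain appears at low bit rates.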
Keywords: High Efficiency Video Coding, inter-frame prediction, inter-frame coding, deep learning, temporal extrapolation, block-level down-sampling coding, end-to-end learned video compression