Deep Learning-Based Inter-Frame Prediction For Video Coding

Posted on:2022-02-10

Degree:Doctor

Type:Dissertation

Country:China

Candidate:J P Lin

Full Text:PDF

GTID:1488306323462854

Subject:Information and Communication Engineering

Abstract/Summary:

With the rise of multimedia applications and the increasing demand for higher-quality video services, the amount of video data in the world has been growing explosively. In the past year and a half, the COVID-19 outbreak has forced people to move most of their communication to the Internet, further accelerating the growth of video traffic. How to compress this fast-growing video data efficiently has become the most severe challenge for multimedia computing, transmission, and storage. One of the key solutions is video coding technology, which compresses video to a smaller size. Existing video coding standards (including High Efficiency Video Coding H.265/HEVC and the new-generation Versatile Video Coding H.266/VVC) have been using the block-based hybrid coding framework, in which most modules are manually designed and independently optimized. After more than 30 years of incremental development, the framework has gradually become saturated: it is difficult for any single traditional technique to achieve an obvious coding gain under the constraint of computational complexity. In recent years, deep learning has been successfully applied to low-level vision tasks such as video frame interpolation and image/video super-resolution. For example, image/video super-resolution methods based on convolutional neural networks have far exceeded traditional interpolation-based methods in reconstruction quality. This dissertation studies how to apply these advanced techniques to video coding to further improve the performance of inter-frame coding.

Since natural video is highly correlated in the temporal dimension, inter-frame coding, which exploits this correlation, is a key factor in the compression performance of the entire video coding system. However, inter-frame coding in the traditional framework has three shortcomings. First, the traditional framework uses block-based translational or
affine motion models for motion estimation and compensation. Although these models are computationally efficient, they cannot effectively characterize the complex motion in natural video. Second, all inter-frame coding blocks in the traditional framework are coded only at the original resolution. When the available bit rate is too low to represent the original signal effectively, this causes severe coding distortion, and the distortion becomes more severe as video resolution increases. Finally, because the traditional framework is a complex, manually designed coding system, a deep-learning-based inter-frame coding module designed for it cannot be optimized end-to-end against the rate-distortion objective function. Instead, the neural network can only be trained by introducing rate and distortion constraints through heuristic methods, which usually leads to suboptimal coding performance.

The three research topics of this dissertation address these three shortcomings respectively. The main technical innovations and contributions are as follows:

(1) To address the overly simple motion model of the traditional framework, this dissertation proposes a temporal-domain extrapolation technique based on deep learning and multiple reference frames to compensate for complex high-order motion in video. Specifically, considering that the motion between frames may have different magnitudes, it designs a multi-scale extrapolation network for progressive prediction and constructs a suitable dataset for network training. In addition, because motion has a certain degree of randomness, the extrapolated frame cannot always be aligned with the current frame, so the extrapolated frame is placed into the HEVC reference frame list as an additional reference frame. Experimental results show that the proposed temporal extrapolation technique can
significantly improve video coding performance.

(2) To address the problem that all inter-frame coding blocks in the traditional framework are coded only at the original resolution, this dissertation proposes an inter-frame coding technique based on deep learning and block-level down/up-sampling to achieve locally adaptive-resolution inter-frame coding. Specifically, for each coding block in HEVC P or B frames, it provides the flexibility to choose between low-resolution and high-resolution coding, and between up-sampling with traditional interpolation filters and up-sampling with a convolutional neural network. In addition, it studies how to use reference frames to improve the up-sampling ability of convolutional networks and how to train up-sampling networks with compressed video sequences. Experimental results show that the proposed method provides more than 5% coding gain for high-definition and ultra-high-definition sequences encoded at low bitrates.

(3) To address the problem that a deep-learning-based inter-frame coding module in the traditional framework cannot be optimized end-to-end, this dissertation proposes an end-to-end learned video coding technique based on deep learning and multiple-reference prediction. Specifically, it introduces new modules based on multiple-reference prediction in both the motion vector domain and the pixel domain to improve prediction accuracy and enhance reconstruction quality. The motion vector domain includes motion vector prediction and motion vector refinement modules; the pixel domain includes motion compensation and residual refinement modules. In addition, because the complex multi-module framework is difficult to optimize, it further designs a progressive training strategy that optimizes the entire network end-to-end under the rate-distortion objective function. Experimental
results show that the proposed method substantially outperforms previous end-to-end learned video coding methods in coding performance.
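Contribution (1) keeps the extrapolated frame as an extra entry in the reference list rather than replacing the ordinary references, so a block can still fall back to a decoded frame when the extrapolation is misaligned. The Python sketch below illustrates that idea with block-based translational matching, the same motion model the abstract describes as the traditional framework's limitation. It is a simplification under stated assumptions, not the dissertation's or HEVC's implementation: frames are small 2-D integer lists, motion vectors are integer, matching is a full search on SAD alone (no rate term, no sub-pixel interpolation), and the extrapolated frame is assumed to come from elsewhere, for example an extrapolation network. The helper names `block_sad` and `best_match` are hypothetical.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]
Match = Tuple[int, Tuple[int, int], int]  # (reference index, (dy, dx), SAD)


def block_sad(cur: Frame, ref: Frame, y: int, x: int,
              dy: int, dx: int, size: int) -> int:
    """SAD between the size x size block of `cur` at (y, x) and the
    block of `ref` displaced by the translation (dy, dx)."""
    total = 0
    for i in range(size):
        for j in range(size):
            total += abs(cur[y + i][x + j] - ref[y + dy + i][x + dx + j])
    return total


def best_match(cur: Frame, refs: List[Frame], y: int, x: int,
               size: int, search: int) -> Optional[Match]:
    """Full search over every reference picture and every integer
    translation within +/-search. The extrapolated frame is treated as
    just one more entry in `refs` (hypothetical simplification)."""
    h, w = len(cur), len(cur[0])
    best: Optional[Match] = None
    for r, ref in enumerate(refs):
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                # Skip candidate blocks that would fall outside the frame.
                if not (0 <= y + dy and y + dy + size <= h
                        and 0 <= x + dx and x + dx + size <= w):
                    continue
                cost = block_sad(cur, ref, y, x, dy, dx, size)
                if best is None or cost < best[2]:
                    best = (r, (dy, dx), cost)
    return best


# Usage: 4x4 toy frames, 2x2 blocks, search range 1.
cur = [[10 * i + j for j in range(4)] for i in range(4)]
# Previous decoded frame: cur's content displaced one column to the right,
# so cur's right-most column is not visible in it.
prev = [[99 if j == 0 else 10 * i + j - 1 for j in range(4)] for i in range(4)]
bad_extrapolation = [[0] * 4 for _ in range(4)]
good_extrapolation = [row[:] for row in cur]  # assume a perfect extrapolation
print(best_match(cur, [prev, bad_extrapolation], 2, 1, 2, 1))   # (0, (0, 1), 0)
print(best_match(cur, [prev, good_extrapolation], 2, 2, 2, 1))  # (1, (0, 0), 0)
```

When the extrapolated frame is poor, the search falls back to the ordinary reference with a translated vector; at the right edge, where the needed content is absent from the previous frame, a good extrapolated frame wins at zero motion.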
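Contribution (2) lets each block choose between full-resolution and low-resolution coding, and between up-sampling methods. A common way for an encoder to make such a per-block choice is Lagrangian rate-distortion optimization, selecting the candidate that minimizes J = D + λR. The sketch below assumes that criterion and uses stand-ins throughout: 2x2 averaging for down-sampling, nearest-neighbor replication in place of the interpolation-filter and CNN up-samplers, sum of squared errors as D, and caller-supplied bit estimates as R. It also skips actually coding the low-resolution block, which a real encoder would do before up-sampling the reconstruction. All function names and bit counts are hypothetical.

```python
from typing import Dict, List, Tuple

Block = List[List[float]]


def downsample2x(b: Block) -> Block:
    """Average each 2x2 neighborhood (assumes even width and height)."""
    return [[(b[2 * i][2 * j] + b[2 * i][2 * j + 1]
              + b[2 * i + 1][2 * j] + b[2 * i + 1][2 * j + 1]) / 4.0
             for j in range(len(b[0]) // 2)]
            for i in range(len(b) // 2)]


def upsample_nearest(b: Block) -> Block:
    """Replicate every sample into a 2x2 patch (stand-in for an
    interpolation filter or a CNN up-sampler)."""
    out: Block = []
    for row in b:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def sse(a: Block, b: Block) -> float:
    """Sum of squared errors, used here as the distortion D."""
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def choose_mode(orig: Block, candidates: Dict[str, Tuple[Block, float]],
                lam: float) -> Tuple[str, float]:
    """Pick the candidate minimizing J = D + lam * R.
    `candidates` maps a mode name to (reconstruction, estimated bits)."""
    costs = {mode: sse(orig, rec) + lam * bits
             for mode, (rec, bits) in candidates.items()}
    best = min(costs, key=costs.get)
    return best, costs[best]


# Usage: a horizontal ramp; the bit counts are made-up illustrative values.
orig = [[10, 12, 14, 16] for _ in range(4)]
low = upsample_nearest(downsample2x(orig))
candidates = {"full_res": (orig, 64), "low_res": (low, 16)}
print(choose_mode(orig, candidates, lam=1.0))     # ('low_res', 32.0)
print(choose_mode(orig, candidates, lam=0.1)[0])  # full_res
```

A large λ (low bitrate) favors the cheaper low-resolution candidate despite its distortion, while a small λ (high bitrate) favors full resolution, consistent with the abstract's observation that the gain appears at low bitrates.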
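Contribution (3) trains the whole codec end-to-end under a rate-distortion objective. Learned codecs commonly minimize L = R + λD, where R is the bit cost estimated from the entropy model's probabilities for the quantized latent symbols and D is a distortion such as mean squared error. The sketch below computes that loss in plain Python under those assumptions; in a video codec the likelihoods would cover both the motion-vector and residual latents, and a deep learning framework would differentiate the loss. The abstract does not give the dissertation's exact loss weighting or progressive training schedule, so nothing here reproduces them; `rate_bpp`, `mse`, and `rd_loss` are hypothetical names.

```python
import math
from typing import Sequence


def rate_bpp(likelihoods: Sequence[float], num_pixels: int) -> float:
    """Estimated rate R in bits per pixel: the ideal code length
    -log2(p) summed over all coded symbols, divided by the pixel count."""
    return -sum(math.log2(p) for p in likelihoods) / num_pixels


def mse(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean squared error, used here as the distortion D."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def rd_loss(x: Sequence[float], x_hat: Sequence[float],
            likelihoods: Sequence[float], lam: float) -> float:
    """Rate-distortion training objective L = R + lam * D."""
    return rate_bpp(likelihoods, len(x)) + lam * mse(x, x_hat)


# Usage: a 4-pixel "frame", its reconstruction, and toy probabilities
# for the coded symbols.
x = [0.0, 0.5, 1.0, 0.5]
x_hat = [0.0, 0.5, 0.5, 0.5]
print(rd_loss(x, x_hat, likelihoods=[0.5, 0.25], lam=4.0))  # 1.0
```

Training one model per λ traces out a rate-distortion curve: a larger λ penalizes distortion more heavily and yields a higher-rate, higher-quality model.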

Keywords/Search Tags:

High Efficiency Video Coding, inter-frame prediction, inter-frame coding, deep learning, temporal extrapolation, block-level down-sampling coding, end-to-end learned video compression

Related items

1	Research On High Efficiency Inter Coding In Video Compression
2	Research On Fast Inter Frame Algorithms For High Efficiency Video Coding
3	Research On The Inter Prediction Algorithm For HEVC Based On Neural Networks
4	Research On Fast Algorithm For Scalable High Efficient Video Coding SHVC
5	MPEG-4 "Block DCT+Shape Adaptive DCT" Algorithm Research And Its FPGA Hardware Implementing
6	Research And Optimization Of Inter-frame Coding For H.264 Video Coding Standard
7	A Fast Inter-frame Coding Algorithm For New Generation Of Video Coding
8	Research On Inter Prediction Algorithm Of High Efficient Video Coding
9	Research On Joint Optimization Of Video Coding Reference Frames Based On Deep Learning
10	Research On Screen Content Video Coding Based On Inter Frame Correlation