
Deep Learning Based Video Frame Interpolation Method

Posted on: 2020-06-20
Degree: Master
Type: Thesis
Country: China
Candidate: Z F Zhang
Full Text: PDF
GTID: 2428330623963710
Subject: Electronic and communication engineering
Abstract/Summary:
With the rapid development of Internet technologies, video has become an indispensable multimedia data type in daily life. People focus not only on video content but also, increasingly, on video quality. Video frame interpolation is an important video processing technology with a wide range of applications, and it has attracted growing academic and industrial attention. Frame interpolation attempts to synthesize one or more intermediate frames within an original video sequence. Traditional motion-compensated frame interpolation methods typically involve two steps: motion estimation between adjacent frames and pixel synthesis guided by the estimated motion. However, the performance of these methods relies heavily on the accuracy of the motion information, which is hard to estimate in regions with occlusion, large displacement, or abrupt lighting changes. In recent years, deep learning has demonstrated remarkable performance on many computer vision problems. Researchers have proposed neural-network-based frame interpolation methods that outperform traditional algorithms, yet further improvement is still needed to handle complicated video sequences.

Building on an end-to-end frame interpolation model, this thesis proposes two algorithms to address the large-displacement problem: an encoder-decoder model and a multi-scale model. Both methods take two adjacent frames as input and estimate the motion information between them; one or more intermediate frames are then synthesized by a volume sampling layer, forming an end-to-end frame interpolation model. The encoder-decoder model uses an encoder module to extract high-level motion features and a decoder module to predict the optical flow step by step; a refinement module is then applied to improve flow accuracy by refining discontinuous regions. The multi-scale model first downsamples the input frames to several resolutions. Starting from the lowest scale, residual networks estimate an initial flow, which is then upsampled and fed into the estimation at the next scale; in the end, the optical flow has the same resolution as the input frames. Besides pixel-level metrics, a perceptual loss is employed during training to improve the visual quality of the interpolation results.

The proposed methods combine the two steps of traditional motion-compensated interpolation into a single end-to-end model, and no optical flow ground truth is required for training. Experimental results demonstrate that both proposed approaches achieve better quantitative results than competing methods. Furthermore, our interpolation methods produce visually more satisfying results with fewer artifacts and less blur, especially in regions of large displacement.
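To make the synthesis step concrete, the following is a minimal NumPy sketch of the kind of volume-sampling (bilinear warping) layer the abstract describes: given an estimated optical flow between two frames, it warps each input halfway along the motion and averages the results to form the middle frame. This is an illustrative reconstruction under simplifying assumptions (2-D grayscale frames, linear motion between the inputs), not the thesis's actual network; the function names `bilinear_warp` and `interpolate_midframe` are hypothetical.

```python
import numpy as np

def bilinear_warp(frame, flow):
    """Warp a grayscale frame (H x W) backward along a flow field
    (H x W x 2, in pixels) using bilinear interpolation -- the
    differentiable sampling idea behind the volume sampling layer."""
    h, w = frame.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Sample positions: where each output pixel reads from in the source.
    sx = np.clip(xs + flow[..., 0], 0, w - 1)
    sy = np.clip(ys + flow[..., 1], 0, h - 1)
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    x1 = np.clip(x0 + 1, 0, w - 1)
    y1 = np.clip(y0 + 1, 0, h - 1)
    wx, wy = sx - x0, sy - y0
    # Blend the four neighbouring source pixels.
    top = (1 - wx) * frame[y0, x0] + wx * frame[y0, x1]
    bot = (1 - wx) * frame[y1, x0] + wx * frame[y1, x1]
    return (1 - wy) * top + wy * bot

def interpolate_midframe(f0, f1, flow01):
    """Synthesize the intermediate frame: warp both inputs halfway
    along the (assumed linear) motion and average them."""
    mid_from_f0 = bilinear_warp(f0, 0.5 * flow01)
    mid_from_f1 = bilinear_warp(f1, -0.5 * flow01)
    return 0.5 * (mid_from_f0 + mid_from_f1)
```

Because bilinear sampling is differentiable with respect to the flow, a flow-estimation network trained through such a layer needs only the ground-truth middle frame as supervision, which is why no optical flow ground truth is required.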
Keywords/Search Tags:frame interpolation, deep learning, encoder-decoder network, multi-scale model