
Omniscient Video Super-Resolution

Posted on: 2023-07-17  Degree: Doctor  Type: Dissertation
Country: China  Candidate: P Yi  Full Text: PDF
GTID: 1528307055981419  Subject: Computer Science and Technology
Abstract/Summary:
High-resolution video is not always available due to factors such as hardware constraints, economic cost, and environmental conditions. Video super-resolution reconstructs high-resolution video by fusing the complementary spatiotemporal information of consecutive frames, and has become an effective means of improving the spatial resolution, and thus the clarity, of videos. In recent years, video super-resolution has developed into a frontier research direction in computer vision, with important applications in video services such as video surveillance, video communication, and video satellites. Deep learning has driven rapid progress in video super-resolution, but existing research still has serious limitations: existing single-iteration frameworks do not fully exploit the temporal relationships in a video; purely explicit or purely implicit inter-frame alignment struggles to capture the motion information in a video; networks with a single-channel stacked structure cannot fully extract and fuse the spatiotemporal features in a video; and training driven by pixel-level content loss alone produces results lacking in detail. To address these problems, this dissertation conducts extensive and in-depth research and proposes corresponding solutions. The main contributions are as follows:

(1) Existing video super-resolution frameworks do not fully exploit the temporal relationships contained in a video. To solve this problem, an omniscient video super-resolution framework with multiple loop structures is proposed. As input, the framework uses not only the past, present, and future low-resolution frames, but also the hidden-state features of the past, present, and future moments generated during the super-resolution process, so as to leverage more comprehensively the temporal relationships in
videos. Results on the PFNL-VAL dataset show that the PSNR of the omniscient framework is 0.72, 0.72, and 0.30 dB higher than that of the existing iterative, recurrent, and hybrid frameworks, respectively, while its time complexity is only 25% of that of the comparison algorithm FFCVSR.

(2) Current purely explicit or purely implicit inter-frame alignment struggles to fully capture the motion information in a video, and the existing single-channel stacked network structure does not fully integrate spatially and temporally related features. To solve this problem, this dissertation proposes a progressive fusion network incorporating a hybrid explicit-implicit alignment mechanism. The mechanism, which combines explicit motion compensation with implicit similarity calculation, accurately captures the complementary information between similar objects of different positions and shapes in the video. A multi-channel progressive fusion network is then designed: the video inputs are divided into multiple channel groups, the spatial correlations within each group and the temporal correlations between groups are fully extracted, and the spatiotemporal information is processed in parallel and progressively. Experimental results show that the algorithm improves average PSNR by more than 0.8 dB over the algorithm ICONVSR, improves average SSIM by more than 0.01, and yields better subjective visual quality.

(3) The pixel-level content loss commonly used in existing training methods tends to cause over-smoothing and blurring, and simply introducing adversarial learning makes it difficult to keep videos temporally stable, free of flickering, jittering, and artifacts. To address this problem, a spatiotemporal adversarial learning method for GANs with mixed loss constraints is proposed. A hybrid loss function integrating adversarial
loss, perceptual loss, and spatiotemporal consistency loss is designed to constrain the model to generate high-fidelity videos with realistic details and temporal stability. Experimental results show that the proposed algorithm outperforms the GAN-based baseline by about 8 points on the VMAF metric, and its subjective visual effects are more realistic, natural, and stable.

In summary, this dissertation establishes an omniscient video super-resolution framework, designs on top of it a progressive fusion network combined with a hybrid explicit-implicit alignment mechanism to instantiate the model, and finally designs a hybrid loss-constrained adversarial learning method to optimize the model. Compared with the comparison algorithm ICONVSR, the proposed algorithm improves average PSNR by more than 0.8 dB, SSIM by more than 0.01, and NIQE by about 1.5. Compared with the GAN-based super-resolution algorithm, it scores about 8 points higher on VMAF, and its subjective visual quality is more realistic and stable.
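The hybrid loss of contribution (3) can be sketched as a weighted sum of a content term, an adversarial term, and a temporal-consistency term. The sketch below is illustrative only: the weights, function names, and the Charbonnier content term are assumptions, not taken from the dissertation, and the perceptual term (which requires a pretrained feature extractor) is omitted.

```python
import numpy as np

def charbonnier(pred, target, eps=1e-6):
    """Pixel-level content loss (Charbonnier, a smooth L1 variant)."""
    return np.mean(np.sqrt((pred - target) ** 2 + eps))

def temporal_consistency(pred_frames):
    """Penalize frame-to-frame flicker: mean absolute difference
    between consecutive output frames (axis 0 is time)."""
    return np.mean(np.abs(np.diff(pred_frames, axis=0)))

def hybrid_loss(pred_frames, target_frames, d_fake_score,
                w_content=1.0, w_adv=0.01, w_temp=0.1):
    """Weighted sum of content, adversarial, and temporal terms.
    Weights are hypothetical; the perceptual term is omitted here."""
    # Generator-side adversarial term: push the discriminator's
    # score on the generated video toward 1.
    adv = -np.log(d_fake_score + 1e-8)
    content = charbonnier(pred_frames, target_frames)
    temp = temporal_consistency(pred_frames)
    return w_content * content + w_adv * adv + w_temp * temp
```

A loss of this shape lets a single gradient signal trade off pixel fidelity, realism, and temporal stability; in practice each term would be computed on batched tensors inside the training loop.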
Keywords/Search Tags:Video super-resolution, Convolutional neural networks, Omniscient framework, Progressive fusion, Generative adversarial learning