
Research And Implementation Of Video Super-resolution Deep Learning Network Based On Multi-frame Information Fusion

Posted on: 2021-02-18
Degree: Master
Type: Thesis
Country: China
Candidate: X N Zhu
Full Text: PDF
GTID: 2438330602998347
Subject: Software engineering
Abstract/Summary:
With the development of multimedia technology, people's demand for visual clarity rises accordingly. However, high-resolution imagery increases the cost of image acquisition, storage, and transmission, and acquisition is often troubled by a low SNR (signal-to-noise ratio). Super-resolution technology therefore came into being. Super-resolution reconstruction methods fall into three main types: 1. Methods based on traditional algorithms, generally implemented by interpolation; although the image or video is mapped into the high-resolution space, the result still looks blurred. 2. Super-resolution reconstruction based on machine learning, which improves image resolution by constructing a regression model; commonly used models include decision trees and support vector machines. 3. Super-resolution reconstruction based on deep learning, which exploits the adaptive learning capability of convolutional neural networks: by repeatedly training the model on large datasets, it automatically discovers the feature mapping between the low- and high-resolution versions of the same image, and thereby reconstructs the features of the image or video.

Most existing video super-resolution methods rely on optical flow, so the quality of the reconstructed frames depends entirely on the accuracy of the optical-flow prediction. When that accuracy is low, the reconstructed frames may be discontinuous or contain artifacts. This thesis therefore proposes a video super-resolution deep learning framework based on a fully convolutional network, in which adaptive alignment of video frames at the feature level replaces optical-flow motion compensation.

The main innovations of the model are as follows: 1. To make full use of inter-frame information, a network structure combining hierarchical convolution and multi-frame fusion is proposed. Consecutive input frames are first convolved hierarchically, and the processed frames are then fused; this preserves the continuity between adjacent frames on the one hand and provides specific details for image reconstruction on the other. 2. To optimize the network model, residual connections are introduced. They allow the model to be optimized further and improve training efficiency while still ensuring the extraction of deep-level features. 3. To improve the quality of video super-resolution reconstruction, a mixed loss function combining mean squared error (MSE) loss and perceptual loss is adopted as the optimization objective. The traditional loss can only be computed and optimized at the pixel level, which often leads to weak generalization, whereas perceptual loss performs global semantic comparison on the image and thus captures the visual-level detail features it contains, but in turn ignores pixel-level differences. Combining the traditional loss with the perceptual loss therefore retains pixel-level feature information while also improving the visual reconstruction quality of the experimental results.

The model was built and run in PyTorch (version 1.0.1), and the dataset consists of 1920 × 1080 HD videos downloaded from YouTube. To ensure the practicality of the model, the dataset contains not only people, animals, and plants but also various landscapes and buildings. In the experiments, the videos were processed into 30,000 consecutive frames as the training set, and the model was compared with ESPCN, VESPCN, and other super-resolution reconstruction methods on the same data. The results show that this model produces the clearest output. The experiments thus demonstrate that the hierarchical fusion network structure and training with perceptual loss can significantly improve the visual reconstruction quality of video frames.
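The multi-frame fusion structure described above (per-frame hierarchical convolution, fusion of the processed frames, and a residual shortcut) can be illustrated with a minimal NumPy sketch. This is not the thesis implementation: the function names, the 2-D feature shapes, and the single linear map standing in for each convolution branch are all assumptions made for illustration.

```python
import numpy as np

def frame_branch(frame, w):
    # Hypothetical per-frame "hierarchical convolution" stage,
    # reduced here to one linear map followed by a ReLU nonlinearity.
    return np.maximum(frame @ w, 0.0)

def fuse_frames(frames, branch_weights, fuse_w):
    # 1. Process each input frame through its own branch.
    feats = [frame_branch(f, w) for f, w in zip(frames, branch_weights)]
    # 2. Fuse: concatenate branch features along the channel axis, then mix.
    fused = np.concatenate(feats, axis=-1) @ fuse_w
    # 3. Residual shortcut from the center frame: the network only has to
    #    learn the correction on top of it, which eases optimization.
    center = frames[len(frames) // 2]
    return center + fused
```

With the fusion weights set to zero, the residual shortcut passes the center frame through unchanged, which is exactly the property that makes residual connections easy to train from.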
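The mixed optimization objective (pixel-level MSE plus a weighted feature-level perceptual term) can likewise be sketched. The `features` function below is a stand-in for the pretrained feature network that perceptual losses are typically computed on; it, the kernel, and the weighting factor `lam` are assumptions, not details from the thesis.

```python
import numpy as np

def mse_loss(pred, target):
    # Pixel-wise mean squared error.
    return np.mean((pred - target) ** 2)

def features(img, kernel):
    # Stand-in "feature extractor": one valid 2-D correlation. In practice a
    # pretrained network (e.g. VGG) would supply these semantic features.
    h, w = img.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def mixed_loss(pred, target, kernel, lam=0.1):
    # Total loss = pixel-level MSE + lam * feature-level (perceptual) MSE,
    # so pixel accuracy and visual-level detail are optimized together.
    perceptual = np.mean((features(pred, kernel) - features(target, kernel)) ** 2)
    return mse_loss(pred, target) + lam * perceptual
```

The pixel term anchors the reconstruction to the ground truth, while the feature term penalizes semantic-level discrepancies that per-pixel MSE alone tends to blur away.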
Keywords/Search Tags:Video super-resolution reconstruction, Deep learning, Multi-frame fusion, Perceptual loss