Video-style Transfer Based On Auto-encoder Structure And Gradient-preserving Order

Posted on: 2020-02-06
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Li
GTID: 2428330602452037
Subject: Circuits and Systems
Abstract/Summary:
In recent years, deep learning has achieved great success. Its methods are commonly grouped by how they learn: supervised, semi-supervised and unsupervised learning. Computer vision has advanced rapidly as a result, and video style transfer, one of its branches, has progressed with it. Video style transfer grew out of image style transfer, which extracts the content information of a natural image with a pretrained convolutional neural network and extracts the texture features of the style image with the Gram matrix. The goal of video style transfer is a stylized video that resembles the original video in spatial structure and resembles the style image in texture.

The research direction that receives the most attention in video style transfer is removing flicker from the stylized video. Ruder et al. added optical flow estimation and a temporal consistency loss to constrain the temporal consistency of the stylized video and thereby remove flicker. Huang et al. accelerated video style transfer with a feedforward neural network and showed that such a network can learn the temporal consistency of video. In these methods, however, the temporal consistency between adjacent stylized frames is constrained by the optical flow of the original video, which introduces errors during training: the stylized frames are distorted in spatial structure, so they do not match the optical flow estimated from the original video. To address this, this paper adopts the idea of the autoencoder and divides the network into two parts, an Encoder network and a Decoder network. The Encoder network transfers the original video frame into a stylized video frame, and the Decoder network reconstructs the original video frame from the generated stylized frame. The temporal consistency constraint is applied to the reconstructed video frame, because the reconstructed frame is close to the original frame in spatial structure. This compensates for the error described above and shows that constraining temporal consistency on the reconstructed frames also removes flicker and improves the visual quality of the stylized video. A reconstruction loss is also introduced so that the Decoder network learns to reconstruct the original frames.

This paper also revisits how the temporal consistency of stylized frames is constrained. Previous work used optical flow together with a mask to constrain temporal consistency only in the video background, because foreground objects move continuously and their flicker is less noticeable. Because of the mask, however, the foreground and the background are constrained differently during training, and a halo appears near the edges of foreground objects. To solve this problem, a gradient order preserving loss is added to the loss function. By extracting the order of the gradients in different directions at pixels near an edge, this constraint keeps the gradient direction of those pixels consistent with that of the original video. Because the original video has no halo at object boundaries, the constrained stylized video also looks more natural at the boundaries, and the halo is suppressed. The constraint acts only on the order of the gradients and not on their values, so that the halo is suppressed without destroying the texture information of the style.
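The abstract does not give the formula of the gradient order preserving loss, so the following is only a minimal sketch of one possible reading: at each pixel, the ranking of the horizontal and vertical gradient magnitudes in the stylized frame should match the ranking in the original frame, while the magnitudes themselves are left free. The function names, the forward-difference gradients, the two-direction comparison, the hinge form and the `edge_mask` argument are all assumptions made for illustration and are not taken from the thesis.

```python
import numpy as np


def directional_gradients(img):
    """Forward differences of a 2-D grayscale frame along x and y."""
    img = np.asarray(img, dtype=float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, :-1] = img[:, 1:] - img[:, :-1]   # horizontal neighbour difference
    gy[:-1, :] = img[1:, :] - img[:-1, :]   # vertical neighbour difference
    return gx, gy


def gradient_order_loss(stylized, original, edge_mask):
    """Hypothetical order-preserving penalty (an assumption, not the thesis formula).

    Penalizes pixels where the ranking of |gx| versus |gy| in the stylized
    frame contradicts the ranking in the original frame. Only the ranking is
    constrained, never the gradient values themselves.
    """
    sgx, sgy = directional_gradients(stylized)
    ogx, ogy = directional_gradients(original)

    # +1 where the horizontal gradient dominates in the original,
    # -1 where the vertical one dominates, 0 where they tie (no constraint).
    order = np.sign(np.abs(ogx) - np.abs(ogy))
    diff = np.abs(sgx) - np.abs(sgy)

    # Hinge: zero when the stylized ranking agrees with the original ranking.
    penalty = np.maximum(0.0, -order * diff)

    # Average over the pixels selected by the mask (e.g. a band around edges).
    mask = np.asarray(edge_mask, dtype=float)
    return float((penalty * mask).sum() / max(mask.sum(), 1.0))
```

A frame whose gradients are rescaled but keep the same ranking incurs no penalty, which is the property the abstract emphasizes: the style texture may change gradient values freely as long as the order near edges follows the original video. In a training setup this term would be written with a differentiable tensor library and weighted against the style, content, reconstruction and temporal losses; those weights are not stated in the abstract.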
Keywords/Search Tags: video style transfer, temporal consistency, autoencoder, gradient order preservation