
Research On Video Style Transformation System Based On AutoML

Posted on: 2021-04-13
Degree: Master
Type: Thesis
Country: China
Candidate: R Jia
Full Text: PDF
GTID: 2558306917982719
Subject: Applied Statistics
Abstract/Summary:
Style transformation is a technique that uses a computer to convert a content image into an image with a specified style. It has a wide range of applications in special-effects production, scene simulation, and virtual reality. At present, video style transformation still suffers from problems such as flicker and jitter, large model parameter counts, long training times, and information loss during feature extraction. This thesis studies video style transformation with machine learning methods. The main work is as follows:

To address the limitation that previous video style transformation networks could perform only a single style, this thesis adds an AdaIN (Adaptive Instance Normalization) module to the video style transformation model, so that a single network can perform multiple style transformations. Compared with previous methods, training and testing efficiency is improved by 32%.

To address flicker and jitter in video style transformation, this thesis proposes a new inter-frame loss function consisting of two parts: a loss at the original image scale, computed with optical flow between the stylized versions of consecutive frames, and a mean-squared-error loss at the feature-map scale, computed on the stylized versions of the two frames through the loss network. In addition, statistics of the inter-frame loss distribution are used to filter out abnormal loss values, which removes the loss surges around scene transitions and further suppresses flicker and jitter.

To address the large parameter counts, heavy computation, and long transformation times of current video style transformation networks, this thesis adopts the latest lightweight network EfficientNet as the backbone for video style transformation; EfficientNet-B7 has 8.4 times fewer parameters than GPipe and 6.1 times its inference speed. To reduce information loss during feature extraction, the activation function in the network is changed to the Swish function, and the pooling layers are replaced with convolution layers with a stride of 2. Meanwhile, BOHB, a hyperparameter optimization method that combines Bayesian optimization with Hyperband, is used to search for the best hyperparameter configuration.

In the end, the proposed method reduces the inter-frame loss on the test video by 9.8% compared with the SOTA method, while requiring only one sixth of its transformation time. Visually, the flicker and jitter that appear during video transformation are eliminated, and multiple style transformations can be performed by one model, which meets the needs of deploying the model on mobile devices.
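The AdaIN operation referred to above can be sketched as follows in PyTorch. This is a minimal illustration of the standard AdaIN formulation (the tensor layout and the epsilon term are assumptions, not details from the thesis): the channel-wise mean and standard deviation of the content feature map are replaced with those of the style feature map.

```python
import torch

def adain(content_feat: torch.Tensor, style_feat: torch.Tensor,
          eps: float = 1e-5) -> torch.Tensor:
    """Adaptive Instance Normalization: align the channel-wise mean/std of
    the content feature map with those of the style feature map."""
    # Per-sample, per-channel statistics over the spatial dims of (N, C, H, W)
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
    s_std = style_feat.std(dim=(2, 3), keepdim=True)
    # Normalize away the content statistics, then re-scale/shift with the style's
    return s_std * (content_feat - c_mean) / c_std + s_mean
```

Because the style enters only through these statistics, one trained network can be switched between styles at test time, which is what makes multi-style transformation possible with a single model.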
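The two-part inter-frame loss and the outlier filtering could be sketched as below. The warp helper, the flow direction convention, and the threshold of three standard deviations are assumptions for illustration; the abstract does not specify them.

```python
import torch
import torch.nn.functional as F

def warp(frame: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Backward-warp a frame (N, C, H, W) with a dense optical flow (N, 2, H, W),
    where flow[:, 0] is the horizontal and flow[:, 1] the vertical displacement."""
    n, _, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys)).float().to(frame.device)  # (2, H, W) pixel grid
    coords = base.unsqueeze(0) + flow                      # displaced coordinates
    # Normalize coordinates to [-1, 1] as required by grid_sample
    gx = 2.0 * coords[:, 0] / (w - 1) - 1.0
    gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    return F.grid_sample(frame, torch.stack((gx, gy), dim=-1), align_corners=True)

def interframe_loss(stylized_prev, stylized_curr, flow, feat_prev, feat_curr):
    """Two-part temporal loss: flow-warped difference at the image scale plus
    an MSE between loss-network feature maps of consecutive stylized frames."""
    pixel_term = F.mse_loss(warp(stylized_prev, flow), stylized_curr)
    feature_term = F.mse_loss(feat_prev, feat_curr)
    return pixel_term + feature_term

def filtered_mean(frame_losses: torch.Tensor, k: float = 3.0) -> torch.Tensor:
    """Average the per-pair inter-frame losses after dropping values more than
    k standard deviations above the mean, e.g. the surges at scene cuts."""
    mu, sigma = frame_losses.mean(), frame_losses.std()
    return frame_losses[frame_losses <= mu + k * sigma].mean()
```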
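The feature-extraction change described above, Swish activations together with stride-2 convolutions in place of pooling, corresponds to a downsampling block like the following; the channel sizes are illustrative only.

```python
import torch.nn as nn

# Downsampling block in the spirit of the thesis's modification: a stride-2
# convolution halves H and W instead of a pooling layer, and Swish
# (x * sigmoid(x), nn.SiLU in PyTorch) replaces the activation function.
downsample_block = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),
    nn.BatchNorm2d(128),
    nn.SiLU(),
)
```

Unlike pooling, the stride-2 convolution has learnable weights, so the network can decide what to keep while downsampling rather than discarding information by a fixed rule.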
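A hedged sketch of how the BOHB search could be driven with the hpbandster library follows; the search space and the train_and_evaluate stub are hypothetical, since the abstract does not specify the hyperparameters being tuned.

```python
import ConfigSpace as CS
import ConfigSpace.hyperparameters as CSH
import hpbandster.core.nameserver as hpns
from hpbandster.core.worker import Worker
from hpbandster.optimizers import BOHB

def get_configspace():
    # Hypothetical search space; the thesis's actual hyperparameters are not given.
    cs = CS.ConfigurationSpace()
    cs.add_hyperparameter(CSH.UniformFloatHyperparameter("lr", 1e-5, 1e-2, log=True))
    cs.add_hyperparameter(CSH.UniformFloatHyperparameter("style_weight", 1.0, 100.0, log=True))
    return cs

def train_and_evaluate(config, epochs):
    # Stand-in for the real training loop: train the stylization network for
    # `epochs` epochs with `config` and return the validation inter-frame loss.
    return 0.0

class StyleTransferWorker(Worker):
    def compute(self, config, budget, **kwargs):
        # BOHB hands each configuration a budget (here: epochs); Hyperband uses
        # cheap low-budget runs to discard poor configurations early, while the
        # Bayesian model proposes promising configurations to try next.
        return {"loss": train_and_evaluate(config, int(budget)), "info": {}}

NS = hpns.NameServer(run_id="bohb_style", host="127.0.0.1", port=None)
NS.start()
worker = StyleTransferWorker(nameserver="127.0.0.1", run_id="bohb_style")
worker.run(background=True)
opt = BOHB(configspace=get_configspace(), run_id="bohb_style",
           nameserver="127.0.0.1", min_budget=1, max_budget=27)
result = opt.run(n_iterations=10)
opt.shutdown(shutdown_workers=True)
NS.shutdown()
print(result.get_incumbent_id())  # best configuration found
```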
Keywords/Search Tags:AutoML, Video style transformation, Machine learning, Lightweight model, BOHB optimization algorithm