
Research On Visual Media Style Transfer With Deep Neural Networks

Posted on: 2020-12-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D D Chen
Full Text: PDF
GTID: 1368330572478915
Subject: Information and Communication Engineering
Abstract/Summary:
With the growing richness of material life, people's demand for spiritual and cultural life is becoming ever stronger. As important parts of spiritual and cultural life, art creation and entertainment consumption have received increasing attention. However, traditional art and entertainment content creation can only be completed by professionals, which is very labor-intensive and time-consuming. In recent years, artificial intelligence technology has developed rapidly, and how to use it to create art and entertainment content has become an important research problem. In traditional computer vision and graphics, this problem has also attracted the attention of many researchers over the last decades, not only because of its profound theoretical value, but also because of its broad application prospects and huge potential commercial value.

Recently, Gatys, Johnson, et al. began to utilize deep neural networks for image artistic style transfer and achieved great success. These deep neural network-based style transfer algorithms not only overcome the limitation of traditional methods that only some specific styles can be mimicked, but also perform semantic-level stylization. However, for different types of visual media, existing style transfer methods still face major limitations that need to be addressed: 1) For images, current feed-forward network based methods can only handle one style per model. 2) For video, directly applying an image-based style transfer algorithm to video frames damages the original temporal coherency and causes severe flickering artifacts in the stylization results. 3) For stereoscopic content, no algorithm is available for stereoscopic image and video style transfer. 4) For traditional grayscale visual content, existing generalized style transfer methods (i.e., colorization algorithms) cannot guarantee both the robustness of learning-based methods and the controllability of example-based methods at the same time.

This dissertation focuses on developing effective algorithms to solve the above problems. The main contributions and innovations are summarized as follows:

1. Proposed a novel StyleBank based multi-style transfer algorithm. Inspired by the "texton" idea in traditional texture synthesis methods, it is composed of multiple convolution filter banks, each of which explicitly represents one style. The whole network consists of three modules: an encoder, the proposed StyleBank layer, and a decoder. Different styles share the same encoder and decoder, which transform images into intermediate features and back to the original image space, respectively. To transfer an image to a specific style, the corresponding filter bank is operated on top of the intermediate features produced by the encoder. Experiments demonstrate that, with a new two-branch training strategy, this algorithm can successfully decouple style from content. It not only enables multi-style training within one single model, but also offers several advantages, such as fast incremental training for new styles, multi-style fusion, and fast style switching.

2. Proposed a new video style transfer algorithm based on feature flow and a gated aggregation mechanism. Directly applying an image-based style transfer algorithm to a video frame by frame produces severe flickering artifacts in the stylization results. To address this problem, the algorithm incorporates two sub-modules into the pre-trained image style transfer network: a feature-level flow sub-network and a mask sub-network. By adding an extra temporal coherence loss term to the original image style transfer objective function, the two sub-modules and the style network are jointly trained on a large-scale video dataset. Experiments demonstrate that it achieves temporally coherent stylization results while maintaining the original style fidelity, which is very challenging for existing methods. Moreover, the feature-level propagation and composition idea proposed in this algorithm may potentially be applied to other video editing algorithms, which also demonstrates its strong generalization ability.

3. Proposed the first stereoscopic image and video style transfer algorithm, based on a feature-level middle-domain integration mechanism. To solve the disparity inconsistency problem in stereoscopic image style transfer, the proposed algorithm introduces the first end-to-end network that simultaneously estimates bidirectional disparity maps and occlusion masks, together with a novel middle-domain integration mechanism. By incorporating the aforementioned temporal coherence constraint, the algorithm can be further extended to stereoscopic video style transfer. Experiments demonstrate that it guarantees not only the disparity consistency of the stylization results of each stereoscopic image pair but also the temporal coherency across frames, outperforming existing baseline methods by a large margin.

4. Proposed the first deep exemplar-based colorization algorithm. Given a reference color image, the algorithm directly maps a grayscale image to a colorized output image. Rather than using hand-crafted rules as in traditional exemplar-based methods, it learns how to select, propagate, and predict colors from a large-scale dataset. To further reduce the manual effort of selecting references, an automatic reference recommendation system is proposed that considers both global and local semantic information. Experiments demonstrate that this is the first algorithm to combine the robustness of learning-based methods with the controllability of example-based methods: for regions that find semantic matches in the reference image, it adopts the reference colors; for the remaining regions, it falls back to colors learned from the large-scale dataset.
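The StyleBank structure in contribution 1 can be sketched in a few lines. This is a hypothetical, heavily simplified NumPy illustration, not the trained network: the real model uses multi-layer convolutional encoder/decoder and learned 3x3 filter banks, while here the convolutions are reduced to 1x1 (per-pixel linear maps) with random weights. What it shows is the structural point: the encoder and decoder are shared across styles, and each style owns only its own filter bank.

```python
import numpy as np

rng = np.random.default_rng(0)

C_IN, C_FEAT = 3, 8          # image channels, feature channels
W_enc = rng.standard_normal((C_FEAT, C_IN)) * 0.1   # shared encoder
W_dec = rng.standard_normal((C_IN, C_FEAT)) * 0.1   # shared decoder

# One filter bank per style; supporting a new style means adding one bank,
# which is why incremental training of new styles is cheap.
style_banks = {
    "style_a": rng.standard_normal((C_FEAT, C_FEAT)) * 0.1,
    "style_b": rng.standard_normal((C_FEAT, C_FEAT)) * 0.1,
}

def stylize(img, style):
    """img: (H, W, C_IN) array -> stylized (H, W, C_IN) array."""
    feat = np.einsum('fc,hwc->hwf', W_enc, img)                # encode
    feat = np.einsum('gf,hwf->hwg', style_banks[style], feat)  # apply bank
    return np.einsum('cg,hwg->hwc', W_dec, feat)               # decode

img = rng.random((4, 4, C_IN))
out_a = stylize(img, "style_a")
out_b = stylize(img, "style_b")
print(out_a.shape)  # (4, 4, 3): same spatial size, different bank per style
```

Swapping the filter bank while reusing the encoded features is also what makes fast style switching and multi-style fusion (blending two banks' outputs) possible.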
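The temporal coherence term in contribution 2 can be illustrated with a toy sketch. The names and shapes here are hypothetical: the actual loss operates on features propagated by the flow sub-network and uses a mask predicted by the mask sub-network, whereas this version takes the warped previous stylized frame and a binary traceability mask as given and simply penalizes change where pixels are traceable.

```python
import numpy as np

def temporal_coherence_loss(stylized_t, warped_stylized_prev, mask):
    """Mean squared difference between the current stylized frame and the
    previous stylized frame warped by flow, counted only where mask == 1
    (pixels traceable through the flow; occluded pixels are excluded)."""
    diff = (stylized_t - warped_stylized_prev) ** 2   # (H, W, C)
    masked = mask[..., None] * diff                   # gate by traceability
    return masked.sum() / max(mask.sum(), 1)

mask = np.ones((2, 2))
a = np.ones((2, 2, 3))
b = np.zeros((2, 2, 3))
print(temporal_coherence_loss(a, a, mask))  # 0.0  (identical frames)
print(temporal_coherence_loss(a, b, mask))  # 3.0  (unit change, 3 channels)
```

Adding this term to the per-frame style objective is what trades a small amount of per-frame style fidelity for flicker-free video output.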
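As a rough illustration of the warping primitive behind the middle-domain mechanism in contribution 3, the sketch below resamples one view toward the other using a per-pixel horizontal disparity map. It is a deliberate simplification: nearest-neighbor sampling instead of sub-pixel interpolation, clamped borders, and no occlusion mask, whereas the actual network estimates bidirectional disparities and occlusion masks end to end and fuses both warped views in the middle domain.

```python
import numpy as np

def warp_by_disparity(img, disp):
    """Warp a (H, W, C) image horizontally: each output pixel at column x
    samples the input at column x - disp[y, x] (nearest neighbor, clamped
    to the image border)."""
    H, W, _ = img.shape
    xs = np.arange(W)[None, :] - disp                 # source column per pixel
    xs = np.clip(np.round(xs).astype(int), 0, W - 1)  # clamp out-of-range
    rows = np.arange(H)[:, None]                      # broadcast row indices
    return img[rows, xs]

left = np.arange(4, dtype=float).reshape(1, 4, 1)    # columns 0,1,2,3
shifted = warp_by_disparity(left, np.ones((1, 4)))   # shift by 1 pixel
print(shifted.ravel())  # [0. 0. 1. 2.]
```

Disparity consistency of the stylized pair can then be checked by warping the stylized left view to the right view and comparing the two in non-occluded regions.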
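The select-or-fall-back behavior described in contribution 4 can be mimicked with a small cosine-matching sketch. The function name, feature layout, and threshold are hypothetical: the actual method learns selection, propagation, and color prediction end to end from deep semantic features rather than applying a hard rule, but the sketch captures the stated behavior of adopting reference colors where a confident semantic match exists and otherwise degrading gracefully to colors learned from data.

```python
import numpy as np

def pick_colors(target_feat, ref_feat, ref_colors, learned_colors, thresh=0.8):
    """target_feat: (N, D) features of grayscale pixels; ref_feat: (M, D)
    features of reference pixels; ref_colors: (M, 3); learned_colors: (N, 3)
    fallback prediction. Cosine-match each target pixel to the reference and
    keep the reference color only when the match is confident."""
    t = target_feat / np.linalg.norm(target_feat, axis=1, keepdims=True)
    r = ref_feat / np.linalg.norm(ref_feat, axis=1, keepdims=True)
    sim = t @ r.T                         # (N, M) cosine similarities
    best = sim.argmax(axis=1)             # best reference pixel per target
    conf = sim.max(axis=1)                # its similarity score
    return np.where((conf >= thresh)[:, None], ref_colors[best], learned_colors)

target = np.array([[1.0, 0.0], [0.0, 1.0]])   # pixel 0 matches ref, pixel 1 not
ref = np.array([[1.0, 0.0]])
ref_colors = np.array([[255, 0, 0]])
learned = np.array([[10, 20, 30], [40, 50, 60]])
print(pick_colors(target, ref, ref_colors, learned))
# [[255   0   0]
#  [ 40  50  60]]
```

The automatic reference recommendation step plays the complementary role: it raises the chance that most regions find confident matches in the chosen reference.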
Keywords/Search Tags:Deep Neural Networks, Visual Media Style Transfer, Image Colorization, Temporal Coherency, Disparity Consistency