With the development of deep neural networks and image-processing technology, it has become possible to convert real-scene images into animation. Animation technology provides AI-assisted image creation with new methods and content, and has a wide range of application scenarios and great value in multimedia services such as Internet social networking, short-video creation, and video ring-back tones. However, existing animation algorithms still have several problems: (1) When the background is animated, color deviation gives the image a heavy filter-like appearance, and image details are excessively weakened, resulting in poor image quality. (2) When a portrait is animated, the generation of facial features is not stable enough, and the proportion of the face to the overall image size strongly affects the conversion result. Moreover, existing methods cannot retain sufficient background information, making it difficult to convert both the face and the background well. To address these issues, and in line with the application requirements of animation in multimedia services, this thesis studies the animation of backgrounds and portraits based on generative adversarial networks (GANs). The main contributions are as follows:

Firstly, a scene animation network that perceives style through differences in the frequency spectrum is proposed. By analyzing the characteristics of, and differences between, the frequency spectra of real-scene images and animation-domain images, the generator learns a mapping that better fits the style distribution of the animation domain. On this basis, hand-extracted animation features allow the image conversion to be adjusted more finely. Compared with CartoonGAN and AnimeGAN under both human visual assessment and GAN evaluation metrics, the images generated by the proposed network are superior: they perform better in image detail, color reproduction, and realism, and present a clear animation style.

Secondly, a portrait animation network based on locally enhanced perception is proposed. According to the characteristics and requirements of animating portrait scenes, the network adds a local reinforcement module based on the self-attention mechanism to a cycle-consistent adversarial network. This module guides the network to attend to the salient regions of the image, namely the facial features, and combines layer normalization and instance normalization through weighting by attention coefficients, so that the model can treat different regions of the image differently. As a result, it can flexibly control the degree of change in style, texture, and contour. In addition, the network uses a fully convolutional structure, so the model is lighter and supports inputs of arbitrary size. It can strongly stylize the facial features while preserving the semantic structure of the background, toward the ultimate goal of the animation task: the integration of people and landscapes.

Thirdly, an image and video animation platform is designed and implemented. The platform uses the proposed scene animation network and portrait animation network to provide users with image and video animation conversion services.
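The idea behind the first contribution, perceiving style through spectral differences, can be illustrated with a minimal sketch. The function names and the simple mean-absolute-difference loss below are illustrative assumptions, not the thesis's actual loss: they merely show how the log-amplitude spectra of a real-scene image and an animation-domain image can be compared, so that a generator could be trained to close the gap.

```python
import numpy as np

def log_amplitude_spectrum(img):
    """Centered log-amplitude spectrum of a grayscale image of shape (H, W)."""
    f = np.fft.fftshift(np.fft.fft2(img))
    return np.log1p(np.abs(f))

def spectrum_gap(img_a, img_b):
    """Mean absolute difference between two log-amplitude spectra.
    A generator could minimize this gap so that its outputs match the
    frequency statistics of the animation domain (illustrative loss)."""
    return np.mean(np.abs(log_amplitude_spectrum(img_a)
                          - log_amplitude_spectrum(img_b)))

# Toy example: animation-style images tend to be smoother, i.e. they
# carry less high-frequency energy than noisy real-scene photographs.
rng = np.random.default_rng(0)
real = rng.random((64, 64))                # noisy stand-in for a photo
smooth = np.full((64, 64), real.mean())    # flat stand-in for animation
print(spectrum_gap(real, smooth) > 0.0)    # prints True
```

In practice such a spectral term would be one component of the full adversarial objective rather than a standalone loss.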
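The normalization blending in the second contribution can likewise be sketched. The sketch below is an assumption about the mechanism (it resembles adaptive layer-instance normalization): a coefficient `rho`, which in the thesis would be derived from attention weights, interpolates between instance normalization (per-channel spatial statistics, favoring strong local style change) and layer normalization (global statistics, favoring preserved structure). The function names are hypothetical.

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """x: (C, H, W); normalize each channel over its own spatial map."""
    mu = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def layer_norm(x, eps=1e-5):
    """Normalize over all channels and spatial positions jointly."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def blended_norm(x, rho):
    """rho in [0, 1], e.g. derived from attention weights: values near 1
    favor instance norm (stronger per-region stylization), values near 0
    favor layer norm (better preservation of global semantic structure)."""
    rho = np.clip(rho, 0.0, 1.0)
    return rho * instance_norm(x) + (1.0 - rho) * layer_norm(x)

x = np.random.default_rng(1).random((3, 8, 8))
y = blended_norm(x, 0.7)
print(y.shape)  # prints (3, 8, 8)
```

Letting the attention module set `rho` per region is what allows the model to deform facial features strongly while leaving the background's structure largely intact.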