
High Definition Image Generation Guided By Semantic Labels

Posted on: 2021-04-24
Degree: Master
Type: Thesis
Country: China
Candidate: M X Yu
Full Text: PDF
GTID: 2518306050967359
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, artificial intelligence represented by deep learning has developed rapidly and achieved major breakthroughs in many fields. Deep learning depends on big data and on growing computing power: it uses a large number of parameters in a network model to fit the function of the problem to be solved. In the image domain, as problems become more complex, the demands that deep learning places on data and computing power also grow. Given that processor performance is hard to improve dramatically because of material limits, the quality and quantity of data sets have become problems that must be solved, and image generation is an effective way to address them. Image generation is generally divided into street-scene generation and face generation.

This paper analyzes the difficulties of image generation tasks in detail. For street-scene generation, existing generative models struggle to capture long-range dependencies because of the limited receptive field of the convolution kernel. In addition, generative models are poorly interpretable: traditional models fit the problem function over the entire parameter space, which makes it difficult to analyze the role each layer plays in the generative model and pushes such models to become ever deeper. For face generation, traditional work focuses on unconditional face generation and on conditional face generation guided by facial semantic labels. Unconditional generation offers little control over the shape of the generated face, while semantic-label-guided generation cannot be scaled up because facial semantic labels are costly to annotate. To solve these three problems, this paper focuses on high-definition image generation guided by semantic labels. The main contributions are as follows:

(1) A street-scene generation model with an inter-regional attention mechanism. To address the difficulty of capturing long-range dependencies, and inspired by natural language processing, this paper generates the image in parts and adds a self-attention mechanism before and after each part. This captures spatial dependencies and increases the response and consistency between categories in different regions. While modeling long-range dependence, part-wise generation also preserves the independence between regions, preventing the image blurring caused by over-coupling during generation.

(2) A street-scene generation model based on conditionally predictable parameters. To address the difficulty of analyzing the role of each layer in a generative model, this paper explores function-oriented generation for street scenes. Different convolution kernels are regarded as different "brushes". To predict these brushes, separate prediction networks estimate the corresponding convolution-kernel parameters in different small parameter spaces. In addition, the down-sampling process in the discriminator is treated as the inverse of the decoding stage in order to design a corresponding loss module, which gives each convolution kernel in the decoding stage a specific function.
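To make the parameter-prediction idea in (2) concrete, the following is a minimal, illustrative sketch in PyTorch: a small prediction network maps a condition code to the weights of a single convolution, which is then applied with F.conv2d. The class name, layer sizes, and condition dimension are assumptions for illustration only and are not the architecture used in this paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PredictedConv(nn.Module):
    # Hypothetical example: a prediction network that outputs conv-kernel weights.
    def __init__(self, cond_dim, in_ch, out_ch, k=3):
        super().__init__()
        self.in_ch, self.out_ch, self.k = in_ch, out_ch, k
        # prediction network: condition code -> flattened convolution kernel
        self.predict = nn.Sequential(
            nn.Linear(cond_dim, 256),
            nn.ReLU(inplace=True),
            nn.Linear(256, out_ch * in_ch * k * k),
        )

    def forward(self, x, cond):
        # x: (1, in_ch, H, W) feature map; cond: (1, cond_dim) condition code
        w = self.predict(cond).view(self.out_ch, self.in_ch, self.k, self.k)
        return F.conv2d(x, w, padding=self.k // 2)

# usage (batch size 1 for simplicity, since the predicted kernel is per-sample)
layer = PredictedConv(cond_dim=128, in_ch=64, out_ch=64)
out = layer(torch.randn(1, 64, 32, 32), torch.randn(1, 128))  # -> (1, 64, 32, 32)

Each predicted kernel lives in its own small parameter space, which is what makes the role of an individual "brush" easier to analyze.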
(3) A high-resolution face generation model guided by masks. In addition to unconditional face generation and conditional face generation guided by facial semantic labels, this paper designs and implements conditional face generation guided by masks. The model takes a mask segmentation map as input and generates a high-definition face with the corresponding shape. The nose, eyes, and hair of the face are treated as fine-grained styles in style-based generation: the input style noise is decoupled by a mapping network and injected as style at the 64×64 scale of the model. In addition, this paper identifies the defect of unclear segmentation edges inherent in mask maps. To solve this problem, inspired by the field of image matting, this paper designs a high-definition face generation model guided by matting, whose generated images have sharper edges.

Compared with traditional generative models, the street-scene generation model with the inter-regional attention mechanism outperforms the best existing street-scene generation models at 1024×512 resolution. The street-scene generation model based on conditionally predictable parameters can generate high-resolution street scenes and outperforms existing models at 512×256. The mask-guided face generation model can generate faces accurately and clearly.
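As a rough illustration of the mask-guided, style-injected generation described in (3), the sketch below encodes a segmentation mask into spatial features, maps the noise through a mapping network into a decoupled style code, and injects that style at a single 64×64 stage via AdaIN-style modulation. All module names, channel counts, and the number of mask classes are assumptions for illustration; the actual model in this paper (including its matting-guided variant and up-sampling stages) is not reproduced here.

import torch
import torch.nn as nn

class AdaIN(nn.Module):
    # style-conditioned normalization: predicts a per-channel scale and bias from the style code
    def __init__(self, style_dim, channels):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels)
        self.affine = nn.Linear(style_dim, channels * 2)

    def forward(self, x, w):
        gamma, beta = self.affine(w).chunk(2, dim=1)
        return (1 + gamma[:, :, None, None]) * self.norm(x) + beta[:, :, None, None]

class MaskGuidedGenerator(nn.Module):
    def __init__(self, mask_ch=19, z_dim=512, style_dim=512, base_ch=64):
        super().__init__()
        # mapping network: noise z -> decoupled style code w
        self.mapping = nn.Sequential(
            nn.Linear(z_dim, style_dim), nn.LeakyReLU(0.2),
            nn.Linear(style_dim, style_dim),
        )
        # encode the segmentation mask into spatial features (face shape and layout)
        self.encode = nn.Sequential(nn.Conv2d(mask_ch, base_ch, 3, padding=1), nn.LeakyReLU(0.2))
        self.adain = AdaIN(style_dim, base_ch)      # style injection at the 64x64 scale
        self.to_rgb = nn.Conv2d(base_ch, 3, 3, padding=1)

    def forward(self, mask64, z):
        # mask64: (N, mask_ch, 64, 64) segmentation map; z: (N, z_dim) style noise
        w = self.mapping(z)
        feat = self.adain(self.encode(mask64), w)   # details (nose, eyes, hair) follow the style
        return torch.tanh(self.to_rgb(feat))        # (N, 3, 64, 64); further up-sampling omitted

gen = MaskGuidedGenerator()
fake = gen(torch.randn(2, 19, 64, 64), torch.randn(2, 512))  # random stand-in for a real mask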
Keywords/Search Tags: deep learning, image generation, face generation, generative adversarial nets, attention mechanism, parameter prediction