Study On Lightweight Generative Network Architectures For Image Enhancement And Inpainting Based On Attention

Posted on: 2021-12-18
Degree: Master
Type: Thesis
Country: China
Candidate: J Liu
GTID: 2518306050470484
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
With the rapid development of smartphones, more and more people prefer to take photos with smartphone cameras rather than digital still cameras because of their mobility and ease of use. However, photos captured by mobile phones usually suffer from noise, low contrast, and weak color, so image enhancement on smartphones is required. Moreover, some photos contain incomplete or occluded areas that need image inpainting. Image enhancement and inpainting for digital photos are therefore of practical importance and have received much attention from researchers. For instance, faces are often partly or fully occluded, so face completion plays an important role in computational photography. In video clips, both the foreground and the background move, which makes object removal and completion a challenging task. In this thesis, we address the following problems in computational photography: image enhancement on smartphones, face completion, image inpainting, and video object removal and completion. Specifically, we aim to propose lightweight network architectures that reduce runtime while maintaining performance. The research scope and main contents of this thesis are as follows:

1. For image enhancement on smartphones, a lightweight generative model based on GAN, named multi-connected residual network (MCRN), is proposed to balance quality and speed. The proposed network consists of one discriminator and one generator. The generator has a two-stage architecture: 1) the first stage extracts structural features; 2) the second stage focuses on improving perceptual visual quality. By using a multi-connection structure, we achieve good performance in image enhancement on smartphones while ensuring fast network convergence and a shorter test time. Experiments and an ablation study indicate that the proposed method outperforms state-of-the-art approaches in terms of perceptual visual quality and speed in both the training and test phases.

2. Facial image inpainting is a challenging task because crucial parts of the face, such as the eyes and nose, may be missing. In this thesis, a simple and effective method is proposed for inpainting the missing content in face images. We build an end-to-end multi-level generative network that captures features at different levels while reducing training time. Multi-scale feature maps are used to generate natural human faces with realistic textures. To optimize the parameters of the network, we use two losses: content and texture. The former includes mean absolute error (MAE) and multi-scale structural similarity (MS-SSIM) losses to minimize content distortion; the latter includes style and adversarial losses to fine-tune texture synthesis. To further improve the proposed network for facial image inpainting, we adopt a multi-level attention-based generative architecture. By reducing the number of channels per convolutional layer in the generative architecture, multi-level feature processing reduces training and testing time while maintaining performance. We combine attention and multi-level feature processing to maintain a soft relationship with the surrounding content. For network optimization, we again use two loss functions: content and texture. Unlike the losses of the previous method, the content loss includes MAE and edge-preserving losses to produce realistic results, while the texture loss includes adversarial and perceptual losses to fine-tune texture synthesis. The edge-preserving loss is used to preserve edges and patch similarity. Various experiments show that the proposed method not only generates realistic results on random masks but also outperforms state-of-the-art approaches in quantitative measurements and subjective evaluation.

3. For image completion, a lightweight generative network with feature contrast enhancement is proposed, based on dilated convolution and channel attention. We adopt dilated convolution to expand the receptive field of the convolution kernel with the same number of parameters. We also use channel attention to enhance feature contrast through adaptive weights. For the loss function, symmetric mean absolute percentage error (SMAPE) and color enhancement losses are used to achieve high-quality completion and natural color, respectively; SMAPE is used in place of the reconstruction loss. The proposed generative network has only 4.5M parameters. Experimental results show that the proposed method is superior to state-of-the-art approaches in terms of visual quality, quantitative measurements, and runtime.

4. Video inpainting is a challenging computer vision task whose goal is to synthesize photo-realistic content in the masked region of a video while keeping the content coherent across frames. In this thesis, we use instance segmentation to obtain the object mask, then remove the object and complete the missing regions. To make full use of spatial and temporal information, we build a simple yet effective generative architecture for video inpainting. We first present a temporal patch non-local module to make the best use of temporal frames. Then, we propose a dilated multi-scale module (Dli MC) to reuse feature maps and make full use of spatial information. The Dli MC module consists of dilated convolution, two scales that collect feature information at different resolutions, and a residual connection. The two modules reduce the runtime and the parameter size of the generator. To optimize the proposed network, a mask-adversarial loss is used both to fine-tune texture and to improve details. Compared with other approaches, the experimental results demonstrate the effectiveness of the proposed method in terms of visual quality and subjective evaluations.
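Two claims in item 3 can be checked with simple arithmetic: dilation widens the area a convolution kernel covers without adding weights, and SMAPE gives a bounded, scale-relative error. The following is a minimal sketch in plain Python, not the thesis implementation. The function names are hypothetical, and the SMAPE form used here (no factor of 2, with a small `eps` for stability) is an assumption, since the abstract does not give the exact formula.

```python
from typing import Sequence


def dilated_kernel_extent(kernel_size: int, dilation: int) -> int:
    """Span (in pixels, along one axis) covered by a single dilated kernel."""
    return dilation * (kernel_size - 1) + 1


def conv_weight_count(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """Weights of a 2-D convolution without bias; dilation does not appear."""
    return in_channels * out_channels * kernel_size * kernel_size


def smape_loss(pred: Sequence[float], target: Sequence[float], eps: float = 1e-8) -> float:
    """Assumed SMAPE form: mean of |p - t| / (|p| + |t| + eps)."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and of equal length")
    total = sum(abs(p - t) / (abs(p) + abs(t) + eps) for p, t in zip(pred, target))
    return total / len(pred)


if __name__ == "__main__":
    # A 3x3 kernel spans 3, 5, and 9 pixels at dilation 1, 2, and 4,
    # while the weight count stays the same for every dilation rate.
    for d in (1, 2, 4):
        print(d, dilated_kernel_extent(3, d), conv_weight_count(64, 64, 3))
    print(smape_loss([0.2, 0.8], [0.2, 0.4]))
```

With this form, each per-pixel term lies between 0 and 1, so a large absolute error on a bright pixel does not dominate the loss the way it can with a plain L1 reconstruction loss.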
Keywords/Search Tags: Image Enhancement on Smartphone, Face Completion, Image Inpainting, Video Inpainting, Computational Photography, Light-weight Model, Generative Adversarial Network