Data Driven Image Domain Translation Using Image Representation And Deep Learning | Posted on:2021-07-30 | Degree:Doctor | Type:Dissertation | Country:China | Candidate:T Sun | Full Text:PDF | GTID:1488306050464024 | Subject:Pattern Recognition and Intelligent Systems

Abstract/Summary:

This thesis investigates data-driven cross-domain image translation. The image domains considered are not limited to the visible and multi-spectral domains; they also extend to depth maps and semantic labels. We focus on depth map estimation and near-infrared (NIR) to RGB image translation, and propose four approaches for these two translation tasks: the first is an RGB→depth translation (a solution to video stereolization), the second is an NIR→RGB domain translation, and the last two are NIR→RGB domain translations conditioned on a low-light RGB image.

1) Rapid Learning-Based Video Stereolization Using GPU Acceleration. Video stereolization has received much attention in recent years owing to the shortage of stereoscopic 3D content. Although video stereolization can enrich stereoscopic 3D content, fully automatic 2D-to-3D conversion at low computational cost remains difficult. In this chapter, we propose rapid learning-based video stereolization using graphics processing unit (GPU) acceleration. We first generate an initial depth map by learning from examples. Then, we refine the depth map using saliency and cross-bilateral filtering to sharpen object boundaries. Finally, we perform depth-image-based rendering (DIBR) to synthesize stereoscopic 3D views. We also provide a parallel GPU implementation of the whole system. Experimental results demonstrate that the proposed method is nearly 180 times faster than central processing unit (CPU) based processing while generating results competitive with state-of-the-art methods.

2) NIR-to-RGB Domain Translation Using Asymmetric Cycle Generative Adversarial Networks. Near-infrared (NIR) images have clear textures but carry no color. In this chapter, we propose NIR-to-RGB domain translation using asymmetric cycle generative adversarial networks (ACGAN). An RGB image (3 channels) carries richer information than an NIR image (1 channel), which makes NIR-RGB domain translation asymmetric in information. We therefore adopt asymmetric cycle GANs whose networks have different capacities depending on the translation direction: the generator combines U-Net and ResNet, and the discriminator uses feature pyramid networks (FPNs). With a large 128 × 128 receptive field, the multiscale architecture captures rich spatial context. Experimental results show that the proposed method achieves natural-looking NIR colorization with high generalization ability (it remains feasible in category-unaware cases) and outperforms state-of-the-art methods in realism and invariance to unregistered inputs.

3) Low-Light-Conditioned NIR-to-RGB Domain Translation by Asymmetric Cycle GAN. In low light, RGB cameras capture dark, noisy images with degraded color, while NIR cameras capture clear textures without color. We propose a conditional RGB image generative model that fuses NIR image textures with low-light RGB image colors; this chapter extends Chapter 4 to the NIR-RGB fusion application. Given a low-light RGB reference image, the synthesized image is expected to be not only plausible but also close to the reference in color. The main challenge remains that the low-light RGB image and the NIR image are not registered. We address this issue by optimizing the color and texture information separately in the two translation directions: inspired by the CycleGAN framework, we design an asymmetric cycle GAN model with a down-sampled L1 loss that penalizes color and texture errors separately. Experiments on an open NIR-RGB dataset show that the proposed model effectively transfers the input NIR texture and the low-light RGB color. The generated images have high resolution and outperform existing methods in realism.

4) Conditional Image Generation for NIR-RGB Fusion Using a SPADE Network. NIR images are robust to ambient light and have clear textures. We propose a two-source conditional RGB image generative model that fuses NIR image textures with low-light RGB image colors. This chapter addresses the same problem as Chapter 5. Because Chapter 5 relies on a low-resolution cross-domain L1 loss, it cannot perfectly separate the color and texture features, which leads to contradictory training losses that harm model stability. We address this issue with a single-direction, two-branch architecture that completely separates the color and texture features. The proposed image domain translation model consists of two VAEs (a texture encoder and a color encoder) and a spatially adaptive denormalization (SPADE) generator. Experiments on an open NIR-RGB dataset show that the proposed model effectively preserves the input NIR texture and the low-light RGB color. The generated images have high resolution and outperform existing methods in realism. Moreover, thanks to the single-direction architecture, the proposed model is more lightweight than the Chapter 5 method.

Keywords/Search Tags: Multi-Spectral Fusion, Image-to-Image Translation, Deep Learning, Machine Learning, Depth Map Estimation
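The depth-image-based rendering (DIBR) step of approach 1 can be sketched as a horizontal pixel shift proportional to estimated depth. The following minimal NumPy sketch is illustrative only: the function name, the linear depth-to-disparity mapping, the nearer-pixel-wins occlusion rule, and the left-neighbor hole filling are assumptions for exposition, not the thesis's implementation.

```python
import numpy as np

def dibr_right_view(left, depth, max_disparity=16):
    """Synthesize a right-eye view from a left image and its depth map.

    left:  (H, W, 3) image; depth: (H, W) floats in [0, 1], where larger
    values mean nearer pixels (an assumed convention for this sketch).
    """
    h, w, _ = left.shape
    right = np.zeros_like(left)
    # Disparity of the pixel currently written at each target location
    # (-1 marks an unwritten hole); nearer pixels must win collisions.
    best = np.full((h, w), -1, dtype=int)
    disparity = (depth * max_disparity).astype(int)
    for y in range(h):
        for x in range(w):
            xr = x - disparity[y, x]   # shift left for the right-eye view
            if 0 <= xr < w and disparity[y, x] > best[y, xr]:
                right[y, xr] = left[y, x]
                best[y, xr] = disparity[y, x]
    # Naive hole filling: propagate the nearest written pixel from the left.
    for y in range(h):
        for x in range(1, w):
            if best[y, x] < 0:
                right[y, x] = right[y, x - 1]
    return right
```

A real stereolization pipeline would also apply the saliency and cross-bilateral refinement described above before rendering, and run this warp in parallel on the GPU.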
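The asymmetric cycle GAN of approach 2 still relies on the standard CycleGAN cycle-consistency objective; only the two generators differ in capacity. A minimal NumPy sketch of that loss term follows (function and argument names are hypothetical, and real training would add adversarial losses and use a deep-learning framework):

```python
import numpy as np

def cycle_consistency_loss(x_nir, x_rgb, g_nir2rgb, g_rgb2nir, lam=10.0):
    """L1 cycle-consistency loss: translating an image to the other
    domain and back should recover the input. In the asymmetric setting
    the two generators have different capacities, but the loss itself
    is symmetric in form."""
    nir_rec = g_rgb2nir(g_nir2rgb(x_nir))   # NIR -> RGB -> NIR
    rgb_rec = g_nir2rgb(g_rgb2nir(x_rgb))   # RGB -> NIR -> RGB
    loss_nir = np.abs(x_nir - nir_rec).mean()
    loss_rgb = np.abs(x_rgb - rgb_rec).mean()
    return lam * (loss_nir + loss_rgb)
```

With identity generators the reconstructions are exact and the loss is zero, which is the fixed point cycle training pushes toward.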
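The down-sampled L1 loss of approach 3 can be sketched as comparing average-pooled versions of the synthesized and reference RGB images, so the penalty is dominated by low-frequency color while high-frequency texture, which is unregistered across the two cameras, is largely averaged away. This NumPy sketch is an assumed reading of the loss, not the thesis's exact formulation:

```python
import numpy as np

def avg_pool(img, k):
    """Average-pool an (H, W, C) image with a k x k window
    (H and W are assumed divisible by k)."""
    h, w, c = img.shape
    return img.reshape(h // k, k, w // k, k, c).mean(axis=(1, 3))

def downsampled_l1_loss(pred_rgb, ref_rgb, k=8):
    """L1 distance between heavily down-sampled images: misaligned
    textures mostly cancel inside each pooling window, so the loss
    chiefly penalizes color differences."""
    return np.abs(avg_pool(pred_rgb, k) - avg_pool(ref_rgb, k)).mean()
```

A uniform color shift survives pooling and is penalized in full, while a small spatial misregistration changes each pooled value only slightly, which is exactly the separation of color from texture error the chapter relies on.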
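The SPADE generator of approach 4 modulates normalized activations with per-pixel scale and shift maps predicted from the conditioning input. The sketch below uses fixed linear projections where a real SPADE block learns small convolutions, and all names and shapes are illustrative assumptions:

```python
import numpy as np

def spade_modulate(feat, cond, gamma_proj, beta_proj, eps=1e-5):
    """Spatially adaptive denormalization (SPADE), schematically:
    normalize the feature map per channel, then rescale and shift it
    with per-pixel gamma/beta maps computed from a conditioning input.

    feat: (H, W, C) activations; cond: (H, W, D) conditioning map;
    gamma_proj, beta_proj: (D, C) projections standing in for the
    learned convolutions of a real SPADE block."""
    mean = feat.mean(axis=(0, 1), keepdims=True)
    std = feat.std(axis=(0, 1), keepdims=True)
    normalized = (feat - mean) / (std + eps)
    gamma = cond @ gamma_proj   # (H, W, C) per-pixel scales
    beta = cond @ beta_proj     # (H, W, C) per-pixel shifts
    return normalized * (1.0 + gamma) + beta
```

Because gamma and beta vary per pixel, the texture branch and the color branch of the two-VAE model can steer different spatial regions of the generated image independently, unlike ordinary batch or instance normalization whose parameters are spatially uniform.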