
Research On Cross-modal Image Generation Based On Generative Adversarial Network

Posted on: 2020-12-28
Degree: Master
Type: Thesis
Country: China
Candidate: L J Zhang
Full Text: PDF
GTID: 2428330575994686
Subject: System theory
Abstract/Summary:
Multi-modal images are images acquired with different techniques or devices that describe different attributes of the same object. For example, a color image taken with a camera and a depth image scanned with a lidar are two modal images of the same object. In deep-learning-based image processing, using multi-modal images helps to better express object properties and to obtain more comprehensive and accurate information about objects, thereby extending and improving the capability of related applications. However, due to equipment, cost, and other constraints, it is often difficult to directly acquire multi-modal images of an object. To address the difficulty of acquiring a particular modal image of an object, and inspired by the idea of image-to-image translation, this thesis proposes a method of cross-modal image generation: using a readily available modal image to generate the desired target modal image. To overcome the shortcomings of existing image-to-image translation methods for cross-modal image generation, this thesis proposes adding a small number of real target observations so that the generated target modal images meet practical requirements. Furthermore, two different cross-modal image generation models are proposed, based on two different ways of processing the target observations. Experiments show that the proposed methods effectively improve the accuracy and stability of the application system in practical settings, and exhibit strong transfer learning and generalization ability.

The main work of this thesis comprises the following three aspects:

(1) A sparse target observation-assisted cross-modal image generation method is proposed, and a cross-modal image generation model, GAN2C, is constructed based on the generative adversarial network. By combining the GAN's adversarial learning with supervised learning on the sparse target observations, the model improves the quality of cross-modal image generation. Experimental results on image color recovery show that, compared with classical image-to-image translation methods, the proposed method generates near-real color images.

(2) To address the sparsity of effective observations in sparse images and the limitations of convolutional neural networks in handling sparse inputs, a sparse convolution fusion operation is proposed and combined with cross-modal image generation to form the Sparse Convolutional Fusion Network (SCFN). The network extracts effective modal information from sparse target observation images and fuses it with the source modal image.

(3) The proposed cross-modal image generation architecture is applied to image depth prediction tasks, covering both indoor scenes and outdoor traffic scenes. Experimental results show that the proposed methods achieve depth prediction accuracy comparable with the state of the art: the RMSE on the NYU-Depth-v2 dataset is 0.261 m, and the RMSE on the KITTI dataset is 962.30 mm.
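The abstract describes the sparse convolution fusion operation only at a high level. As an illustration (not the thesis's actual implementation), one common way to convolve over sparse observations is a normalized convolution: invalid pixels are masked out, and each output is renormalized by the kernel weight that actually fell on valid pixels. The function name and 3x3 kernel below are hypothetical, and plain Python lists stand in for tensors.

```python
def sparse_conv2d(image, mask, kernel):
    """Hypothetical 3x3 normalized ("sparse") convolution on a 2D grid.

    image[i][j] : observed value (meaningless where mask[i][j] == 0)
    mask[i][j]  : 1 if the pixel holds a real observation, else 0
    kernel      : 3x3 list of weights
    Returns (output, new_mask); new_mask marks pixels whose receptive
    field contained at least one valid observation.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    new_mask = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            num, den = 0.0, 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    # Only valid (observed) pixels contribute.
                    if 0 <= ii < h and 0 <= jj < w and mask[ii][jj]:
                        wgt = kernel[di + 1][dj + 1]
                        num += wgt * image[ii][jj]
                        den += wgt
            if den != 0:
                # Renormalize by the weight mass on valid pixels, so the
                # output scale does not depend on how sparse the input is.
                out[i][j] = num / den
                new_mask[i][j] = 1
    return out, new_mask
```

With a uniform kernel this averages only the valid observations under the window; stacking such layers propagates sparse information outward, which is the kind of behavior a sparse convolution fusion layer would need before fusing with dense source-modal features.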
Keywords/Search Tags: cross-modal image generation, generative adversarial network, image-to-image translation, convolutional neural network, image depth prediction