
Research And Implementation Of Vision-touch Cross-modal Algorithm Based On GAN

Posted on: 2022-09-05 | Degree: Master | Type: Thesis
Country: China | Candidate: J Liu
GTID: 2518306572960069 | Subject: Computer technology
Abstract/Summary:
Humans perceive the world through multiple sensory modalities such as vision, audition, and touch. This paper investigates the cross-modal connection between vision and touch. The main challenge in this cross-domain modeling task is the significant scale discrepancy between the two modalities: our eyes perceive an entire visual scene at once, but we can feel only a small region of an object at any given moment. To connect vision and touch, this paper introduces two new tasks. The first is synthesizing plausible tactile signals from visual inputs. The second is imagining how we interact with objects given tactile data as input. To accomplish these goals, this paper uses a standard dataset that contains large-scale sequences of corresponding visual and tactile images. To close the scale gap, this paper proposes a new conditional adversarial model that incorporates the scale and location information of touch.

The network produces uncertain object boundaries when it translates images, and the colorization quality of its outputs is low. To address this, this paper proposes a grayscale image colorization method based on the Pix2Pix generative adversarial network. The U-Net generator structure is improved first, and the loss function is then optimized. The output tactile image sequence and the input visual image sequence are not synchronized in time, so multiple consecutive frames are used as input. The model also suffered from severe mode collapse, which this paper addresses with a data rebalancing mechanism. The final experiments show that, under the same experimental environment, the proposed model produces better visual images from tactile data.

The conventional Pix2Pix network depends heavily on the dataset, because it requires paired data containing corresponding visual and tactile images. However, the research background of this work is to help a robot walk naturally on the surface of the Moon, predicting a tactile signal when it sees a visual signal. In that scenario it is difficult to obtain a perfectly paired, one-to-one dataset. This paper therefore uses CycleGAN to optimize the network structure so that the experiments do not depend on a paired dataset. The essence of CycleGAN is to let the source domain and the target domain map to each other, which removes the need for paired samples to learn the mapping from the source domain to the target domain. The experimental results also indicate that this network structure can overcome the lack of a paired dataset.
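The abstract says the Pix2Pix loss function was optimized but does not give the modified form. For reference, the sketch below shows only the standard Pix2Pix objective it starts from: a conditional adversarial term plus an L1 reconstruction term weighted by λ. It does not show the thesis's modified loss. The value λ = 100 and the factor 0.5 on the discriminator loss follow the original Pix2Pix paper (Isola et al.). The helper names are hypothetical. Images are flat lists of pixel values, and discriminator outputs are probabilities, purely for illustration.

```python
import math
from typing import Sequence

EPS = 1e-12  # clamps probabilities so log() never sees 0


def bce(probs: Sequence[float], label: float) -> float:
    """Mean binary cross-entropy of discriminator probabilities against a constant label."""
    total = 0.0
    for p in probs:
        p = min(max(p, EPS), 1.0 - EPS)
        total += -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))
    return total / len(probs)


def l1(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute error between two equally sized images."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def discriminator_loss(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    # D should output 1 on (condition, real target) pairs and 0 on (condition, G(condition)) pairs;
    # halving the sum slows D relative to G, as in the Pix2Pix paper.
    return 0.5 * (bce(d_real, 1.0) + bce(d_fake, 0.0))


def generator_loss(d_fake: Sequence[float], fake: Sequence[float],
                   target: Sequence[float], lam: float = 100.0) -> float:
    # Adversarial term (make D label the fake as real) plus lam-weighted L1 reconstruction.
    return bce(d_fake, 1.0) + lam * l1(fake, target)
```

The L1 term pushes the generator toward the ground-truth image at the pixel level, and the adversarial term pushes for sharp, realistic texture. In practice both terms would be computed on tensors with automatic differentiation.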
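The abstract handles the temporal misalignment between the input and output sequences by feeding several consecutive frames as input. It does not state the window length or how frames are paired. The sketch below assumes a sliding window of `k` source frames ending at time `t`, paired with the target frame at `t`. The function name `make_training_pairs` and this pairing rule are hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

S = TypeVar("S")  # a source frame (e.g. a visual or tactile image)
T = TypeVar("T")  # a target frame


def make_training_pairs(source: Sequence[S], target: Sequence[T],
                        k: int) -> List[Tuple[List[S], T]]:
    """Pair each target frame t with the k consecutive source frames ending at t.

    Assumption: source and target are already indexed on a common timeline, and
    the window looks only backwards in time (frames t-k+1 .. t).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(source) != len(target):
        raise ValueError("source and target sequences must have the same length")
    pairs = []
    for t in range(k - 1, len(source)):
        window = list(source[t - k + 1 : t + 1])  # oldest frame first
        pairs.append((window, target[t]))
    return pairs
```

For example, `make_training_pairs(["v0", "v1", "v2", "v3"], ["t0", "t1", "t2", "t3"], 3)` yields two pairs, `(["v0", "v1", "v2"], "t2")` and `(["v1", "v2", "v3"], "t3")`. In a convolutional generator, the `k` frames of each window would typically be concatenated along the channel axis. This lets the network see a short history instead of a single, possibly misaligned, frame.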
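The abstract attributes the fix for mode collapse to a data rebalancing mechanism but does not describe it. One common way to rebalance is to sample training examples with weights inversely proportional to the frequency of their group. That way, rare cases are drawn as often as common ones, and the generator cannot satisfy the loss by reproducing only the dominant mode. The sketch below is that generic technique, not the thesis's method. The grouping labels (for instance, hypothetical "contact" / "no_contact" tags) and function names are assumptions.

```python
import random
from collections import Counter
from typing import Hashable, List, Sequence


def inverse_frequency_weights(labels: Sequence[Hashable]) -> List[float]:
    """Per-sample weights such that every label group receives equal total mass."""
    counts = Counter(labels)
    n_groups = len(counts)
    return [1.0 / (n_groups * counts[lab]) for lab in labels]


def rebalanced_indices(labels: Sequence[Hashable], n: int, seed: int = 0) -> List[int]:
    """Draw n sample indices (with replacement) using the inverse-frequency weights."""
    weights = inverse_frequency_weights(labels)
    rng = random.Random(seed)  # seeded for reproducible batches
    return rng.choices(range(len(labels)), weights=weights, k=n)
```

With labels `["no_contact", "no_contact", "no_contact", "contact"]`, the single "contact" sample gets weight 0.5 and each "no_contact" sample gets 1/6. Both groups are therefore drawn about equally often.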
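CycleGAN removes the need for paired data by learning two mappings, G from the source domain to the target domain and F back again. It then requires each round trip to return to its starting point. The sketch below computes only that cycle-consistency term, using mean L1 distance. In the full CycleGAN objective, this term is added to the two adversarial losses and weighted by λ (λ = 10 in the CycleGAN paper). Images are flat lists of pixel values for illustration, and the function names are hypothetical.

```python
from typing import Callable, List, Sequence

Image = Sequence[float]
Mapping = Callable[[Image], List[float]]


def l1(a: Image, b: Image) -> float:
    """Mean absolute error between two equally sized images."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(G: Mapping, F: Mapping,
                           xs: Sequence[Image], ys: Sequence[Image]) -> float:
    # Forward cycle: x -> G(x) -> F(G(x)) should reconstruct x.
    forward = sum(l1(F(G(x)), x) for x in xs) / len(xs)
    # Backward cycle: y -> F(y) -> G(F(y)) should reconstruct y.
    backward = sum(l1(G(F(y)), y) for y in ys) / len(ys)
    return forward + backward
```

If `G` doubles every pixel and `F` halves it, the two mappings are exact inverses and the loss is 0. If `F` is the identity instead, `cycle_consistency_loss(G, F, [[1.0, 3.0]], [[2.0]])` returns 4.0: the forward cycle gives 2.0 and the backward cycle gives 2.0. Because the loss only compares each image with its own reconstruction, the source samples `xs` and target samples `ys` never need to correspond to each other.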
Keywords/Search Tags: Generative Adversarial Network, Deep Learning, Image Generation, Pix2Pix, Cross-Modal