Remote Sensing Image Captioning Based On Deep Neural Network

Posted on: 2021-05-23
Degree: Master
Type: Thesis
Country: China
Candidate: X Q Shen
Full Text: PDF
GTID: 2392330626458730
Subject: Computer technology
Abstract/Summary:
Image captioning, i.e., generating natural-language descriptions of given images, is an essential task for machine understanding of image content, and remote sensing image captioning is a branch of this field. Most current image captioning models consist of an encoder and a decoder. In the encoder, convolutional neural networks (CNNs) extract image features; in the decoder, a recurrent neural network or one of its variants generates the target description sequence. Most current remote sensing image captioning models suffer from overfitting and fail to fully utilize the semantic information in images. Building on the encoder-decoder architecture, this thesis employs the Transformer as the new decoder and designs a two-step optimization algorithm based on the variational autoencoder for remote sensing image captioning models. Our contributions are as follows:

To address the overfitting caused by the small size of remote sensing image datasets, this thesis makes several modifications to the Transformer, including an additional dropout layer, residual connections, and adaptive feature fusion. Both low-level spatial features and high-level semantic features are passed to the decoder. Reinforcement learning is then applied to further improve performance.

In addition, because of the gap between natural images and remote sensing images, directly using a CNN pre-trained on the ImageNet dataset to extract remote sensing image features may not perform well. This thesis proposes a two-step optimization algorithm based on the variational autoencoder, consisting of an encoder finetuning stage and a decoder optimization stage. During encoder finetuning, we finetune the CNN on a remote sensing image classification dataset jointly with the variational autoencoder branch. During decoder optimization, we apply a self-attention mechanism to the spatial features for better representation.

The experimental results show that the methods in this thesis relieve overfitting in remote sensing image captioning and enhance remote sensing image feature extraction by fully utilizing semantic information. Our model greatly surpasses the previous state-of-the-art results on seven metrics: BLEU-1 to BLEU-4, METEOR, ROUGE-N, and CIDEr.
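
The encoder finetuning stage trains the CNN on classification jointly with a variational autoencoder branch. The abstract does not give the objective, so the following is only a minimal sketch of what such a joint loss could look like, assuming a softmax cross-entropy classification term, a squared-error reconstruction term, and the closed-form KL divergence to a standard normal prior. All function names and the `vae_weight` parameter are hypothetical and are not taken from the thesis.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, computed stably via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def squared_error(x, x_hat):
    """Sum of squared differences, used here as the reconstruction term."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def joint_loss(logits, label, x, x_hat, mu, log_var, vae_weight=0.1):
    """Classification loss plus a weighted VAE loss (reconstruction + KL).

    vae_weight is an assumed hyperparameter balancing the two branches;
    the thesis may weight or define these terms differently.
    """
    vae_term = squared_error(x, x_hat) + kl_to_standard_normal(mu, log_var)
    return cross_entropy(logits, label) + vae_weight * vae_term
```

In this sketch the classification term adapts the CNN features to remote sensing scene classes, while the VAE term acts as a regularizer on the same features. A posterior that already matches the prior (`mu = 0`, `log_var = 0`) contributes zero KL penalty.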
Keywords/Search Tags:transformer, variational autoencoder, remote sensing image captioning, convolutional neural network, reinforcement learning