Remote Sensing Image Captioning Based On Deep Neural Network

Posted on:2021-05-23

Degree:Master

Type:Thesis

Country:China

Candidate:X Q Shen

Full Text:PDF

GTID:2392330626458730

Subject:Computer technology

Abstract/Summary:


Image captioning, i.e., generating natural-language descriptions of given images, is an essential task for machine understanding of image content, and remote sensing image captioning is one branch of this field. Most current image captioning models consist of an encoder and a decoder. In the encoder, a convolutional neural network (CNN) extracts image features; in the decoder, a recurrent neural network or one of its variants generates the target description sequence. Most current remote sensing image captioning models suffer from overfitting and fail to fully utilize the semantic information in images. Building on the encoder-decoder architecture, this thesis employs the Transformer as a new decoder and designs a two-step optimization algorithm based on a variational autoencoder for remote sensing image captioning models. The contributions are as follows.

First, to address the overfitting caused by the small size of remote sensing image datasets, this thesis makes several modifications to the Transformer, including an additional dropout layer, residual connections, and adaptive feature fusion. Both low-level spatial features and high-level semantic features are passed to the decoder. Reinforcement learning is then applied to further improve performance.

Second, because of the gap between natural images and remote sensing images, a CNN pre-trained on the ImageNet dataset may not perform well when used directly to extract remote sensing image features. This thesis proposes a two-step optimization algorithm based on a variational autoencoder, consisting of an encoder fine-tuning stage and a decoder optimization stage. During encoder fine-tuning, the CNN is fine-tuned on a remote sensing image classification dataset jointly with a variational autoencoder branch. During decoder optimization, a self-attention mechanism is applied to the spatial features for better representation.

The experimental results show that the methods in this thesis relieve overfitting in remote sensing image captioning and enhance remote sensing image feature extraction by fully utilizing semantic information. The model greatly surpasses the previous state-of-the-art results on seven metrics: BLEU-1 to BLEU-4, METEOR, ROUGE-N, and CIDEr.
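
The abstract does not give the details of the self-attention step applied to the spatial features, so the following is only a minimal sketch of one common form, scaled dot-product self-attention, assuming the CNN feature map has been flattened into a list of per-location vectors. The function names, the toy feature grid, and the absence of learned query/key/value projections are all assumptions made for illustration, not the thesis's actual implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Scaled dot-product self-attention without learned projections.

    features: list of N spatial feature vectors, each of dimension d
              (hypothetical flattened CNN feature map).
    Returns (outputs, weights): outputs has the same shape as features,
    weights is an N x N list of attention rows that each sum to 1.
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    outputs, all_weights = [], []
    for query in features:
        # Similarity of this location to every location, scaled by sqrt(d).
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in features
        ]
        weights = softmax(scores)
        # Each output is a weighted average of all spatial feature vectors.
        attended = [
            sum(w * value[j] for w, value in zip(weights, features))
            for j in range(dim)
        ]
        outputs.append(attended)
        all_weights.append(weights)
    return outputs, all_weights


# Toy example: a 2x2 feature map flattened to 4 locations with 3 channels.
grid = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
attended, weights = self_attention(grid)
```

In this sketch every output vector is a convex combination of the input vectors, so each location's representation mixes in information from all other locations before being passed on to a decoder.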

Keywords/Search Tags:

transformer, variational autoencoder, remote sensing image captioning, convolutional neural network, reinforcement learning


Related items

1	Remote Sensing Image Captioning Based On Deep Learning
2	Research On Convolutional Neural Network Image Compression Algorithm For UAV Remote Sensing Images
3	Extraction Method Of GF-2 Remote Sensing Image Based On Convolutional Neural Network
4	The Application Of Compression And Acceleration Of Convolutional Neural Networks Methods In Remote Sensing Image Classification
5	Research On Scene Classification Of Remote Sensing Images Based On Convolutional Neural Network
6	Research On Remote Sensing Image Classification Based On Deep Convolutional Neural Network
7	Research On High Resolution Remote Sensing Image Classification Method Based On Convolutional Neural Network
8	Remote Sensing Image Classification Algorithm Based On Convolutional Neural Network
9	Remote Sensing Image Fusion Based On Multi-morphological Convolutional Neural Network
10	Pan-sharpening Methods For Remote Sensing Image Based On Convolutional Neural Network