| At present, research on remote sensing images focuses on tasks such as classification, detection, and segmentation, and has made good progress. The purpose of these studies is to mine and understand the information in remote sensing images automatically. Language is the most common means of human communication, and it can express rich information in concise words. How to translate a remote sensing image into language is therefore worth exploring. The image captioning task involves generating a sentence that appropriately describes an image. It is not a single classification or detection problem; it is more complex, requiring the model to recognize multiple objects in the image and to understand the high-level relationships between them. Recognizing these high-level relationships can deepen a computer’s understanding of remote sensing images, which has important research significance. However, because remote sensing images cover a wide range of scenes and contain numerous targets, generating a suitable description is difficult. The focus of this research is how to build a fast and stable remote sensing image description model with good performance. The specific research work of this paper is as follows: (1) To further promote the practical value of remote sensing image description, we have done extensive repair work on existing remote sensing image description datasets. We have corrected a series of issues, including word errors, grammatical errors, and inappropriate descriptions. Over 20% of the datasets and nearly 3000 images have been corrected. (2) Considering performance: to improve the accuracy of remote sensing image description, we propose a multi-layer attention structure. This structure is more in line with the human attention process when describing an image. It mainly includes three attention structures: attention to different regions of the image, attention to the generated words, and attention to visual and semantic
information. This model can attend to useful information more intelligently and make greater use of image and semantic information. Experiments show that this model achieves better performance in remote sensing image description. (3) Considering computation speed: we propose a faster remote sensing image description model to address the excessive parameter count and high time complexity of existing models. This fast model uses effective yet faster convolutional and recurrent networks as the encoder and decoder, together with a more streamlined attention structure. At the same time, to preserve model quality and accelerate training, word vectors are learned in advance using skip-gram with negative sampling. Experiments show that the model greatly reduces model size and improves running speed with only a small loss of accuracy. (4) Considering security and stability: because the application scenarios of remote sensing images place strict requirements on security and stability, this paper first studies methods for generating adversarial samples on the remote sensing image description task. We propose a black-box attack method for the encoder-decoder structure that does not require knowledge of the specific encoding and decoding networks and can be used to assess the robustness and security of existing models. In addition, this paper carries out some defensive work against adversarial samples, providing a basis for further improving the stability and robustness of remote sensing image description. |