
Research On Image Description Generation Method Based On Deep Learning

Posted on: 2020-05-02
Degree: Master
Type: Thesis
Country: China
Candidate: X L Liu
GTID: 2428330596474945
Subject: Computer technology
Abstract/Summary:
With the popularity of digital devices such as smartphones and tablets and the development of storage technologies, a great many images are produced in daily life. Understanding these images is easy for people, but to a machine an image is only a collection of pixel values that carry no meaning by themselves, so enabling machines to understand images has become increasingly valuable. In recent years, with the development of object detection and machine translation, the image description generation task has made great progress. In general, image description generation can be regarded as “looking at a picture and talking about it”: the input of the model is an image, and the output is a natural-language sentence that humans can understand. The main contents of this paper are as follows.

First, this paper applies the attention mechanism, which has achieved great success in recent years. The core idea of the attention mechanism is to imitate human visual attention. Applied to image description, it lets the model automatically focus on the more critical image regions when generating each word, which improves the accuracy of the generated sentences.

Then, because an image description model based on local features uses only the local features of the image, some information may be lost. To address this problem, this paper proposes an image description model that combines global features with a local-feature attention mechanism. At the encoder end, the model uses a pre-trained convolutional neural network to extract both the global and the local features of the image, and fuses the features from these two scales to form the encoding of the image. An LSTM-based language model then decodes the extracted image features into natural language. Because the model makes full use of image features at two different scales, the generated sentences are more accurate.

Finally, the model combining global and local features forces the global features to take part in the generation of every word, which is not always reasonable. To solve this problem, this paper proposes an adaptive attention mechanism and applies it to an image description model that also uses an encoder-decoder architecture. The adaptive attention mechanism allows the model, while generating the description, to automatically choose between the local features of the image regions of interest and the global features of the image.

The models are trained and evaluated on the Microsoft COCO dataset, and the local-feature model, the model combining global and local features, and the adaptive attention model are compared on the BLEU, ROUGE-L and CIDEr evaluation metrics. The results show that the adaptive attention model obtains the highest scores on all of these metrics, the model combining global and local features ranks second, and the model based only on local features scores lowest.
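As a rough illustration of the attention step summarized above (not the thesis's actual implementation), the sketch below scores each local region feature against the decoder's hidden state, normalizes the scores with a softmax, and returns the weighted sum of the regions as a context vector. The dot-product scoring function, the function names and the toy two-dimensional features are assumptions for illustration; the abstract does not say how its attention scores are computed.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def soft_attention(regions: List[Vector], hidden: Vector) -> Tuple[List[float], Vector]:
    """Attend over local region features given the decoder hidden state.

    Assumption: relevance is a plain dot product between each region feature
    and the hidden state (same dimensionality). Returns one weight per region
    (the weights sum to 1) and the context vector, i.e. the weighted sum of
    the region features.
    """
    weights = softmax([dot(r, hidden) for r in regions])
    dim = len(regions[0])
    context = [sum(w * r[d] for w, r in zip(weights, regions)) for d in range(dim)]
    return weights, context


# Toy example: three 2-D "region features" and a hidden state pointing along
# the first axis, so the first region should receive the most weight.
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, context = soft_attention(regions, [2.0, 0.0])
print([round(w, 3) for w in weights])  # prints [0.665, 0.09, 0.245]
```

In a real captioning model the scores usually come from a small learned network rather than a raw dot product, and the context vector is recomputed at every decoding step, so the focus can move from region to region as the sentence is generated.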
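The model that combines global and local features can be pictured, under stated assumptions, as one LSTM decoding step whose input concatenates the previous word's embedding, the global image feature and the attended local context. Concatenation is only one plausible way to fuse the two scales; the abstract does not say which fusion operator the thesis uses, and the names `fuse_inputs` and `lstm_step`, the gate ordering and the weight layout below are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def fuse_inputs(word_emb: Vector, global_feat: Vector, local_ctx: Vector) -> Vector:
    """Hypothetical fusion: concatenate the word embedding, the global image
    feature and the attended local context into one LSTM input vector."""
    return word_emb + global_feat + local_ctx


def lstm_step(x: Vector, h: Vector, c: Vector,
              W: Matrix, U: Matrix, b: Vector) -> Tuple[Vector, Vector]:
    """One standard LSTM step.

    W has shape (4H, len(x)), U has shape (4H, H) and b has length 4H; the rows
    are assumed to be ordered as input, forget, output and candidate gates.
    """
    H = len(h)
    z = [wx + uh + bias for wx, uh, bias in zip(matvec(W, x), matvec(U, h), b)]
    i = [sigmoid(v) for v in z[0:H]]            # input gate
    f = [sigmoid(v) for v in z[H:2 * H]]        # forget gate
    o = [sigmoid(v) for v in z[2 * H:3 * H]]    # output gate
    g = [math.tanh(v) for v in z[3 * H:4 * H]]  # candidate cell values
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


# Toy sizes: 2-D word embedding, 2-D global feature, 2-D local context, H = 2.
x = fuse_inputs([0.1, 0.2], [0.3, 0.4], [0.5, 0.6])
H = 2
W = [[0.0] * len(x) for _ in range(4 * H)]  # zero weights, only to show shapes
U = [[0.0] * H for _ in range(4 * H)]
b = [0.0] * (4 * H)
h, c = lstm_step(x, [0.0, 0.0], [1.0, -1.0], W, U, b)
```

With all-zero weights every gate equals 0.5 and the candidate values are 0, so the step simply halves the previous cell state; trained weights would instead let the fused image features drive the gates. A real decoder would then map `h` to a probability distribution over the vocabulary to pick the next word.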
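One way to read the adaptive attention mechanism described above is as attention over the k local region features plus the global feature as an extra candidate, so that the weight landing on the global feature acts as a gate β between “look at particular regions” and “use the whole image”. This resembles the visual-sentinel gate of Lu et al.'s adaptive attention for captioning, but using the global feature in the sentinel's place, the dot-product scoring and the function name `adaptive_attention` are assumptions for illustration rather than the thesis's stated design.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def adaptive_attention(regions: List[Vector], global_feat: Vector,
                       hidden: Vector) -> Tuple[float, Vector]:
    """Attend jointly over k local features and one global feature.

    Assumption: the global feature is appended as a (k+1)-th candidate and all
    candidates are scored by a dot product with the hidden state. The weight on
    that candidate, beta, is the gate: beta near 1 means "rely on the global
    feature", beta near 0 means "rely on the attended local regions".
    Returns (beta, fused context vector).
    """
    candidates = regions + [global_feat]
    alphas = softmax([dot(v, hidden) for v in candidates])
    beta = alphas[-1]
    dim = len(global_feat)
    # Equal to beta * global_feat + (1 - beta) * local_context, where the
    # local context uses the region weights renormalized to sum to 1.
    fused = [sum(a * v[d] for a, v in zip(alphas, candidates)) for d in range(dim)]
    return beta, fused


# Hidden state aligned with the global feature, so beta should exceed 0.5.
beta, fused = adaptive_attention([[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0], [1.0, 1.0])
```

Because β is recomputed at every step, the decoder is no longer forced to use the global feature for every word: for a word tied to one object it can push β toward 0 and rely on the attended regions instead.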
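For readers unfamiliar with the BLEU metric used in the comparison above, the sketch below computes sentence-level BLEU from clipped (modified) n-gram precisions and a brevity penalty. It is a simplified, unsmoothed version for illustration only: scores reported on COCO are usually computed at corpus level with the COCO caption evaluation code, and the abstract does not state which BLEU-n or settings the thesis uses. ROUGE-L and CIDEr are not shown.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate: List[str], references: List[List[str]], n: int) -> float:
    """Clipped n-gram precision: each candidate n-gram is counted at most as many
    times as it appears in any single reference."""
    cand = ngrams(candidate, n)
    if not cand:
        return 0.0
    max_ref: Counter = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return clipped / sum(cand.values())


def sentence_bleu(candidate: List[str], references: List[List[str]], max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU with uniform weights over n = 1..max_n."""
    precisions = [modified_precision(candidate, references, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # any zero precision makes the geometric mean zero
    c = len(candidate)
    # Effective reference length: the reference length closest to the candidate's.
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    bp = 1.0 if c > r else math.exp(1.0 - r / c)  # brevity penalty
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

Clipping is what stops a degenerate caption such as “the the the the the the the” from scoring well: against the reference “the cat is on the mat” its unigram precision is 2/7, not 7/7, since “the” occurs only twice in the reference.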
Keywords/Search Tags: deep learning, image description generation, attention mechanism, image feature