Research On Image Description Generation Algorithm Based On Deep Learning

Posted on:2022-11-13

Degree:Master

Type:Thesis

Country:China

Candidate:M X Liu

GTID:2518306746968789

Subject:Computer Science and Technology

Abstract/Summary:

Image captioning is the task of having a computer automatically generate, for a given input image, a concise and reasonable natural-language description that matches human reading habits. The model must identify the target objects in the image and their latent relationships, and express them in human-readable text. It is a technical challenge shared by computer vision and natural language processing. In recent years, with the rise of deep neural networks, image captioning has developed rapidly and captioning models have changed considerably: early approaches split the problem into several sub-problems that were studied separately in different fields, whereas current models are built end to end. The introduction of deep neural networks has greatly improved model performance, and image captioning now plays an increasingly important role in fields such as news media, medical assistance, and intelligent education.

This thesis focuses on the accuracy of image captioning and the completeness of the generated descriptions. It reviews the current state of research on image captioning and proposes an image captioning model based on the Transformer and a dual-decoder architecture. The main work is as follows:

(1) An image feature extraction model based on the visual Transformer is adopted. The visual Transformer is built entirely on the self-attention mechanism and has recently emerged in computer vision, where visual-Transformer-based models have achieved excellent performance on fundamental image tasks such as object detection and image segmentation. This thesis therefore uses a visual Transformer to extract the visual features of the image.

(2) A topic-to-essay model is adopted: keywords extracted from the description text serve as the model's topic-word input, and a background network built from a pre-training corpus generates supplementary topic-word information. The model thus has more dependency information available when generating the description text, so it can produce descriptions that express the feature information in the image more comprehensively.

(3) Building on (1) and (2), a dual-decoder image captioning model is proposed. A second decoder is added to the classical encoder-decoder architecture and the topic-to-essay model is embedded in the decoder structure: one decoder takes the keywords extracted above as input and also incorporates attention weights computed directly from the encoder's image features, and the outputs of the two decoders are fused to produce the final description of the input image.

The model is evaluated on the MSCOCO Caption 2014 test set. Compared with the baseline model and other image captioning algorithms, the proposed model, based on the visual Transformer and the dual-decoder architecture, achieves automatic evaluation scores (BLEU-1, BLEU-4, METEOR, CIDEr, ROUGE-L, and SPICE) comparable to those of leading models. In the human evaluation, its score differs from that of the human-annotated reference descriptions by only 0.01, indicating that the generated descriptions are of relatively high quality. In addition, because a topic-to-essay text model is embedded in the decoder structure, the model generates longer descriptions that describe the image more comprehensively. Overall, the experimental results meet the expectations of the thesis.
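
The abstract attributes the visual Transformer's strength to self-attention. As a rough illustration only, and not the thesis's implementation, the sketch below cuts a toy grayscale image into flattened patches and applies one head of scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, so that every patch attends to every other patch. The helper names (`patchify`, `self_attention`, `random_matrix`) and the random projection matrices are hypothetical; the learned patch embedding, position embeddings, multi-head split, layer normalization and MLP blocks of a real visual Transformer are deliberately omitted.

```python
import math
import random
from typing import List

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix stored as nested lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]

def softmax(row: List[float]) -> List[float]:
    peak = max(row)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def patchify(image: Matrix, p: int) -> Matrix:
    """Cut an H x W image (H, W divisible by p) into flattened p x p patches, row-major."""
    h, w = len(image), len(image[0])
    return [[image[r + i][c + j] for i in range(p) for j in range(p)]
            for r in range(0, h, p) for c in range(0, w, p)]

def self_attention(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix):
    """Single-head scaled dot-product self-attention over the rows (tokens) of x."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(w_k[0])
    scores = matmul(q, transpose(k))  # (num_patches x num_patches)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights  # each output row mixes all patches

def random_matrix(rows: int, cols: int) -> Matrix:
    """Hypothetical stand-in for learned projection weights."""
    return [[random.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

if __name__ == "__main__":
    random.seed(0)
    image = [[random.random() for _ in range(4)] for _ in range(4)]
    tokens = patchify(image, 2)  # 4 patch tokens of dimension 4
    out, attn = self_attention(tokens, random_matrix(4, 3), random_matrix(4, 3), random_matrix(4, 3))
    print(len(out), len(out[0]))  # 4 3
```

Because the attention weights are computed between all pairs of patches, even this single layer relates distant image regions directly, which is the property usually cited for using Transformers as image encoders.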
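
The abstract says the outputs of the two decoders are fused but does not specify how. One simple possibility, assumed here purely for illustration, is to linearly interpolate the two decoders' next-word probability distributions at every step and decode greedily from the mixture. The decoders below are hypothetical lookup tables standing in for an image-attention decoder and a topic (keyword) decoder, and the mixing weight `alpha` is an assumed hyperparameter, not a value from the thesis.

```python
from typing import Callable, Dict, List

# Hypothetical decoder interface: given the words generated so far,
# return a probability distribution over the next word.
Decoder = Callable[[List[str]], Dict[str, float]]

def fuse(p_a: Dict[str, float], p_b: Dict[str, float], alpha: float) -> Dict[str, float]:
    """Interpolate two next-word distributions: alpha * p_a + (1 - alpha) * p_b."""
    vocab = set(p_a) | set(p_b)
    return {w: alpha * p_a.get(w, 0.0) + (1 - alpha) * p_b.get(w, 0.0) for w in vocab}

def greedy_dual_decode(visual: Decoder, topic: Decoder, alpha: float = 0.5,
                       max_len: int = 20, eos: str = "<eos>") -> List[str]:
    """Greedily emit the most probable word of the fused distribution until <eos>."""
    words: List[str] = []
    for _ in range(max_len):
        fused = fuse(visual(words), topic(words), alpha)
        next_word = max(sorted(fused), key=fused.get)  # ties go to the alphabetically first word
        if next_word == eos:
            break
        words.append(next_word)
    return words

def table_decoder(steps: List[Dict[str, float]]) -> Decoder:
    """Toy decoder: look up the distribution by position, then always predict <eos>."""
    def decode(words: List[str]) -> Dict[str, float]:
        return steps[len(words)] if len(words) < len(steps) else {"<eos>": 1.0}
    return decode

if __name__ == "__main__":
    visual = table_decoder([{"a": 0.9, "dog": 0.1}, {"dog": 0.6, "man": 0.4},
                            {"<eos>": 0.7, "running": 0.3}])
    topic = table_decoder([{"a": 0.8, "the": 0.2}, {"dog": 0.5, "puppy": 0.5},
                           {"running": 0.9, "<eos>": 0.1}])
    print(greedy_dual_decode(visual, topic, alpha=1.0))  # ['a', 'dog']  (visual decoder alone)
    print(greedy_dual_decode(visual, topic, alpha=0.5))  # ['a', 'dog', 'running']
```

With these toy tables the visual decoder alone stops after "a dog", while the fused model continues to "a dog running", a miniature version of the abstract's claim that the topic branch yields longer descriptions. Fusing hidden states before the output layer, or learning `alpha`, would be equally plausible designs.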
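
The automatic metrics listed in the abstract include BLEU-1 and BLEU-4. For intuition only, the following is a minimal sentence-level BLEU sketch: clipped (modified) n-gram precision against the references, a geometric mean over n = 1..N with uniform weights, and a brevity penalty, with no smoothing. Scores reported on MSCOCO are normally computed at corpus level with the COCO caption evaluation toolkit, so this sketch will not reproduce the thesis's numbers.

```python
import math
from collections import Counter
from typing import List

def ngrams(tokens: List[str], n: int) -> Counter:
    """Count all contiguous n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def modified_precision(candidate: List[str], references: List[List[str]], n: int) -> float:
    """Clipped precision: a candidate n-gram counts at most as often as in any single reference."""
    cand = ngrams(candidate, n)
    if not cand:
        return 0.0
    max_ref: Counter = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return clipped / sum(cand.values())

def sentence_bleu(candidate: List[str], references: List[List[str]], max_n: int = 4) -> float:
    precisions = [modified_precision(candidate, references, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # without smoothing, any zero n-gram precision gives BLEU = 0
    c = len(candidate)
    # effective reference length: closest to the candidate length (shorter wins ties)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)  # brevity penalty
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

if __name__ == "__main__":
    ref = ["a", "dog", "runs", "on", "the", "grass"]
    print(sentence_bleu(ref, [ref]))  # 1.0
    short = ["a", "dog", "on", "the", "grass"]
    print(round(sentence_bleu(short, [ref], max_n=1), 4))  # 0.8187, i.e. exp(1 - 6/5)
```

The second call shows why the brevity penalty exists: every unigram of the short caption appears in the reference, so precision alone would be perfect.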

Keywords/Search Tags:

Image captioning, Transformer, Dual-Decoder, Topic-to-Essay Model

Related items

1	Research On Image Captioning Algorithm Based On Attention Mechanism
2	Research On Video Captioning Algorithm Based On Attention Mechanism
3	Video Captioning Research Method Based On Transformer Network And Bidirectional Decoding
4	Research On Image Captioning Algorithm Based On Encoding And Decoding
5	Research On Image Captioning Models Based On High-Level Semantics
6	Image Chinese Captioning Model Based On Deep Learning
7	Deep Learning For Image Captioning
8	Research On Semantic-Attentive Deep Image Captioning Method
9	Image Captioning Based On Deep Recurrent Convolution Network And Spatio-temporal Information Fusion
10	Image Captioning Based On Adaptive Visual Attention Mechanism