Research On Image Captioning Algorithm Based On Deep Learning

Posted on: 2022-02-09
Degree: Master
Type: Thesis
Country: China
Candidate: S Z Li
Full Text: PDF
GTID: 2518306563476084
Subject: Electronic Science and Technology
Abstract/Summary:
Image captioning is the automatic generation, by a computer, of a concise natural language description of a given image. It converts image information into textual information and so realizes a conversion between modalities, with broad application prospects in areas such as image retrieval, intelligent education, and visual aids for the blind. Image captioning is a multimodal learning problem: the model must accurately identify objects and their attributes and capture the relationships among them, and it must also produce text that is syntactically correct and semantically diverse. The task therefore combines knowledge from computer vision, natural language processing, machine learning and other fields, which makes it very challenging. This thesis studies deep learning-based image captioning and proposes a context-based image captioning algorithm. The main work of the thesis is as follows:

(1) To improve the feature extraction capability of the network, this thesis applies a SENET101 network model to image captioning. The model consists of an image encoder that extracts visual features and an image text generation model that decodes the visual features into sentences. The image encoder is based on the ResNet101 network combined with the SE module, which extracts deeper image features through a channel attention mechanism and lays the foundation for image captioning.

(2) To highlight the features of object categories in images, this thesis proposes a context-based image captioning algorithm. Building on the deep image features extracted by the SENET101 network, an ENCNET model with an embedded context encoding network is designed. This model feeds the image features extracted by the ResNet101 network into the Encoding layer. The output image features are then fused with the image features extracted by the SENET101 network and fed to the decoder to generate the image caption, which further improves the performance of the image captioning algorithm.

(3) A reinforcement learning algorithm is introduced. During training, the word fed to the decoder at the previous time step is the ground-truth word from the training set, whereas at test time the decoder must rely on the words it has generated itself. SCST is introduced in this thesis to resolve this inconsistency between training and testing. Twin (Siamese) networks are also introduced to take the similarity between image description sentences into account, and good results are achieved.

The performance of the algorithm is tested on the MSCOCO dataset. Comparative experiments show that the context-based image captioning algorithm outperforms the baseline model and the other image captioning algorithms it is compared with. The algorithm reaches 0.783, 0.571 and 1.176 in BLEU-1, ROUGE-L and CIDEr scores respectively, and the experimental results demonstrate the effectiveness of the algorithm proposed in this thesis.
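
The self-critical idea behind SCST in point (3) can be illustrated with a minimal sketch. It assumes the commonly used formulation in which the reward of a caption sampled from the model is compared against the reward of the model's own greedily decoded caption; the abstract does not state which reward or baseline the thesis uses, so the function name `scst_loss`, its parameters, and the numbers below are hypothetical and for illustration only.

```python
import math


def scst_loss(sample_log_probs, sample_reward, greedy_reward):
    """Self-critical policy-gradient loss for a single caption.

    sample_log_probs: log-probability of each word in the sampled caption.
    sample_reward:    score of the sampled caption (assumed: a metric such as CIDEr).
    greedy_reward:    score of the greedily decoded caption, used as the baseline.
    """
    # Advantage is positive when sampling beat the model's own test-time output.
    advantage = sample_reward - greedy_reward
    # Minimizing this loss raises the probability of captions with positive
    # advantage and lowers the probability of captions with negative advantage.
    return -advantage * sum(sample_log_probs)


# Hypothetical values, not taken from the thesis.
log_probs = [math.log(0.5), math.log(0.25)]
loss_better = scst_loss(log_probs, sample_reward=1.2, greedy_reward=1.0)
loss_worse = scst_loss(log_probs, sample_reward=0.8, greedy_reward=1.0)
loss_equal = scst_loss(log_probs, sample_reward=1.0, greedy_reward=1.0)
```

Because the baseline is produced by the same greedy decoding used at test time, the training signal is tied directly to test-time behavior, which is how this kind of objective addresses the training/testing mismatch described above.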
Keywords/Search Tags: Image Captioning, Convolutional Neural Network, Long Short-Term Memory Network, Context Description, Attention Mechanism