
Research On Image Description Method Based On Multimodal Recurrent Neural Networks

Posted on: 2021-02-22
Degree: Master
Type: Thesis
Country: China
Candidate: K Z Li
Full Text: PDF
GTID: 2428330605460925
Subject: Computer application technology
Abstract/Summary:
With the continuous development of artificial intelligence and its applications, neural network algorithms need only large amounts of data and high-performance hardware to let computers simulate human behavior and apply it to many aspects of life, enabling people to work more efficiently, obtain considerable economic benefits, and promote social progress. Image captioning combines two of the most popular research fields, natural language processing and computer vision: its goal is to have the computer analyze the visual information of an input image and output coherent, fluent natural-language sentences describing the image content. Research on image description generation has developed rapidly and many different methods have been derived. However, existing image captioning methods still share some problems, such as generated sentences lacking long-term memory and correlating poorly with the image. Starting from the multimodal recurrent neural network (m-RNN), this dissertation analyzes the structure of m-RNN, draws on current research fronts in image processing and natural language processing, identifies the causes of m-RNN's poor descriptions of some images from the two aspects of image feature extraction and sequence data processing, and carries out the following work.

(1) A Convolutional Neural Network (CNN) is used to extract image features. After studying the design of the VGG-16 network, the Convolutional Block Attention Module (CBAM) is applied to optimize VGG-16 when constructing the image feature extraction part of the captioning model. When the original feature map enters the module, CBAM's two sub-modules, channel attention and spatial attention, reweight the feature map and suppress meaningless features, so that the retained features focus more on the target object in the image, thereby improving the accuracy of the image captioning model.

(2) A Gated Recurrent Unit (GRU) is used to optimize the sequence data processing part of the model. The update gate and reset gate in the GRU control how sequence information is selected and retained, which alleviates, to some extent, the insufficient long-term memory of generated text caused by gradient vanishing in a plain RNN. The image features are then combined with the sequence features to generate the image description sentences.

(3) Experiments are conducted on the MSCOCO dataset, comparing the proposed method with other methods using perplexity, BLEU, METEOR, CIDEr, and subjective evaluation. To verify the effect of the CBAM attention module, Grad-CAM visualization is used to compare the results of VGG-16 and VGG-16+CBAM. The experiments verify the effectiveness of the proposed method and show that it improves performance on image description generation.
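The CBAM refinement described in (1) can be sketched as follows. This is a minimal PyTorch rendering of the general channel-then-spatial attention scheme, not the thesis's exact configuration; the reduction ratio and kernel size are illustrative defaults.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Reweights feature channels using pooled descriptors and a shared MLP."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        # Average- and max-pool over spatial dims, share one MLP, sum, squash.
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        scale = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * scale

class SpatialAttention(nn.Module):
    """Produces a 2-D mask highlighting informative spatial locations."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        # Concatenate channel-wise mean and max maps, convolve to one mask.
        avg = x.mean(dim=1, keepdim=True)
        mx = x.amax(dim=1, keepdim=True)
        scale = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * scale

class CBAM(nn.Module):
    """Channel attention followed by spatial attention, as in CBAM."""
    def __init__(self, channels):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x):
        return self.sa(self.ca(x))
```

In practice such a block would be inserted after a VGG-16 convolutional stage; because the output shape matches the input shape, it drops in without changing the rest of the network.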
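The decoder in (2) can likewise be sketched with a GRU replacing the plain RNN of m-RNN. The summation-based multimodal fusion layer, the feature dimensions, and all hyperparameters below are assumptions for illustration, not the thesis's reported architecture.

```python
import torch
import torch.nn as nn

class GRUCaptioner(nn.Module):
    """m-RNN-style captioner sketch: GRU over words, fused with a CNN feature."""
    def __init__(self, vocab_size, img_dim=4096, embed_dim=512, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.img_proj = nn.Linear(img_dim, hidden_dim)    # project CNN feature
        self.multimodal = nn.Linear(hidden_dim, hidden_dim)
        self.classifier = nn.Linear(hidden_dim, vocab_size)

    def forward(self, img_feat, captions):
        # img_feat: (B, img_dim) pooled CNN feature; captions: (B, T) word ids.
        h, _ = self.gru(self.embed(captions))             # (B, T, hidden)
        img = self.img_proj(img_feat).unsqueeze(1)        # (B, 1, hidden)
        fused = torch.tanh(self.multimodal(h + img))      # multimodal fusion
        return self.classifier(fused)                     # (B, T, vocab) logits
```

Training would minimize cross-entropy between these logits and the next word at each step; the GRU's gates decide how much of the running sentence state to keep versus overwrite, which is the long-term-memory benefit the abstract refers to.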
Keywords/Search Tags:Image Captioning, Convolutional Neural Network, Recurrent Neural Network, Gated Recurrent Unit, Attention Mechanism