Research On Image Captioning Algorithm Based On Deep Learning

Posted on: 2021-01-25
Degree: Master
Type: Thesis
Country: China
Candidate: S He
GTID: 2428330611980589
Subject: Electronic science and technology
Abstract/Summary:
With the rapid growth of data scale and computing capability, deep learning, which builds on both data and hardware, has begun to show unique advantages. As a challenging field of artificial intelligence, image captioning is attracting increasing attention. As a task that combines computer vision and natural language processing, image captioning converts an image into text: the algorithm automatically generates descriptive sentences for an input image. Enabling computers to describe the visual world opens up a wide range of applications, such as information retrieval, human-computer interaction, children's education, and assistance for the visually impaired.

Traditional methods for image captioning include template-based and retrieval-based methods. However, these methods are limited when applied to new scenes, and their output correlates poorly with human descriptions. Using deep learning methods, we design an image captioning model based on an encoder-decoder structure. An expanded deep convolutional neural network (CNN) is used as the encoder to extract image features, and a long short-term memory (LSTM) network is used as the decoder to generate descriptive sentences. This thesis studies the end-to-end image captioning model. The main contributions are as follows:

1. Improving the accuracy of the deep CNN while keeping the number of hyperparameters unchanged. The traditional way to improve the accuracy of a CNN is to increase its depth. However, as the number of hyperparameters increases, so do the computational cost and the difficulty of network design. Inspired by the multi-branch mechanism of the Inception module, we design a highly modularized CNN based on ResNet. We keep the identity mapping connection with its residual learning function. Meanwhile, the network is simplified by stacking modules of the same topology. Without increasing the number of hyperparameters, the multi-branch CNN is easier to optimize and to generalize.

2. Extracting image features with a CNN with expanded branches to further optimize the image captioning model. The multi-branch CNN, which serves as the encoder for processing image features, is pre-trained. An LSTM, a special recurrent neural network with a memory unit, is used as the decoder to train an end-to-end image captioning model. We compute the scores of the proposed model on the automatic evaluation metrics BLEU, METEOR, and CIDEr and compare them with those of other image captioning algorithms. Experimental results on the large-scale datasets Flickr8k, Flickr30k, and MSCOCO show that the proposed method effectively improves the performance of image captioning algorithms.
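The first contribution rests on the idea that a block can be split into many parallel branches without growing in size. The following is a minimal arithmetic sketch of that idea, not the thesis's actual architecture: it counts convolution weights in a standard three-layer residual bottleneck and in a multi-branch variant implemented as a grouped 3x3 convolution. The channel widths (256, 64, 128), the branch count (32), and the helper names `conv_params` and `bottleneck_params` are illustrative assumptions and do not come from the abstract.

```python
def conv_params(in_ch, out_ch, kernel, groups=1):
    """Weight count of a 2D convolution (biases ignored).

    With `groups` > 1 the input and output channels are split into
    independent branches, so each filter sees only in_ch / groups channels.
    """
    assert in_ch % groups == 0 and out_ch % groups == 0
    return (in_ch // groups) * out_ch * kernel * kernel


def bottleneck_params(in_ch, width, groups=1):
    """Weights of a 1x1 -> 3x3 -> 1x1 residual bottleneck.

    The identity shortcut adds no weights, so it does not appear here.
    """
    reduce = conv_params(in_ch, width, 1)           # 1x1, shrink channels
    spatial = conv_params(width, width, 3, groups)  # 3x3, optionally multi-branch
    expand = conv_params(width, in_ch, 1)           # 1x1, restore channels
    return reduce + spatial + expand


# Assumed sizes: a single-branch block with width 64 versus
# a 32-branch block with total width 128 (4 channels per branch).
single_branch = bottleneck_params(256, 64)
multi_branch = bottleneck_params(256, 128, groups=32)

print("single-branch weights:", single_branch)
print("multi-branch weights: ", multi_branch)
```

Under these assumed sizes the two blocks have nearly the same number of weights (69,632 versus 70,144), which is consistent with the abstract's claim that branches can be added without making the network costlier to design or run.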
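Of the evaluation metrics named above, BLEU is the simplest to illustrate. The sketch below shows its two core ingredients for a single sentence, clipped (modified) n-gram precision and the brevity penalty, using only the Python standard library. It is a simplified illustration under stated assumptions, not the evaluation code used in the thesis: the full metric combines several n-gram orders over a whole corpus, and the function names and example sentences here are made up for the demonstration.

```python
import math
from collections import Counter
from fractions import Fraction


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most
    as many times as it occurs in any single reference."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return Fraction(0)
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return Fraction(clipped, sum(cand_counts.values()))


def brevity_penalty(candidate, references):
    """Penalize candidates no longer than the closest-length reference."""
    c = len(candidate)
    if c == 0:
        return 0.0
    # Reference length closest to the candidate length (ties -> shorter).
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    return 1.0 if c > r else math.exp(1 - r / c)


references = [
    "the cat is on the mat".split(),
    "there is a cat on the mat".split(),
]
degenerate = "the the the the the the the".split()
short = "the cat".split()

print(modified_precision(degenerate, references, 1))  # clipping limits credit for "the"
print(brevity_penalty(short, references))             # short caption is penalized
```

Clipping keeps a caption that repeats a common word from scoring well: the degenerate candidate gets a unigram precision of 2/7 rather than 7/7. The brevity penalty in turn keeps a very short but precise caption from scoring well.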
Keywords/Search Tags:image captioning, deep learning, multi-branch convolutional neural network, long short-term memory network, encoder-decoder framework