Research On Image Captioning Algorithm Based On Deep Learning

Posted on:2021-01-25

Degree:Master

Type:Thesis

Country:China

Candidate:S He

GTID:2428330611980589

Subject:Electronic science and technology

Abstract/Summary:

With the rapid growth of data scale and computing power, deep learning, which relies on large datasets and capable hardware, has begun to show distinct advantages. Image captioning, a challenging field of artificial intelligence, is attracting increasing attention. As a task that combines computer vision and natural language processing, image captioning converts images into text: given an input image, the algorithm automatically generates a sentence that describes it. Enabling computers to describe the visual world has a wide range of applications, such as information retrieval, human-computer interaction, children's education, and assistance for the visually impaired. Traditional image captioning methods are either template-based or retrieval-based. However, these methods are difficult to apply to new scenes, and their output correlates poorly with human descriptions. Using deep learning, we design an image captioning model based on the encoder-decoder structure: a deep convolutional neural network with expanded branches serves as the encoder that extracts image features, and a long short-term memory (LSTM) network serves as the decoder that generates descriptive sentences. This thesis studies this end-to-end image captioning model. The main contributions are as follows:

1. Improving the accuracy of a deep convolutional neural network while keeping the number of hyperparameters unchanged. The traditional way to improve the accuracy of a convolutional neural network is to increase its depth. However, as the number of hyperparameters grows, the computational cost and the difficulty of network design also increase. Inspired by the multi-branch mechanism of the Inception module, we design a highly modularized CNN based on ResNet. The identity-mapping connections of residual learning are retained, while the complexity of the network design is reduced by stacking modules that share the same topology. Without increasing the number of hyperparameters, the resulting multi-branch convolutional neural network is easier to optimize and generalizes better.

2. Extracting image features with the branch-expanded convolutional neural network to further optimize the image captioning model. The multi-branch convolutional neural network is pre-trained and used as the encoder for image features. An LSTM, a special recurrent neural network with a memory cell, is used as the decoder, and the image captioning model is trained end to end. The proposed model is scored on the automatic evaluation metrics BLEU, METEOR, and CIDEr and compared with other image captioning algorithms. Experimental results on the large-scale datasets Flickr8k, Flickr30k, and MSCOCO show that the proposed method effectively improves the performance of image captioning.
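Point 1 claims that widening a residual block into many same-topology branches need not enlarge the model. The abstract phrases this in terms of hyperparameters. The sketch below illustrates the closely related point that the weight count stays roughly constant. It counts convolution weights for a single-branch ResNet bottleneck block and for a block of identical low-width branches whose outputs are summed before the identity shortcut is added. The layer sizes are borrowed from the published ResNeXt template and are an assumption: 256 input/output channels, a 64-channel bottleneck, and 32 branches of width 4. The abstract does not state the thesis's actual configuration. Biases and batch-normalization parameters are ignored, and the function names are hypothetical.

```python
def bottleneck_weights(channels: int, width: int) -> int:
    """Weights of one 1x1 -> 3x3 -> 1x1 bottleneck path (biases and BN ignored)."""
    reduce_1x1 = channels * width        # 1x1 conv: channels -> width
    spatial_3x3 = 3 * 3 * width * width  # 3x3 conv: width -> width
    expand_1x1 = width * channels        # 1x1 conv: width -> channels
    return reduce_1x1 + spatial_3x3 + expand_1x1


def multi_branch_weights(channels: int, cardinality: int, branch_width: int) -> int:
    """Weights of `cardinality` identical bottleneck branches whose outputs are summed.

    The identity shortcut (y = x + sum of branch outputs) adds no weights.
    """
    # Every branch has the same topology, so the total is a simple multiple.
    return cardinality * bottleneck_weights(channels, branch_width)


if __name__ == "__main__":
    single = bottleneck_weights(channels=256, width=64)
    multi = multi_branch_weights(channels=256, cardinality=32, branch_width=4)
    print(f"single-branch bottleneck: {single} weights")  # 69632
    print(f"32-branch block:          {multi} weights")   # 70144
```

Under these assumed sizes, the 32-branch block uses about 70k weights, almost the same as the single-branch bottleneck. Because every branch shares one topology, the only new design choice is the number of branches (the cardinality).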
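The decoder in point 2 is an LSTM, whose memory cell lets it carry information across the words of a caption. The following is a minimal sketch of one LSTM time step using the standard input, forget, and output gates and a candidate cell. It uses plain Python lists so that it stays self-contained. The tiny sizes, the random initialization, and all helper names are hypothetical. In a captioning model, the input at each step would typically be a word embedding, with the CNN image feature fed in at the first step. That wiring is an assumption; the abstract does not specify how the encoder output enters the decoder.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(W: Matrix, v: Vector, b: Vector) -> Vector:
    """Compute W @ v + b with plain lists."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) + b_i for row, b_i in zip(W, b)]


def lstm_step(x: Vector, h: Vector, c: Vector, params: Dict[str, list]) -> Tuple[Vector, Vector]:
    """One LSTM time step; returns the new hidden state h and memory cell c."""
    xh = x + h  # concatenate current input and previous hidden state
    i = [sigmoid(z) for z in affine(params["W_i"], xh, params["b_i"])]    # input gate
    f = [sigmoid(z) for z in affine(params["W_f"], xh, params["b_f"])]    # forget gate
    o = [sigmoid(z) for z in affine(params["W_o"], xh, params["b_o"])]    # output gate
    g = [math.tanh(z) for z in affine(params["W_g"], xh, params["b_g"])]  # candidate cell
    # The memory cell keeps part of its old content and adds gated new content.
    c_new = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c, i, g)]
    h_new = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c_new)]
    return h_new, c_new


def init_params(input_size: int, hidden_size: int, seed: int = 0) -> Dict[str, list]:
    """Small random weights and zero biases, for illustration only."""
    rng = random.Random(seed)
    params: Dict[str, list] = {}
    for gate in ("i", "f", "o", "g"):
        params[f"W_{gate}"] = [[rng.uniform(-0.1, 0.1) for _ in range(input_size + hidden_size)]
                               for _ in range(hidden_size)]
        params[f"b_{gate}"] = [0.0] * hidden_size
    return params


if __name__ == "__main__":
    params = init_params(input_size=4, hidden_size=3)
    h, c = [0.0] * 3, [0.0] * 3
    # Feed a short sequence of made-up input vectors (stand-ins for word embeddings).
    for x in ([1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]):
        h, c = lstm_step(x, h, c, params)
    print(h, c)
```

In a full captioning decoder, each hidden state would then be projected onto the vocabulary to predict the next word. That step is omitted here.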
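The model is scored on BLEU, METEOR, and CIDEr. As an illustration of the first metric, the following is a minimal sentence-level BLEU sketch for one candidate caption and several reference captions. It computes clipped n-gram precisions up to 4-grams, takes their geometric mean, and applies the brevity penalty computed against the reference length closest to the candidate. Reported caption scores are usually computed at corpus level with a standard evaluation toolkit, and tokenization and smoothing choices change the numbers. This sketch uses whitespace tokenization and no smoothing. It is illustrative only and is not the thesis's evaluation code.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: str, references: List[str], max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU with whitespace tokenization."""
    cand = candidate.split()
    refs = [r.split() for r in references]
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        # Clip each candidate n-gram count by its maximum count in any single reference.
        max_ref_counts: Counter = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if clipped == 0:
            return 0.0  # no smoothing: any zero precision makes BLEU zero
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty against the reference length closest to the candidate length.
    c = len(cand)
    r = min((abs(len(ref) - c), len(ref)) for ref in refs)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    refs = ["a dog runs on the grass", "a brown dog is running on the grass"]
    print(sentence_bleu("a dog runs on the grass", refs))  # 1.0
```

A candidate identical to one reference scores 1.0. A candidate that is too short is penalized by the brevity term even when every n-gram matches.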

Keywords/Search Tags:

image captioning, deep learning, multi-branch convolutional neural network, long short-term memory network, encoder-decoder framework

Related items

1	Visual Data Understanding Based On Deep Encoder-Decoder Framework
2	Research On Image Captioning Algorithm Based On Deep Learning
3	The Cross-site Script Detection Based On Deep Learning
4	Taxi Passenger Demands Prediction Based On Deep Learning Approach
5	Image Captioning Based On Deep Recurrent Convolution Network And Spatio-temporal Information Fusion
6	Research And Application Of Web Malicious Generation Detection Technology Based On Deep Learning
7	Image Caption Model Based On Feature Extraction Via Dense Convolutional Neural Network
8	Collaborating General And Specific Semantics For Multi-feature Based Image Captioning
9	Image Captioning Based On Attention Long Short-Term Memory Network
10	Research On Network Intrusion Detection Method Based On Bi-LSTM