
Research on Automatic Image Captioning Algorithms Based on Deep Learning

Posted on: 2020-12-14
Degree: Master
Type: Thesis
Country: China
Candidate: J D Yang
Full Text: PDF
GTID: 2428330596476538
Subject: Engineering
Abstract/Summary:
The goal of image captioning is to generate text that describes the content of an image. The task is interdisciplinary, combining computer vision and natural language processing. It has broad applications in intelligent human-computer interaction, visual assistance, image recognition, and related fields, where it can bring great convenience to daily life.

Image captioning must convert an image into text. The first step is image feature extraction: a high-quality feature representation of the image is a prerequisite for good results. The second step is text generation: once the image features have been extracted, transforming the information they carry into grammatically correct, semantically accurate, and readable natural language is also a major challenge. Traditional image captioning research focuses on machine learning, template-based, and retrieval-based methods. However, feature extraction in these methods is complicated and requires substantial manpower and material resources, and the quality of the generated text is poor. In recent years, applying the deep-learning-based Encoder-Decoder framework to image captioning has become very popular and has achieved many good results, but the research still suffers from many problems.

This thesis studies image captioning algorithms based on the Encoder-Decoder framework. Its main work and contributions are as follows:

1. A convolutional neural network is constructed for the encoding process to extract image features. Given the strong performance of convolutional neural networks in image feature extraction, this thesis constructs a 5-layer convolutional neural network without fully connected layers, which encodes the image features at each position in the image.

2. An attention mechanism is proposed to select image features from different locations at different time steps. Placed after the convolutional neural network, the mechanism selects the features at the positions that have the greatest influence on the word being generated at each time step, and feeds the selected features into the decoding process.

3. The standard LSTM model is improved for text generation in the decoding process. An LSTM is still used in the text generation stage, but a standard LSTM unit receives only the most recently generated word as input, so the preceding context is captured incompletely. The improved LSTM model feeds the information of all previously generated words into the LSTM unit at time t, so that each unit has more complete context information.

The proposed models and methods were integrated into the Encoder-Decoder framework to form a complete image captioning system, and experiments were conducted on the MSCOCO dataset. Compared with other algorithms and models, the proposed methods show some improvement in semantic accuracy, grammatical correctness, and efficiency.
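
The attention step described in contribution 2 can be illustrated with a minimal sketch. This is not the thesis implementation: the abstract does not state how locations are scored, so the dot-product scoring below is an assumption, and the names `attend`, `features`, and `hidden` are hypothetical. The sketch only shows the general idea of weighting per-location CNN features by their relevance to the current decoder state and passing the weighted sum on to the decoder.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, hidden):
    """Soft attention over per-location image features.

    features: list of L feature vectors, one per spatial location of the
              final convolutional feature map (each of length D).
    hidden:   decoder state at the current time step (length D).
    Returns (weights, context): weights sum to 1 over the locations, and
    context is the attention-weighted sum of the feature vectors.
    """
    # Assumption: dot-product scoring; the abstract does not specify one.
    scores = [sum(f_d * h_d for f_d, h_d in zip(f, hidden)) for f in features]
    weights = softmax(scores)
    dim = len(features[0])
    context = [
        sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)
    ]
    return weights, context


# Toy example: a 2x2 feature map flattened to 4 locations, 3 channels each.
features = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
hidden = [2.0, 2.0, 0.0]  # hypothetical decoder state at time t

weights, context = attend(features, hidden)
best = max(range(len(weights)), key=weights.__getitem__)
print("attention weights:", [round(w, 3) for w in weights])
print("most attended location:", best)
```

In a full system the context vector would be recomputed at every decoding step, so the decoder can focus on different image regions for different words.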
Keywords/Search Tags:Deep Learning, Image Caption, Attention Mechanism, Image Feature Extraction, Natural Language Processing