Research On Image Caption Based On Deep Learning

Posted on:2020-01-29

Degree:Master

Type:Thesis

Country:China

Candidate:M G Zhu

GTID:2428330575976059

Subject:Computer Science and Technology

Abstract/Summary:

Image captioning plays an important role in image retrieval and in helping blind people understand image content, and it can notably improve their quality of life. Deep learning is currently an effective approach to image captioning. However, existing methods waste computational resources, produce suboptimal results, and do not make full use of image and text features. This thesis studies image captioning with these problems in mind. To obtain more targeted captions, visual question answering is also studied. The main work is as follows:

1. The Faster R-CNN object detection algorithm is used to locate object coordinates in an image and extract object region features. The data and the process of object detection with Faster R-CNN are introduced. To obtain high-quality object features, the structure of Faster R-CNN is improved by adding the ability to recognize object attributes. The object region features obtained with the improved Faster R-CNN are then applied to image captioning. To make full use of image and text features, a joint LSTM (J-LSTM) structure based on an attention mechanism is proposed. Experiments show that the proposed method improves on mainstream methods in evaluation metrics such as BLEU, METEOR, and CIDEr, confirming that the object region features and the attention-based J-LSTM structure are effective in improving the model.

2. To obtain more targeted captions that are generated according to a user's questions, a visual question answering algorithm is studied on the basis of image object features. A visual question answering algorithm based on image object features and double attention (the D-A algorithm) is proposed. The algorithm first extracts image object features, then uses an LSTM network to process the text, and finally uses the D-A algorithm to combine image features and text features and obtain the answer by classification. Analysis shows that the D-A algorithm can accurately locate image objects and that the answers are highly correlated with the relevant image regions. Experiments show that the method effectively improves the accuracy of visual question answering.
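
The abstract does not give the equations behind the attention mechanism, so the following is only a minimal sketch of the general idea of attending over object region features. It is not the thesis's J-LSTM or D-A algorithm. The dot-product scoring, the function names `softmax` and `attend`, and the toy vectors are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_features, query):
    """Weight each region feature by its similarity to a query vector.

    region_features: one float vector per detected object region
                     (hypothetical stand-in for Faster R-CNN region features).
    query: float vector of the same length, e.g. an encoded question
           or a decoder hidden state (assumption, not from the abstract).
    Returns (attention weights, attended feature vector).
    """
    # Dot-product score between each region and the query (assumed scoring rule).
    scores = [sum(f * q for f, q in zip(region, query))
              for region in region_features]
    weights = softmax(scores)
    dim = len(query)
    # Weighted sum of region features, one output value per feature dimension.
    attended = [sum(w * region[i] for w, region in zip(weights, region_features))
                for i in range(dim)]
    return weights, attended


# Toy values chosen only for illustration.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
weights, attended = attend(regions, query)
```

In this toy case the first and third regions align with the query and receive equal, larger weights than the second. In a captioning or question answering model, the attended vector would then be passed to the text decoder or answer classifier.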

Keywords/Search Tags:

deep learning, image caption, visual question answer, neural network

Related items

1	Research On Visual Question Answering Method Based On Answer Mask
2	Research And Implementation Of Visual Question Answering System Based On Deep Learning
3	Research On Visual Description Technology Based On Deep Learning
4	Research On Visual Question And Answer Method Based On Supervised Learning
5	Research On Situational Reasoning Question Answer Method Based On Deep Learning
6	Image Caption Method Based On Deep Learning
7	Research On Visual Question Answering Algorithm Based On Image Description And Multi-level Attention Mechanism
8	Research On Image Caption Generation Method Based On Deep Learning
9	Research And Implementation Of Key Technologies Of Image Caption Based On Deep Learning
10	Design Of Intelligent Question And Answer System Based On Deep Learning