
Research On Image Caption Based On Deep Learning

Posted on: 2020-01-29  Degree: Master  Type: Thesis
Country: China  Candidate: M G Zhu  Full Text: PDF
GTID: 2428330575976059  Subject: Computer Science and Technology
Abstract/Summary:
Image captioning plays an important role in image retrieval and in helping blind people understand image content, and in particular can improve their quality of life. Deep learning is currently an effective approach to image captioning. However, existing methods waste computational resources, produce sub-optimal results, and do not make full use of image features and text features; this thesis studies image captioning with these problems in mind. To obtain more targeted captions, visual question answering is also studied. The main work is as follows:

1. The Faster R-CNN object detection algorithm is used to locate the coordinates of objects in an image and to extract object region features. The data and process of object detection with Faster R-CNN are introduced. To obtain high-quality object features, the structure of Faster R-CNN is improved and the model is extended to recognize object attributes. The improved Faster R-CNN is then used to extract object region features, which are applied to image captioning. To make full use of image features and text features, a joint LSTM (J-LSTM) structure based on an attention mechanism is proposed. Experiments show that the method improves on mainstream methods on evaluation metrics such as BLEU, METEOR, and CIDEr, confirming that the object region features and the attention-based J-LSTM structure are effective in improving the model.

2. To obtain more targeted image captions, generated in response to a user's questions, a visual question answering algorithm is studied on the basis of image object features. A visual question answering algorithm based on image object features and double attention (the D-A algorithm) is proposed. The algorithm first extracts image object features, then uses an LSTM network to process the text, and finally uses the D-A algorithm to combine image features and text features and obtain the answer by classification. Analysis shows that the D-A algorithm can accurately locate objects in the image and that the answer is highly correlated with the relevant image region. Experiments show that the method effectively improves the accuracy of visual question answering.
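
The abstract does not give the details of the D-A algorithm, so the following is only a minimal sketch of the general idea it describes: weight object region features by their relevance to a question vector, fuse the attended image feature with the text feature, and pick an answer by classification. Everything here is an assumption made for illustration, including the function names (`attend`, `classify`), the use of plain dot-product attention with a single attention step, the concatenation-based fusion, and the random vectors standing in for Faster R-CNN region features and an LSTM question encoding.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(regions, question):
    """Weight each region by its dot-product similarity to the question.

    Assumption: single-step dot-product attention, not the thesis's D-A algorithm.
    Returns the attention weights and the weighted sum of region features.
    """
    weights = softmax([dot(region, question) for region in regions])
    dim = len(question)
    attended = [sum(w * region[i] for w, region in zip(weights, regions))
                for i in range(dim)]
    return weights, attended


def classify(attended, question, class_weights):
    """Fuse image and text features by concatenation, then score each answer."""
    fused = attended + question  # list concatenation, length 2 * dim
    return softmax([dot(row, fused) for row in class_weights])


# Random stand-ins for region features, question encoding, and classifier weights.
rng = random.Random(0)
num_regions, dim, num_answers = 4, 3, 5
regions = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_regions)]
question = [rng.uniform(-1, 1) for _ in range(dim)]
class_weights = [[rng.uniform(-1, 1) for _ in range(2 * dim)]
                 for _ in range(num_answers)]

weights, attended = attend(regions, question)
answer_probs = classify(attended, question, class_weights)

print("attention weights:", [round(w, 3) for w in weights])
print("predicted answer index:", answer_probs.index(max(answer_probs)))
```

The attention weights form a distribution over regions, which is what makes it possible to inspect which image region an answer depends on, in the spirit of the analysis mentioned in the abstract.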
Keywords/Search Tags:deep learning, image caption, visual question answer, neural network