
Research On Image Semantic Fine-grained Caption Method Based On Deep Learning

Posted on: 2022-03-07 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Tang | Full Text: PDF
GTID: 2518306575966029 | Subject: Computer technology
Abstract/Summary:
Image semantic captioning is the task of using a computer to analyze and understand the content and meaning of a given image and to express them in text. The task has broad application potential, for example in intelligent monitoring, intelligent image annotation, and human-computer interaction, and it has become a research focus at the intersection of computer vision and natural language processing. However, current research still has shortcomings, such as inadequate use of image feature information and coarse descriptions of image content. This thesis studies a multi-feature encoding method and a decoding method based on stacked LSTM (Long Short-Term Memory) layers. The main research contents are as follows.

First, to address the problems that current image semantic captioning methods underutilize image feature information and have difficulty anchoring the core region of an image, a multi-feature image semantic caption generation method is proposed. VGG16 extracts image features at different scales and Faster R-CNN extracts features of the proposed description regions; the fused multi-scale features are connected to an attention mechanism to construct the multi-feature representation, which is then fed into a single LSTM to generate the caption. Experimental results show that the model built with this method achieves higher metric scores than the baseline model and captures image content more accurately.

Second, to address the difficulty of generating fine-grained captions for complex images, a fine-grained caption generation method based on a double-layer LSTM is proposed. Building on the multi-feature encoding method, a double-layer LSTM serves as the decoder to generate fine-grained captions. Comparative experiments verify that the double-layer LSTM decodes better than the single-layer LSTM, and the model built with this method scores substantially higher on the metrics than the baseline model.

Third, based on the above research, a prototype system that automatically generates image semantic captions is designed and developed with a "front-end + back-end" architecture. Users can upload images, browse images, and view image captions through a browser. The back-end is developed on the Flask framework to handle front-end requests and run model inference. The prototype system is simple and easy to use, and caption generation takes about 2.6 seconds.
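The encode, attend, and decode pipeline described in the abstract can be illustrated with a minimal pure-Python sketch of one decoding step. It is not the thesis's implementation, and everything in it is an assumption made for illustration: random vectors stand in for the VGG16 multi-scale features and the Faster R-CNN region features, fusion is plain concatenation of the two feature lists, attention is dot-product soft attention queried by the top-layer hidden state, and the helper names (`attend`, `lstm_step`, `init_lstm`, `decode_step`) are hypothetical. The abstract does not specify any of these details.

```python
import math
import random

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def attend(features, query):
    """Dot-product soft attention (assumed form): returns context and weights."""
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    context = [sum(w * feat[i] for w, feat in zip(weights, features))
               for i in range(dim)]
    return context, weights

def init_lstm(in_dim, hid_dim, rng):
    """Small random weights for the four LSTM gates, acting on [x, h]."""
    params = {}
    for name in ("i", "f", "o", "g"):
        W = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim + hid_dim)]
             for _ in range(hid_dim)]
        params[name] = (W, [0.0] * hid_dim)
    return params

def lstm_step(params, x, h, c):
    """One standard LSTM cell update."""
    z = x + h  # list concatenation of input and previous hidden state
    gate = {}
    for name in ("i", "f", "o", "g"):
        W, b = params[name]
        pre = [sum(w * v for w, v in zip(row, z)) + bj for row, bj in zip(W, b)]
        act = math.tanh if name == "g" else sigmoid
        gate[name] = [act(p) for p in pre]
    c_new = [f * cj + i * g
             for f, cj, i, g in zip(gate["f"], c, gate["i"], gate["g"])]
    h_new = [o * math.tanh(cj) for o, cj in zip(gate["o"], c_new)]
    return h_new, c_new

def decode_step(features, word_vec, state, layer1, layer2):
    """Attend over the fused features, then run two stacked LSTM layers."""
    (h1, c1), (h2, c2) = state
    context, weights = attend(features, h2)
    h1, c1 = lstm_step(layer1, word_vec + context, h1, c1)  # lower layer
    h2, c2 = lstm_step(layer2, h1, h2, c2)                  # upper layer
    return ((h1, c1), (h2, c2)), weights

rng = random.Random(0)
D = 4  # toy feature and hidden size; real models use far larger values

def rand_vec():
    return [rng.uniform(-1.0, 1.0) for _ in range(D)]

scale_feats = [rand_vec() for _ in range(3)]   # stand-in for VGG16 scales
region_feats = [rand_vec() for _ in range(2)]  # stand-in for Faster R-CNN regions
features = scale_feats + region_feats          # assumed fusion: concatenate lists

layer1 = init_lstm(2 * D, D, rng)  # input is [word embedding, context]
layer2 = init_lstm(D, D, rng)      # input is the lower layer's hidden state
zeros = [0.0] * D
state = ((zeros, zeros), (zeros, zeros))

state, weights = decode_step(features, rand_vec(), state, layer1, layer2)
print("attention weights:", weights)
print("top-layer hidden state:", state[1][0])
```

In a full captioning model the top-layer hidden state would be projected onto the vocabulary to pick the next word, and the step would repeat until an end token is produced. With a zero initial hidden state the first step attends uniformly over all five feature vectors; later steps shift weight toward the features most similar to the evolving hidden state.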
Keywords/Search Tags: image semantic caption, LSTM, VGG16, Faster R-CNN