
Research On Image Description Algorithm Fusion With Local Semantic Information

Posted on: 2020-01-21
Degree: Master
Type: Thesis
Country: China
Candidate: X Liu
GTID: 2428330575495220
Subject: Signal and Information Processing
Abstract/Summary:
Image description technology converts images to text, realizing a cross-modal conversion of information. It is widely used in human-computer dialogue, image-text retrieval, children's education, and life support for visually impaired people. With the advancement of communication technology, image data has come to be widely distributed and disseminated on the Internet, and how to describe image content automatically in natural language has become a hot research topic. This thesis addresses the automatic generation of image descriptions from three angles: a method based on sentiment representation, a method based on local spatial semantic information, and a sentiment analysis method based on image description. The research work mainly includes:

(1) An image description method based on sentiment representation is proposed. The method builds on the encoder-decoder model, using a convolutional neural network for image feature extraction and an LSTM for sentence generation. First, existing tools are used to extract the emotional representations in the image (including visual semantics and facial expressions) together with their corresponding rectangular bounding boxes. The visual semantic information and the expression information are then represented as vectors, mapped to specific dimensions, and fed to the LSTM as additional inputs during both training and prediction. In this way, the generated sentences carry emotional coloring. The experimental results show that the method effectively improves the accuracy of image description and makes the generated sentences more emotional.

(2) An image description method based on local spatial semantic information and global information is proposed. First, an existing object detection model is used to extract the objects in the image and their corresponding rectangular bounding boxes. An attention model then assigns a different weight to each bounding box, and the input of the bidirectional grid LSTM (bi-Grid LSTM) is dynamically weighted so that it focuses on different regions at different time steps. The experimental results show that this method effectively alleviates the tendency of the model to miss small-area targets in image description, and that it outperforms current methods.

(3) A sentiment analysis method for social network data based on image description is proposed. First, an image description model is trained as an image feature extractor, and the description sentences it generates are processed by a single-layer convolution to serve as image features. The text feature vector is extracted by multi-layer convolution. The image feature vector is then concatenated with the text feature vector and passed into a fully connected layer for prediction, so that the sentiment tendency of social network data is recognized automatically. Experiments show that this method achieves better performance than similar methods in the field.
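The region weighting described in (2) can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not state how the attention scores are computed, so dot-product scoring against a decoder state is assumed here, and the names `attend`, `region_feats`, and `hidden` are hypothetical. The bi-Grid LSTM itself is not modeled; only the step that turns per-box features into a weighted input is shown.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_feats, hidden):
    """Weight each bounding-box feature by its relevance to a decoder state.

    region_feats: K feature vectors (one per detected box), each of length D.
    hidden: decoder state of length D (assumed; stands in for the LSTM state).
    Returns (weights, context); context is the weighted sum of the features.
    """
    # Assumed scoring function: plain dot product between feature and state.
    scores = [sum(f * h for f, h in zip(feat, hidden)) for feat in region_feats]
    weights = softmax(scores)
    dim = len(hidden)
    context = [
        sum(w * feat[d] for w, feat in zip(weights, region_feats))
        for d in range(dim)
    ]
    return weights, context


# Three toy box features; two different decoder states at two time steps.
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
w1, c1 = attend(regions, [2.0, 0.0])  # this state favours the first box
w2, c2 = attend(regions, [0.0, 2.0])  # this state favours the second box
```

Because the weights are recomputed from the current state at every step, the same set of boxes yields a different weighted input at different times, which is the behaviour the abstract attributes to the dynamically weighted input.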
Keywords/Search Tags: Image Description, Image Understanding, Local Semantic Features, Sentiment Analysis, LSTM