
Research On Image Caption Based On Object-Attention Model

Posted on: 2021-01-17
Degree: Master
Type: Thesis
Country: China
Candidate: Y Z Zhao
GTID: 2428330623481249
Subject: Information and Communication Engineering
Abstract/Summary:
Image description is the task of feeding an image to a computer and having the computer generate a sentence that describes it. It has a wide range of practical applications, such as aerospace, education, autonomous driving, automatic labeling of goods, and search engines, so research on image description has broad significance. Methods based on the attention mechanism are widely used in this field. Attention-based image description has the advantages of easy training, few parameters, automatic operation, high accuracy, and rich features. However, it also suffers from weak correlation among image features, shallow feature extraction, and weak correlation between text and image, all of which reduce the accuracy of the model output. This thesis addresses these three problems to improve the prediction accuracy of image description.

(1) To address the weak feature correlation in the attention model and its inability to combine image and text features effectively, the thesis proposes a feature selection network model. The feature selection network applies a mask to the lower-level feature map to screen features, which overcomes the weak correlation between features. Experimental results show that the feature selection network improves model accuracy by 0.1 over the baseline on the original metric. The model takes more than 30 hours to train, and its prediction speed is 75 frames per second.

(2) To address the shallow features extracted by the attention model, the thesis proposes an object attention model. The product of the activation function and the classification features is taken as a mask, and this mask is then multiplied by the classification features. This process constitutes the object attention model, and it overcomes the problem of shallow feature extraction in the attention model.

(3) To address the weak correlation between input text and images in traditional models, the thesis proposes a fusion layer network on top of the object attention model. The fusion layer network fuses text and images and uses the fused result as the input of the object attention model, which overcomes the weak correlation between text and image. Experiments show that the model combining the fusion layer with the object attention structure takes about 48 hours to train and reaches a prediction speed of 55 frames per second.

The datasets used in the experiments are Flickr8K, Flickr30K, and COCO2014. The Flickr datasets contain about 40,000 images. COCO contains about 80,000 training images, 40,000 test images, and 40,000 validation images. The models are evaluated with the BLEU and METEOR metrics.

The work in this thesis is divided into two parts. First, the traditional attention model is improved and a feature selection network is proposed. Second, an object attention model and a fusion layer are proposed. The comparison shows that feature selection does improve model performance and that deeper object features have a large influence on image description results. This provides a new perspective for subsequent work in the field of image description.
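
The masking step of the object attention model in point (2) can be illustrated with a minimal NumPy sketch. This is only one possible reading of the abstract, not the thesis implementation: it assumes the activation function is a sigmoid, that both products are element-wise, and that the classification features form a simple 2-D array. The names `sigmoid`, `object_attention`, and `class_features` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Logistic activation (an assumption; the abstract does not name the activation)."""
    return 1.0 / (1.0 + np.exp(-x))


def object_attention(class_features):
    """Apply a feature-derived mask to the classification features.

    Step 1: mask = activation(features) * features   (element-wise, assumed)
    Step 2: output = mask * features                 (element-wise, assumed)
    """
    mask = sigmoid(class_features) * class_features
    return mask * class_features


# Toy classification features: 2 regions x 3 channels (hypothetical shape).
features = np.array([[0.0, 1.0, -2.0],
                     [3.0, 0.5, 0.0]])
attended = object_attention(features)
print(attended.shape)
```

Under these assumptions the output keeps the shape of the input, features equal to zero stay zero, and features with larger magnitude are amplified relative to small ones, which is the screening effect the mask is meant to provide.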
Keywords/Search Tags: feature selection, object attention feature, gated neural network, image caption, bidirectional long short-term memory