
Research On Image Captioning Algorithms Based On Deep Learning

Posted on: 2021-05-22  Degree: Master  Type: Thesis
Country: China  Candidate: J C Hu  Full Text: PDF
GTID: 2428330614960194  Subject: Electronic and communication engineering
Abstract/Summary:
The image captioning task aims to give computers the ability to "talk about pictures": given an input image, the model automatically generates a text sequence that conforms to natural-language rules and faithfully reflects the image content. The task usually employs an image recognition model or an object detection model as a feature extractor or entity detector, and the extracted image features are then consumed by the captioning model. However, existing image captioning algorithms cannot make good use of the output of these upstream tasks. This is often because the attention mechanism, introduced to solve the long-distance dependency problem in sequence-to-sequence generation, causes an "over-attention" problem: the model ignores non-salient content in the image, so the generated sentence misses some image details.

In addition, optimizing model parameters by minimizing the cross-entropy objective introduces exposure bias and label bias. Exposure bias means that the model always conditions on words from the reference sentence during training but on its own previously generated words at test time, which leads to error accumulation. Label bias means that the model tends to reproduce the high-frequency scenes and high-frequency words of the reference sentences seen during training. The cross-entropy loss also leads to a lack of diversity and to over-correction in the generated captions. Although exposure bias and label bias can be partly alleviated by introducing reinforcement learning into the image captioning task, such algorithms usually use automatic evaluation metrics (such as BLEU, METEOR, CIDEr, and ROUGE) as the reward. Because these metrics do not correlate perfectly with the judgments of human experts, the model merely inflates the metric scores without actually improving the
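The exposure-bias phenomenon described above can be illustrated with a deliberately simplified sketch. The bigram "model" below is a hypothetical toy (a lookup table with one learned error), not the thesis's captioning network; it only shows how conditioning on ground-truth prefixes during training differs from conditioning on the model's own outputs at test time, where a single error compounds.

```python
# Toy illustration of exposure bias. The next-token table stands in for a
# trained decoder and contains a single error: after "b" it predicts "x".
reference = ["a", "b", "c", "d"]
next_token = {"<s>": "a", "a": "b", "b": "x", "x": "x", "c": "d"}

def teacher_forcing(ref):
    # Training phase: each step conditions on the ground-truth previous token,
    # so one wrong prediction stays an isolated error.
    return [next_token[prev] for prev in ["<s>"] + ref[:-1]]

def free_running(steps):
    # Test phase: each step conditions on the model's own previous prediction,
    # so a single error propagates through the rest of the sequence.
    out, prev = [], "<s>"
    for _ in range(steps):
        prev = next_token[prev]
        out.append(prev)
    return out

print(teacher_forcing(reference))  # one isolated error vs. the reference
print(free_running(4))             # the error accumulates
```

Under teacher forcing only the third token is wrong, while free-running decoding never recovers after the first mistake; this gap between the two decoding regimes is exactly what reinforcement-learning-based training tries to close.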
quality of the generated captions.

In this paper, an image captioning framework that combines a hybrid attention mechanism with inverse reinforcement learning is proposed. The framework improves model performance through the following designs. (1) The hybrid attention mechanism is composed of a visual self-attention mechanism and a soft attention mechanism: the former focuses on the major objects in the image, while the latter represents the relationships among all detected objects. This design avoids the problem of the attention mechanism over-attending to a single major object. The outputs of the two attention mechanisms are concatenated as the input of the subsequent modules. (2) The reward for the model's self-learning is obtained from a mapping between image features and sentence features, whereas the reward derived from evaluation metrics is determined only by the n-gram matching degree of the sentence itself; the former better guarantees the correspondence between sentence and image. (3) In the training stage, the generated sentences and reference sentences are mapped to a Boltzmann distribution, and the generator network is then trained to alleviate exposure bias, label bias, and over-correction, and to increase sentence diversity. Finally, experimental results on the Microsoft COCO dataset show that the proposed algorithm has advantages over several current algorithms in both qualitative and quantitative terms.
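The hybrid attention design can be sketched as follows. This is a minimal single-head NumPy illustration under assumed shapes (5 detected objects, 8-dimensional features); the projections, pooling, and dimensions are assumptions for clarity, not the thesis's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n_objects, d = 5, 8                    # detected object features and their size
V = rng.standard_normal((n_objects, d))  # features from the object detector
h = rng.standard_normal(d)               # decoder hidden state at this step

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Visual self-attention: every detected object attends to every other object.
A = softmax(V @ V.T / np.sqrt(d), axis=-1)   # (n_objects, n_objects) weights
self_att = A @ V                             # (n_objects, d) attended features

# Soft attention: the decoder state weighs the detected objects.
alpha = softmax(V @ h / np.sqrt(d))          # (n_objects,) attention weights
soft_att = alpha @ V                         # (d,) weighted context vector

# Concatenate the two attention outputs as input to the downstream modules
# (mean-pooling the self-attention output is an illustrative choice).
context = np.concatenate([self_att.mean(axis=0), soft_att])  # (2*d,)
```

Because the two mechanisms are computed independently and only merged by concatenation, a failure of one view (e.g. over-attending to a single salient object) does not suppress the information carried by the other.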
Keywords/Search Tags:image captioning, deep learning, object detection, attention mechanism, inverse reinforcement learning, generative adversarial networks