
Research And Implementation Of Image Captioning Technology Based On Deep Learning

Posted on: 2021-03-24
Degree: Master
Type: Thesis
Country: China
Candidate: J L Hu
GTID: 2428330623467821
Subject: Computer Science and Technology
Abstract/Summary:
Automatic image captioning is a technology that maps the content of an image to a corresponding natural-language description. It is widely believed to have many important practical applications. For example, it could help visually impaired people better understand their surroundings, and might even take over some of the work of guide dogs. It could also help young children learn to read from pictures. Most current research on automatic image captioning is based on the Encoder-Decoder framework, which involves two key steps: extracting image features, and decoding the extracted visual features to generate a sentence description. This thesis improves several classic captioning models. The main work is as follows:

(1) To address two problems in the attention mechanism, an inappropriate strategy for assigning weights to image regions and excessive redundant feature information, this thesis proposes an improved method based on the Faster R-CNN framework for extracting image target-region features. Comparative experiments show that the target-region features with attribute descriptions extracted by this method greatly improve caption generation performance.

(2) The traditional attention mechanism does not consider whether the query Q is related to the keys K and values V. If they are unrelated, the attention result may mislead the model's generated captions. To solve this problem, this thesis proposes a modified attention strategy and an optimization framework based on Multi-Head Attention and the Transformer architecture. The proposed framework overcomes this shortcoming of the traditional attention mechanism and improves the evaluation metrics of the image captioning model.

(3) At present, most models rarely consider the potential coherence within attention. To address this, this thesis improves the two-layer Up-Down model and proposes a CA-LSTM model that uses a coherence mechanism of attention. Experiments show that the proposed model generates sentence descriptions faster and more accurately than the traditional Up-Down model. By fully exploiting the potential coherence of attention, it can significantly reduce exposure bias and other problems that arise during caption inference.
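The region-weighting step that points (1) and (3) build on, scoring each detected image region against the decoder state and taking a weighted sum, can be sketched with the additive "top-down" attention commonly used in Up-Down-style captioners. This is a minimal pure-Python sketch, not the thesis's code: it assumes the k region vectors have already been extracted (for example by a Faster R-CNN detector), and the weight names `W_v`, `W_h`, `w_a` and the toy values are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def region_attention(regions: List[Vector], h: Vector, W_v: Matrix,
                     W_h: Matrix, w_a: Vector) -> Tuple[Vector, Vector]:
    """Additive soft attention over detected image regions (illustrative).

    regions: k region feature vectors, e.g. pooled detector features
    h:       current decoder hidden state, acting as the query
    Returns the attention weights and the weighted sum of the regions.
    """
    proj_h = matvec(W_h, h)
    scores = []
    for v in regions:
        # score_i = w_a . tanh(W_v v_i + W_h h)
        hidden = [math.tanh(a + b) for a, b in zip(matvec(W_v, v), proj_h)]
        scores.append(sum(w * x for w, x in zip(w_a, hidden)))
    alpha = softmax(scores)  # one weight per region, summing to 1
    attended = [sum(a * v[d] for a, v in zip(alpha, regions))
                for d in range(len(regions[0]))]
    return alpha, attended


# Toy example: three 2-d "regions", identity projections (hypothetical values).
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
alpha, attended = region_attention(regions, [1.0, 0.0], identity, identity, [1.0, 1.0])
print([round(a, 3) for a in alpha], [round(x, 3) for x in attended])
```

In a trained model the projections are learned, and the quality of the weights depends heavily on the region features fed in, which is the motivation for improving the feature extractor in point (1).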
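Point (2) concerns attention results that are unrelated to the query. The thesis's exact modification is not spelled out in this abstract, so the sketch below shows one known remedy instead: a query-dependent sigmoid gate applied to the attention result, in the spirit of the Attention on Attention (AoA) idea. It is a minimal single-query, single-head sketch; the weight names `W_iq`, `W_iv`, `W_gq`, `W_gv` are hypothetical, and in a Multi-Head/Transformer model the same gating would be applied to learned projections.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    # Branch on sign so math.exp never overflows.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def scaled_dot_attention(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Plain scaled dot-product attention for a single query."""
    scale = math.sqrt(len(q))
    alpha = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
    # Always a convex combination of the values, even if no key matches q.
    return [sum(a * v[d] for a, v in zip(alpha, values))
            for d in range(len(values[0]))]


def gated_attention(q: Vector, keys: List[Vector], values: List[Vector],
                    W_iq: Matrix, W_iv: Matrix, W_gq: Matrix,
                    W_gv: Matrix) -> Tuple[Vector, Vector]:
    """Attention whose output is filtered by a gate computed from q and the result."""
    v_hat = scaled_dot_attention(q, keys, values)
    # "Information" vector mixing the query with the attended result.
    info = [a + b for a, b in zip(matvec(W_iq, q), matvec(W_iv, v_hat))]
    # Gate in (0, 1) per dimension; a learned gate can suppress irrelevant results.
    gate = [sigmoid(a + b) for a, b in zip(matvec(W_gq, q), matvec(W_gv, v_hat))]
    return [g * i for g, i in zip(gate, info)], gate


identity = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [1.0, 0.0]]
values = [[2.0, 0.0], [0.0, 4.0]]
q = [0.0, 1.0]  # orthogonal to every key
print(scaled_dot_attention(q, keys, values))
out, gate = gated_attention(q, keys, values, identity, identity, identity, identity)
print(out, gate)
```

With a query orthogonal to every key, plain attention still returns the average of the values, [1.0, 2.0], as if it had found something relevant; the gate gives the model a learned way to down-weight such output instead of passing it straight to the decoder.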
Keywords/Search Tags: Automatic Image Captioning, Attention Mechanism, Reinforcement Learning, Long Short-Term Memory