Research And Implementation Of Image Captioning Technology Based On Deep Learning

Posted on:2021-03-24

Degree:Master

Type:Thesis

Country:China

Candidate:J L Hu

GTID:2428330623467821

Subject:Computer Science and Technology

Abstract/Summary:

Automatic image captioning maps the content of a picture to a corresponding natural-language description. It has important practical applications. For example, it could help visually impaired people better understand their surroundings, and even replace the work of guide dogs. It could also help young children learn to read from pictures. Most current captioning algorithms are built on an Encoder-Decoder framework, which involves two key steps: extracting image features, and decoding the extracted visual features to generate a sentence description. This thesis improves several classic captioning models. The main work includes:

(1) To address two problems in the attention mechanism, an inappropriate strategy for assigning weights to image regions and excessive redundant feature information, this thesis proposes an improved method based on the Faster R-CNN framework for extracting features of target regions in the image. Comparative experiments show that the target-region features with attribute descriptions extracted by this method greatly improve caption generation performance.

(2) The traditional attention mechanism does not consider whether Q (the query) is related to K (the keys) or V (the values). If they are unrelated, the attention result may mislead the model's generation. To solve this problem, this thesis proposes a modified attention strategy and an optimization framework based on Multi-Head Attention and the Transformer architecture. The proposed framework overcomes this shortcoming of the traditional attention mechanism and improves the evaluation metrics of the image captioning model.

(3) At present, most models rarely consider the potential coherence that exists in attention. This thesis therefore improves a two-layer Up-Down model and proposes a CA-LSTM model that uses a coherence mechanism of attention. Experiments show that the new model generates sentence descriptions faster and more accurately than the traditional Up-Down model. By exploiting the potential coherence of attention, it can often significantly reduce exposure bias and other problems that occur during caption inference.
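The shortcoming described in point (2) can be illustrated with a minimal sketch of standard scaled dot-product attention. This is the textbook formulation, not the modified strategy proposed in the thesis, and the function names and toy numbers below are assumptions made for illustration. Because the softmax weights always sum to 1, the output is a weighted average of the values even when the query matches none of the keys.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q, keys, values):
    """Standard attention for one query vector over lists of key/value vectors.

    Returns (output_vector, attention_weights).
    """
    d_k = len(q)
    # Similarity of the query with every key, scaled by sqrt(d_k).
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
    weights = softmax(scores)
    # Weighted average of the value vectors.
    dim_v = len(values[0])
    output = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)]
    return output, weights


# Hypothetical toy "region features"; these numbers are not from the thesis.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]

# A query that clearly matches the first key puts most weight on it.
related_out, related_w = scaled_dot_product_attention([4.0, 0.0], keys, values)

# A query unrelated to every key still produces an output:
# the weights fall back to uniform and the values are simply averaged.
unrelated_out, unrelated_w = scaled_dot_product_attention([0.0, 0.0], keys, values)
```

In this sketch the unrelated query still receives a full-strength attended vector, which the decoder cannot distinguish from a meaningful one. A modified strategy of the kind the abstract describes would need some way to detect or suppress such results, but the abstract does not specify how.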

Keywords/Search Tags:

Automatic Image Captioning, Attention Mechanism, Reinforcement Learning, Long Short-Term Memory

Related items

1	Image Captioning Based On Attention Long Short-Term Memory Network
2	Research On Image Captioning Algorithm Based On Deep Learning
3	Image Captioning Based On Adaptive Visual Attention Mechanism
4	Research On Image Caption Method Based On High Level Semantic Extraction And Attention Mechanism
5	Research On Image Captioning Generation Based On Faster R-CNN And Visual Attention
6	Research On Intelligent Semantics Generation For Visual Data
7	Research On Image Captioning Method Based On Deep Neural Networks And Adaptive Attention Mechanism
8	Research On Image Caption Via Incorporating Attention And Long Short-Term Memory Network
9	Study On Image Captioning Based On Spatial Topological Relationship
10	Researches On Short Video Captioning Based On Deep Learning