Research On Image Captioning Based On Self-Attention And Encoder-Decoder

Posted on:2023-04-15

Degree:Master

Type:Thesis

Country:China

Candidate:Q L Guo

Full Text:PDF

GTID:2568306914977129

Subject:Information and Communication Engineering

Abstract/Summary:

In recent years, with growing computing power and rapidly increasing data volumes, research on deep learning algorithms has made great progress, with remarkable achievements in related fields such as computer vision, natural language processing, speech recognition and multimodal learning. These techniques are now applied throughout society. Real application scenarios, however, often involve more than one modality of data, so techniques that relate multiple modalities are receiving more and more attention. Among them, the image captioning task transforms an image into text that describes its content, and it involves the feature representation, feature fusion, feature alignment and transformation steps of multimodal technology. When an image is converted into descriptive text, the sentence structure and the sentence content are generated simultaneously, which is difficult. In addition to the accuracy of the description, the controllability of the generated text also remains hard to achieve. For the image captioning task, this thesis studies three problems that arise when sentence pattern and semantics are generated synchronously: redundant sentence structure, semantic deviation, and the controllability of the generated text. The experimental models use an encoder-decoder framework and integrate a self-attention mechanism. The main work of this thesis falls into two parts:

1. To address the complexity of generating sentence pattern and semantics synchronously in image captioning, a visually guided attention method based on a re-editing framework is proposed. The method treats the descriptive text generated by a previous model as noisy prior knowledge and uses visually guided attention to model the mapping between that text and the image region features. The core idea of the editing framework is to correct existing text rather than generate it from scratch, which reduces the learning difficulty for the model and corrects problems such as redundant sentence structure and entity-level semantic deviation.

2. An approach based on global information flow is proposed to address subject drift in the described content and the controllability of the generated text in image captioning. The method fuses the description text generated so far into a global vector that is iterated dynamically, uses this dynamic global feature to ensure that the description flows correctly across image regions, and controls which image subject the generated text focuses on. Experiments show that this method effectively improves the accuracy of the generated text.
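The abstract does not give the model equations, so the following is only a minimal sketch of generic scaled dot-product self-attention over image region features, the kind of mechanism an encoder-decoder captioning model typically integrates. It is not the thesis's architecture: the function names, the feature dimensions and the random projection matrices are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(regions, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a set of region features.

    regions: (num_regions, d_model) array of image region features.
    w_q, w_k, w_v: (d_model, d_head) projection matrices (hypothetical,
    randomly initialised here; a real model would learn them).
    Returns the attended features and the attention weights.
    """
    q = regions @ w_q
    k = regions @ w_k
    v = regions @ w_v
    # Each region scores every other region; scale by sqrt(d_head).
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


# Toy input: 5 image regions with 8-dimensional features (assumed sizes).
rng = np.random.default_rng(0)
regions = rng.normal(size=(5, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 4)) for _ in range(3))

attended, weights = self_attention(regions, w_q, w_k, w_v)
print(attended.shape, weights.shape)
```

Each row of `weights` sums to 1, so every output row is a weighted average of the projected region features. In a captioning decoder, a similar computation would let each generated word attend to the image regions it describes.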

Keywords/Search Tags:

image captioning, encoder-decoder framework, self-attention mechanism, text generation control

Related items

1	Image-text Translation Based On Cross-modal Related Semantics And Attention Mechanism
2	Research On Image Captioning Generation Based On Faster R-CNN And Visual Attention
3	Image Captioning Based On Generative Adversarial Network
4	Image Captioning Based On Adaptive Visual Attention Mechanism
5	Research On Text Title Generation Method Based On Eye Movement Attention Mechanism
6	Research On Image Captioning Algorithm Based On Attention Mechanism
7	Image Captioning Technology Based On Joint Attention Mechanism
8	Research On Text Guided Image Generation Method Based On Attention Mechanism
9	Research On Semantic Consistency In Text-to-Image Generation
10	Image Captioning Based On Deep Recurrent Convolution Network And Spatio-temporal Information Fusion