Research On Visual Question Answer Algorithm Based On Attention Mechanism

Posted on:2021-02-03

Degree:Master

Type:Thesis

Country:China

Candidate:Y Cao

Full Text:PDF

GTID:2428330605456702

Subject:Electronic information technology and instrumentation

Abstract/Summary:


Teaching computers to carry out logical reasoning is one of the ultimate goals of Artificial Intelligence, and Visual Question Answering (VQA) is one of its most important fields. VQA combines Computer Vision (CV) and Natural Language Processing (NLP) and aims to bridge the semantic gap between information from different modalities. The task is defined as follows: given an image I and a question Q, a model trained with deep learning outputs the correct answer A. A typical VQA algorithm consists of three modules: an image encoder, a question encoder, and multimodal information fusion.

After surveying the relevant theories and techniques, this thesis proposes a VQA algorithm based on the attention mechanism, implemented in the PyTorch framework. Following a modular design, it uses VGG16 to extract image spatial feature vectors and adds a soft attention mechanism on top of them so that the relevant image features are obtained more efficiently and accurately. The question text is encoded with an LSTM, which preserves as much of its semantic information as possible. Finally, the image and question vectors are fused with a block-based aggregation method, and answer prediction is treated as a classification problem.

To verify the effectiveness and generality of the algorithm, experiments were conducted along several dimensions, including different models, different datasets, and different scenarios. The experimental results show that the proposed model achieves an accuracy of about 71.17% on the VQA and VG datasets and about 83.89% on the binary classification task. It also performs well on abstract scenes and on Chinese VQA, and it is efficient in both training time and response speed. In addition, a PC demo was successfully built, offering an exploration of how such research can be transferred into practice through industry-university-research cooperation.
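The abstract does not spell out how the soft attention layer is wired, so the following is only a minimal sketch of one common variant, assuming question-guided additive attention over the cells of a VGG16 convolutional feature map (for example a flattened 14x14 grid of 512-dimensional features) and the final LSTM hidden state as the question vector. The parameter names `W_v`, `W_q`, and `w_a` are hypothetical stand-ins for learned weights, not names taken from the thesis.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def soft_attention(image_feats, question_vec, W_v, W_q, w_a):
    """Question-guided soft attention over image regions (illustrative sketch).

    image_feats:  (N, d_v) region features, e.g. a flattened VGG16 conv grid
    question_vec: (d_q,)   question encoding, e.g. the last LSTM hidden state
    W_v: (d_v, d_h), W_q: (d_q, d_h), w_a: (d_h,)  hypothetical learned weights
    Returns the attended image vector (d_v,) and the attention weights (N,).
    """
    # Project both modalities into a shared hidden space and combine them additively;
    # the question projection (d_h,) is broadcast across all N regions.
    hidden = np.tanh(image_feats @ W_v + question_vec @ W_q)  # (N, d_h)
    scores = hidden @ w_a                                     # (N,) one score per region
    alpha = softmax(scores)                                   # non-negative, sums to 1
    attended = alpha @ image_feats                            # weighted sum of regions
    return attended, alpha


# Usage with random stand-in data (shapes only; no trained weights).
rng = np.random.default_rng(0)
N, d_v, d_q, d_h = 196, 512, 1024, 256
v = rng.standard_normal((N, d_v))
q = rng.standard_normal(d_q)
att, alpha = soft_attention(
    v, q,
    rng.standard_normal((d_v, d_h)) * 0.01,
    rng.standard_normal((d_q, d_h)) * 0.01,
    rng.standard_normal(d_h),
)
print(att.shape, alpha.shape)
```

Because the weights form a probability distribution over regions, the attended vector stays in the same feature space as a single VGG16 cell, which keeps the downstream fusion module unchanged whether or not attention is used.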
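The abstract names a "block-based aggregation method" for fusion without further detail. The sketch below assumes it resembles block-superdiagonal bilinear fusion in the spirit of BLOCK (Ben-Younes et al.): each projected vector is split into R chunks, and chunk r of the image only interacts with chunk r of the question through its own small core tensor. The answer is then scored as a classification over a fixed answer vocabulary, as the abstract states. All parameter names (`P_v`, `P_q`, `cores`, `W_out`, `W_cls`, `b_cls`) are hypothetical.

```python
import numpy as np


def block_fusion(v, q, P_v, P_q, cores, W_out):
    """Block-superdiagonal bilinear fusion (simplified, illustrative sketch).

    v: (d_v,) attended image vector;  q: (d_q,) question vector
    P_v: (d_v, R*c), P_q: (d_q, R*c)  projections into R chunks of size c
    cores: (R, c, c, o)               one small bilinear core tensor per block
    W_out: (R*o, d_z)                 projection of the concatenated block outputs
    """
    R, c, _, o = cores.shape
    v_chunks = (v @ P_v).reshape(R, c)
    q_chunks = (q @ P_q).reshape(R, c)
    # Block r computes o bilinear forms v_r^T W_r[:, :, k] q_r; there are no
    # cross-block terms, which is what makes the full tensor block-superdiagonal.
    z_blocks = np.einsum("ri,rj,rijo->ro", v_chunks, q_chunks, cores)  # (R, o)
    return z_blocks.reshape(R * o) @ W_out                             # (d_z,)


def answer_distribution(z, W_cls, b_cls):
    """Score every candidate answer and normalise with a softmax (classification)."""
    logits = z @ W_cls + b_cls
    e = np.exp(logits - logits.max())
    return e / e.sum()


# Usage with random stand-in data.
rng = np.random.default_rng(0)
d_v, d_q, R, c, o, d_z, n_answers = 512, 1024, 8, 16, 8, 256, 1000
z = block_fusion(
    rng.standard_normal(d_v), rng.standard_normal(d_q),
    rng.standard_normal((d_v, R * c)) * 0.01, rng.standard_normal((d_q, R * c)) * 0.01,
    rng.standard_normal((R, c, c, o)) * 0.1, rng.standard_normal((R * o, d_z)) * 0.1,
)
probs = answer_distribution(z, rng.standard_normal((d_z, n_answers)) * 0.01, np.zeros(n_answers))
predicted_answer_index = int(probs.argmax())
```

The point of the block structure is cost: a full bilinear core over the concatenated chunks would need (R*c)^2 * (R*o) parameters, while the block version needs only R * c^2 * o, yet each block still captures multiplicative image-question interactions.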

Keywords/Search Tags:

Visual Question Answering, Multimodal Information Fusion, Attention Mechanism, Deep Learning


Related items

1	Research On Visual Question Answer Algorithm Based On Attention Mechanism
2	Research On Multimodal Attention Mechanism And Information Fusion For Visual Question Answering
3	Research On Visual Question Answering Based On Deep Neural Network
4	Research On Visual Question Answering Method Based On Attention Mechanism And Multimodal Fusion
5	Research And Algorithm Implementation Of Efficient Visual Question Answering Based On Deep Learning
6	Multi-modal Information Fusion In Visual Question Answering
7	Research On Multimodal Fusion For Visual Question Answering
8	Research On Visual Question Answering Model Based On Attention Mechanism And Feature Fusio
9	Research On Collaborative Attention Model And Deep Correlated Networks For Visual Question Answer
10	Research On Multimodal Interaction Model And Optimization Method For Visual Question Answerin