Research On Visual Question Answer Algorithm Based On Attention Mechanism

Posted on: 2021-02-03
Degree: Master
Type: Thesis
Country: China
Candidate: Y Cao
GTID: 2428330605456702
Subject: Electronic information technology and instrumentation
Abstract/Summary:
Teaching computers to handle logical reasoning tasks is the ultimate goal of Artificial Intelligence, and Visual Question Answering (VQA) is one of its most important fields. VQA integrates Computer Vision (CV) and Natural Language Processing (NLP) and aims to eliminate the semantic gap between information from different modalities. The task is described as follows: given an image I and a question Q, a model pre-trained with Deep Learning outputs the correct answer A. A typical VQA algorithm can be divided into three modules: image encoding, question encoding, and multimodal information fusion.

In this paper, a VQA algorithm based on the attention mechanism is proposed after an exploration of the relevant theories and techniques. The algorithm is built on the PyTorch framework and follows a modular design. On the one hand, it uses VGG16 to extract image feature vectors and applies a soft attention mechanism on top of them to obtain spatial image features more efficiently and accurately. On the other hand, it uses an LSTM to encode the question text into a vector, retaining as much of the semantic information as possible. Finally, the multimodal vectors are fused with a block-based aggregation method, and producing the answer is treated as a classification problem.

To verify the effectiveness and generality of the algorithm, this paper conducts experiments along multiple dimensions, including different models, different data sets, and different scenarios. Experimental results show that the proposed model achieves an accuracy of about 71.17% on the VQA and VG data sets and about 83.89% on the binary classification task. It also performs well on abstract scenes and Chinese VQA, and shows outstanding performance in training time and return efficiency. In addition, this paper successfully implemented a PC demo, which offers an exploration of industry-university-research transformation.
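The pipeline described in the abstract (attend over image regions, fuse with the question vector, classify over candidate answers) can be illustrated with a small sketch. This is not the thesis implementation: it uses plain Python lists instead of PyTorch tensors so that it runs without dependencies, the region and question vectors are made-up toy values standing in for VGG16 and LSTM outputs, and the element-wise product in `fuse` is an assumed stand-in for the block-based aggregation method, whose details the abstract does not give. All function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def soft_attention(region_features, question_vector):
    """Soft attention: score each image region against the question,
    normalize the scores with softmax, and return the weighted sum."""
    scores = [dot(r, question_vector) for r in region_features]
    weights = softmax(scores)
    dim = len(region_features[0])
    attended = [
        sum(w * r[i] for w, r in zip(weights, region_features))
        for i in range(dim)
    ]
    return weights, attended


def fuse(image_vector, question_vector):
    """Assumed stand-in for the fusion step: element-wise product."""
    return [a * b for a, b in zip(image_vector, question_vector)]


def classify(fused, answer_weights, answers):
    """Treat answering as classification over a fixed answer set."""
    logits = [dot(w, fused) for w in answer_weights]
    probs = softmax(logits)
    best = max(range(len(answers)), key=lambda i: probs[i])
    return answers[best], probs


# Toy inputs (hypothetical): three image regions and one question vector.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
question = [2.0, 0.0]

weights, attended = soft_attention(regions, question)
fused = fuse(attended, question)

# Hypothetical linear classifier for a binary (yes/no) question.
answer, probs = classify(fused, [[1.0, 0.0], [-1.0, 0.0]], ["yes", "no"])
print(weights, attended, answer)
```

Because the attention weights come from a softmax, they sum to one and every region contributes to the attended vector. This is what makes the attention "soft", as opposed to selecting a single region.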
Keywords/Search Tags: Visual Question Answering, Multimodal Information Fusion, Attention Mechanism, Deep Learning