Research On Visual Question Answering Method And System Based On Deep Learning

Posted on:2020-11-23

Degree:Master

Type:Thesis

Country:China

Candidate:Y M Ding

GTID:2428330596475183

Subject:Control Science and Engineering

Abstract/Summary:

Visual Question Answering (VQA) is an interdisciplinary task spanning computer vision (CV) and natural language processing (NLP): given an image and a question about it, a VQA system produces an answer based on the image content. A VQA system comprises several subtasks, including feature extraction, attention, multi-modality feature fusion, and answer generation. Recent research has focused on optimizing features and on improving the attention mechanism and multi-modality feature fusion. Two difficulties remain: the extracted features do not fully represent the information in the image and the question, and the image features introduce noise and redundant information.

To address the feature representation problem, our study employs a multi-scale feature extraction and fusion method. In the usual approach, image features are extracted by a convolutional neural network; these features carry high-level semantic information but little detail about the image, which leads to poor performance in a VQA system. Our study instead extracts multi-scale image features from different layers of a pretrained neural network and tries to find the optimal combination of these features. For sentence representation, our study applies the same multi-scale approach, extracting and fusing features at the word level, phrase level, and sentence level.

To reduce the noise and redundant information in the image representation, our study proposes a modified blending attention mechanism. It combines spatial attention with channel attention and uses several image features to compensate for the loss of original feature information caused by the attention mechanism. For text features, a self-attention mechanism is introduced to strengthen the internal structure of the sentence features and thereby improve the VQA system. Finally, the VQA system is modified by combining the multi-scale feature method with the modified blending attention mechanism, and this modification improves the system's performance.
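The blending of channel attention and spatial attention described above can be illustrated with a minimal NumPy sketch. This is not the thesis's implementation: the function name `blended_attention`, the projection matrices `w_c` and `w_s`, the sigmoid channel gate, the question-guided spatial softmax, and the residual addition used to "compensate" for suppressed features are all assumptions chosen for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def blended_attention(feat, q, w_c, w_s):
    """Hypothetical channel + spatial attention over image features.

    feat: (C, H, W) image feature map
    q:    (D,) question vector
    w_c:  (C, D) projection used for the channel gate (assumed)
    w_s:  (C, D) projection used for spatial scoring (assumed)
    """
    C, H, W = feat.shape

    # Channel attention: one sigmoid gate per channel, computed from the
    # globally pooled feature and the projected question.
    pooled = feat.mean(axis=(1, 2))                      # (C,)
    gate = 1.0 / (1.0 + np.exp(-(pooled * (w_c @ q))))   # (C,)
    chan = feat * gate[:, None, None]                    # (C, H, W)

    # Spatial attention: softmax over the H*W locations, scored by the
    # dot product between each location's feature and the question key.
    key = w_s @ q                                        # (C,)
    scores = np.einsum("chw,c->hw", chan, key) / np.sqrt(C)
    alpha = softmax(scores.reshape(-1)).reshape(H, W)    # sums to 1
    attended = chan * alpha[None]                        # (C, H, W)

    # Assumed compensation step: add the original features back so that
    # information suppressed by attention is not lost entirely.
    return attended + feat, gate, alpha

# Toy inputs (random, for shape checking only).
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))
q = rng.standard_normal(16)
w_c = rng.standard_normal((8, 16)) * 0.1
w_s = rng.standard_normal((8, 16)) * 0.1
out, gate, alpha = blended_attention(feat, q, w_c, w_s)
print(out.shape, gate.shape, alpha.shape)
```

In this sketch the channel gate decides which feature channels matter for the question, and the spatial weights decide which image regions matter; applying them in the opposite order, or in parallel, would be equally plausible readings of the abstract.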

Keywords/Search Tags:

visual question answering(VQA), multi-scale feature extraction and fusion, attention mechanism, multi-modality feature fusion

Related items

1	Research On Visual Question Answering Algorithm Based On Feature Fusion Of Attention Mechanism
2	Research On Visual Question Answering Based On Multiple Attention Mechanism And Feature Fusion Algorithm
3	Visual Question And Answering Based On Two-dimensional Multi-attention Feature Fusion
4	Research On Visual Question Answering Model Based On Attention Mechanism And Feature Fusion
5	Research On Visual Question Answering System Based On Image Attention
6	Research On Multimodal Fusion For Visual Question Answering
7	Research And Algorithm Implementation Of Efficient Visual Question Answering Based On Deep Learning
8	Research And Application Of Multi-domain Visual Question Answering System Based On Image Comprehension
9	Research On Visual Question Answering Based On Deep Neural Network
10	Human Action Recognition Based On Attention Mechanism And Multi-Modality Feature Fusion