
Research On Visual Question Answering Method And System Based On Deep Learning

Posted on: 2020-11-23   Degree: Master   Type: Thesis
Country: China   Candidate: Y M Ding   Full Text: PDF
GTID: 2428330596475183   Subject: Control Science and Engineering
Abstract/Summary:
Visual Question Answering (VQA) is an interdisciplinary task at the intersection of computer vision (CV) and natural language processing (NLP). Given an image and a question about it, a VQA system produces an answer based on the image information. A VQA system comprises several subtasks, such as feature extraction, attention, multi-modality feature fusion, and answer generation. Recent research focuses on optimizing features and on improving the attention mechanism and multi-modality feature fusion. Two difficulties remain: the extracted features do not fully represent the information in the images and questions, and the image features introduce noise and redundant information.

To address the feature representation problem, our study employs a multi-scale feature extraction and fusion method. In the usual approach, image features are extracted by a convolutional neural network; such features carry high-level semantic information but little image detail, which leads to poor performance in a VQA system. Our study instead extracts multi-scale image features from different layers of a pretrained neural network and searches for the optimal combination of these features. For sentence representation, our study applies the same multi-scale approach, extracting and fusing features at the word level, phrase level, and sentence level.

To address the noise and redundant information in the image representation, our study proposes a modified blended attention mechanism. It combines spatial attention with channel attention and uses several image features to compensate for the loss of original feature information caused by the attention mechanism. For text features, a self-attention mechanism is introduced in our research to strengthen the internal structure of the sentence features and thereby improve the VQA system.

Finally, the VQA system is modified by combining the multi-scale feature method with the modified blended attention mechanism, and the performance of the modified system improves as a result.
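
The combination of spatial and channel attention described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the parameter-free scoring (average activations in place of learned, question-guided weights), and the use of a plain residual sum as the "compensation" step are all assumptions made for illustration. The feature map is stored as a list of channels, each a flat list of spatial positions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(values):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def channel_weights(features):
    """One gate in (0, 1) per channel, from its global average activation.

    Assumption: a real model would pass the pooled vector through learned layers.
    """
    return [sigmoid(sum(channel) / len(channel)) for channel in features]


def spatial_weights(features):
    """One weight per spatial position: softmax over channel-averaged activations.

    Assumption: a real VQA model would also condition these scores on the question.
    """
    n_channels = len(features)
    n_positions = len(features[0])
    scores = [
        sum(features[c][p] for c in range(n_channels)) / n_channels
        for p in range(n_positions)
    ]
    return softmax(scores)


def blended_attention(features):
    """Apply channel and spatial attention together, then add the input back.

    The residual term (+ x) is a hypothetical stand-in for compensating the
    information that attention suppresses.
    """
    cw = channel_weights(features)
    sw = spatial_weights(features)
    return [
        [x * cw[c] * sw[p] + x for p, x in enumerate(channel)]
        for c, channel in enumerate(features)
    ]


# Toy feature map: 2 channels, 4 spatial positions (e.g. a flattened 2x2 grid).
features = [[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 0.0, 0.0]]
attended = blended_attention(features)
```

Because the residual term passes the original features through unchanged, positions and channels that receive low attention weights are attenuated in the attended term but never erased from the output.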
Keywords/Search Tags: visual question answering (VQA), multi-scale feature extraction and fusion, attention mechanism, multi-modality feature fusion