
Research And Implementation Of Visual Question Answering System Based On Deep Learning

Posted on: 2022-06-12 | Degree: Master | Type: Thesis
Country: China | Candidate: A Chang
GTID: 2518306494992099 | Subject: Software engineering
Abstract/Summary:
Advances in neural networks, the growth of large-scale datasets, and improvements in computing hardware have driven rapid development and wide application of deep learning in unimodal settings (image, text, speech). However, higher-level human cognitive and reasoning abilities, such as multimodal understanding and interaction, remain weak in these systems. To address this, this thesis studies an important research topic in multimodal interaction: visual question answering (VQA). VQA involves two modalities, image and text. Because convolutional neural networks (CNNs) and recurrent neural networks (RNNs) perform well on images and text respectively, many models that combine the two have achieved good results on the VQA task. With the emergence of the attention mechanism, many attention-based models have followed, but existing models still fall short in overall accuracy, especially on complex questions that require reasoning and counting. To address the low overall accuracy of current models, this thesis proposes a VQA model based on a hierarchical joint attention mechanism. Then, to address the low accuracy of existing models on complex questions, it studies visual reasoning and proposes a VQA model based on a reasoning network. Experimental results show that the model with reasoning ability answers complex questions with significantly higher accuracy than other existing methods.

This thesis focuses on deep-learning-based VQA models. The main research contents are:
1) An in-depth study of the attention mechanism and the construction of a hierarchical joint attention model that attends to images and questions in both directions. The model uses hierarchical attention to extract question features multiple times, then uses joint attention to build an image-question feature map that strengthens the relationship between the question and the image. Experiments show that the hierarchical joint attention model better captures image-question interrelationships and thereby improves the accuracy of the results.
2) To address the model's low accuracy on complex questions, a reasoning network module based on visual reasoning is built, which lets the model extract features of complex questions and improves its reasoning ability.
3) ResNet-152 is used to extract deep image features, and a joint visual-text memory vector is built under question attention and visual attention to help the model infer and predict the answer. Experiments on the original dataset show good results in predicting answers to complex questions.
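The joint attention described in the abstract, which attends to the image and the question in both directions, could be sketched roughly as follows. This is only an illustrative sketch, not the thesis's actual model: the function name `joint_attention`, the bilinear weight `W`, the tanh affinity, and the max-pooling over the affinity matrix are all assumptions chosen to show the general idea in plain NumPy.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def joint_attention(image_feats, question_feats, W):
    """Hypothetical two-way (joint) attention between an image and a question.

    image_feats:    (R, d) array, one row per image region
    question_feats: (T, d) array, one row per question word
    W:              (d, d) bilinear weight (an assumption of this sketch)
    """
    # Affinity between every question word and every image region: (T, R)
    affinity = np.tanh(question_feats @ W @ image_feats.T)

    # Question-guided attention over regions: strongest word match per region
    img_weights = softmax(affinity.max(axis=0))  # (R,)
    # Image-guided attention over words: strongest region match per word
    q_weights = softmax(affinity.max(axis=1))    # (T,)

    # Weighted sums give one attended vector per modality, each of size d
    attended_image = img_weights @ image_feats
    attended_question = q_weights @ question_feats
    return attended_image, attended_question, img_weights, q_weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    regions = rng.normal(size=(49, 8))  # e.g. a 7x7 grid of region features
    words = rng.normal(size=(5, 8))     # a five-word question
    W = rng.normal(size=(8, 8))
    v, q, a_v, a_q = joint_attention(regions, words, W)
    print(v.shape, q.shape, a_v.sum(), a_q.sum())
```

In a hierarchical variant, the same step would be applied to question features at several levels (for example word and sentence level) and the attended vectors combined before answer prediction. How the thesis combines them is not stated in the abstract.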
Keywords/Search Tags: Visual question answering, Deep learning, Visual reasoning, Attention mechanism, Inference network