
Research On Situational Reasoning Question Answer Method Based On Deep Learning

Posted on: 2022-03-30
Degree: Master
Type: Thesis
Country: China
Candidate: Z N Qiu
GTID: 2518306509467084
Subject: Information and Communication Engineering
Abstract/Summary:
In recent years, with the rapid development of 5G and the construction of smart cities, demand for intelligent human-computer interaction has grown. As an important part of human-computer interaction, visual question answering (VQA) based on scene understanding and relational reasoning has become an important research direction. VQA integrates computer vision, natural language understanding, and knowledge representation and reasoning. Following the deep learning approach, an image and a question about it are taken as the input of a convolutional neural network, which converts the image and the natural-language question into machine-readable representations; the output is then converted back into natural language so that it is easy to read. A VQA system therefore has two main tasks: first, the network must understand questions posed in natural language; second, it must extract image features for recognition and detection, so as to understand the image and answer the question.

Traditional VQA methods focus only on the number and categories of objects in the image and lack the necessary reasoning about the relationships between objects in the scene, which results in a poor understanding of the picture. This thesis applies relational reasoning to the VQA task to understand images better. It also introduces an attention mechanism, and constructs two VQA networks based on relational reasoning.

First, a VQA network based on object detection and relational reasoning is constructed. On the premise of accurate object detection, the relationships between objects are inferred to improve answer accuracy. Faster R-CNN (Faster Region-based Convolutional Neural Network) detects objects and extracts features from the image, and a long short-term memory (LSTM) network extracts information from the question. The multimodal feature fusion and inference network is composed of multilayer perceptrons, which fuse and reason over the object feature blocks and the question features to produce the answer. The model is trained and tested on the CLEVR dataset, where it achieves higher accuracy than the other models compared.

Second, a graph convolutional neural network based on the attention mechanism is constructed. The relationships between objects in the image are represented as a graph, and a graph network is added to the general VQA model and combined with the attention mechanism to obtain a graph attention network. In this model, image information is extracted by Faster R-CNN, and the text of the question is encoded by an LSTM network. After the image and text information are combined through the graph attention network, the fused, attention-weighted information is used to learn and reason about the object relations in the image, and the final answer is produced by a multilayer perceptron. The model is trained on the CLEVR and GQA datasets, and experimental results show that it achieves higher accuracy.
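
As a rough illustration of the attention step described in the abstract, the sketch below weights a set of object feature vectors by their similarity to a question vector and sums them into one fused vector. It is only a minimal sketch under stated assumptions: the dot-product scoring, the toy 2-D vectors, and the names `softmax`, `dot`, and `attend` are hypothetical choices for illustration, and the abstract does not specify how the thesis computes its attention weights. In the described models the object features would come from Faster R-CNN and the question vector from an LSTM.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attend(object_feats, question_vec):
    """Question-guided attention (assumed dot-product scoring).

    Returns the attention weight of each object and the
    attention-weighted sum of the object features.
    """
    weights = softmax([dot(o, question_vec) for o in object_feats])
    dim = len(question_vec)
    fused = [
        sum(w * o[i] for w, o in zip(weights, object_feats))
        for i in range(dim)
    ]
    return weights, fused


# Toy stand-ins for detector and question-encoder outputs (hypothetical values).
objects = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
question = [2.0, 0.0]

weights, fused = attend(objects, question)
print(weights)
print(fused)
```

Objects whose features align with the question vector receive larger weights, so the fused vector emphasizes them. A downstream multilayer perceptron, as the abstract describes, would then map such a fused vector to an answer.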
Keywords/Search Tags:Deep learning, Visual question answering, Relational reasoning, Attention mechanism, Graph convolutional neural network