Visual Question Answering (VQA) is a challenging frontier research direction: the task takes an image and a natural-language question about that image as input, infers the semantics of the question and the content of the image, and generates a natural-language answer as output. Existing VQA models are not accurate enough at judging the relationships among the words in the question, among the objects in the image, and between the question and the image, and their reasoning ability also needs improvement. To address these problems, a visual question answering reasoning model based on graph neural networks is proposed. First, to address weak word relevance, a question encoder is constructed: consecutive words are fed into a BERT network based on the self-attention mechanism to form question features, so that word relevance within the question can be attended to. Second, to address the weak compactness of image content, a network based on Faster R-CNN is adopted to overcome region discontinuity: object features are extracted from the image, and the object regions with high relevance and high scores are selected according to the posed question. Finally, to improve the connection between image and question and the model's reasoning ability, the filtered features are taken as the initial nodes of a graph; combining the question and background information, the nodes are updated through a graph attention network, so that the updated nodes learn the relationships between objects more completely and accurately. The final node representations pass through a multi-layer perceptron and an activation function to predict the answer. On the CLEVR and GQA datasets, the model outperforms the compared models on most evaluation metrics: its overall accuracy is 0.10% and 2.59% higher, respectively, than that of the second-best model in the experimental comparison, and 46.70 and 27.72 percentage points higher than that of the baseline model, which demonstrates the effectiveness of the proposed method.
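The question-guided node update described above can be sketched minimally as follows. This is an illustrative NumPy sketch, not the paper's implementation: the dimensions, the random features, the bilinear score matrix `W`, and the single-head, fully connected attention are all simplifying assumptions made for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    # numerically stable softmax over a score vector
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical sizes: 4 selected object regions, feature dimension 8
N, D = 4, 8
nodes = rng.standard_normal((N, D))    # initial graph nodes (filtered region features)
question = rng.standard_normal(D)      # pooled question feature (e.g. from BERT)
W = rng.standard_normal((D, D)) * 0.1  # assumed bilinear compatibility weights

# One graph-attention update: every node attends to all nodes, with
# attention scores conditioned on the question feature.
updated = np.empty_like(nodes)
for i in range(N):
    # compatibility of node i with each node j, modulated by the question
    scores = np.array([(nodes[i] * question) @ W @ nodes[j] for j in range(N)])
    alpha = softmax(scores)            # attention weights over neighbours
    updated[i] = alpha @ nodes         # question-guided feature aggregation
```

In a full model this update would be stacked over several layers and followed by the multi-layer perceptron and activation function that produce the answer distribution.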