
Research On Visual Question Answering Method Based On Scene Word Analysis

Posted on: 2022-02-01    Degree: Master    Type: Thesis
Country: China    Candidate: S L Jiang    Full Text: PDF
GTID: 2518306494491634    Subject: Computer technology
Abstract/Summary:
Deep learning has achieved great success in both computer vision and natural language processing. Visual Question Answering (VQA), which bridges these two areas, takes a natural-language question and a corresponding picture and requires an intelligent system to understand the picture and answer the question on that basis. With the explosive growth of data volume, one application of VQA is to find the picture in an album that corresponds to a question and then answer the question from that picture; this remains a difficult problem that urgently needs to be solved. For example, after a user poses a question, the intelligent system must first understand the question, then correctly locate the corresponding picture in the user's photo album, and finally answer the question based on that picture. Existing methods extract features from the given sentence and from all pictures, represent the two modalities in a common vector space, perform embedding and fusion, and obtain results through deep learning model analysis. Because these methods do not perform semantic and scene understanding but match and answer directly, their results are biased and it is difficult to capture the true meaning of the question; even when multiple candidate answers are produced, the correct one may not be given.

To address these problems, this paper proposes a feedback-based sentence similarity matching model. When the feedback indicates that the first answer is wrong, the model re-analyzes the scene words in the question and selects potentially matching pictures by checking whether all the targets in each picture could plausibly appear in that scene. Probability analysis is then used to give a suitable answer, improving overall answer accuracy.

The proposed method is compared with existing methods on the Memex QA and Visual7w data sets. On the Memex QA data set, the accuracy of the proposed method is about 20% higher than that of existing methods; in the Visual7w experiments, it also outperforms existing methods. Through these efforts, VQA achieves better answers when scene words are present, and answer accuracy is improved.
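The abstract does not include code; the following is a minimal Python sketch of the feedback loop it describes, assuming hypothetical components (answer_vqa, extract_scene_words, detect_objects, scene_compatibility) that stand in for the thesis's actual models rather than reproducing them.

```python
# Minimal sketch of the feedback-based scene-word re-analysis loop described above.
# All component functions are hypothetical placeholders, not the author's implementation.

from typing import List, Tuple


def answer_vqa(question: str, image_path: str) -> Tuple[str, float]:
    """Placeholder: run a base VQA model, return (answer, confidence)."""
    raise NotImplementedError


def extract_scene_words(question: str) -> List[str]:
    """Placeholder: extract scene words (e.g. 'beach', 'kitchen') from the question."""
    raise NotImplementedError


def detect_objects(image_path: str) -> List[str]:
    """Placeholder: object detector returning labels for the targets in a picture."""
    raise NotImplementedError


def scene_compatibility(objects: List[str], scene_words: List[str]) -> float:
    """Placeholder: probability that all detected targets could appear in the scene."""
    raise NotImplementedError


def answer_with_feedback(question: str, album: List[str], first_image: str,
                         confidence_threshold: float = 0.5) -> str:
    """If the first answer looks wrong, re-rank album pictures by scene
    compatibility and answer again from the most compatible picture."""
    answer, confidence = answer_vqa(question, first_image)
    if confidence >= confidence_threshold:
        return answer  # first answer accepted; no feedback step needed

    # Feedback path: re-analyze scene words and pick a better-matching picture.
    scene_words = extract_scene_words(question)
    scored = [(scene_compatibility(detect_objects(img), scene_words), img)
              for img in album]
    _, best_image = max(scored)
    answer, _ = answer_vqa(question, best_image)
    return answer
```

In this sketch the feedback signal is approximated by a confidence threshold on the first answer; the thesis's actual criterion for deciding that an answer is wrong may differ.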
Keywords/Search Tags:Visual question answering, Computer vision, Natural language processing, Scene words, Similarity matching