
Research Progress Analysis And Key Information Measurement Of Visual Question Answering

Posted on: 2022-10-29  Degree: Master  Type: Thesis
Country: China  Candidate: S L Xia  Full Text: PDF
GTID: 2518306530955629  Subject: Computer technology
Abstract/Summary:
Visual question answering (VQA) sits at the intersection of natural language processing and computer vision and is among the most popular and challenging topics in both fields. It has attracted growing attention and has clear practical value: a VQA system can help visually impaired people understand their surroundings by answering questions about the scene around them, easing difficulties in daily life, and it can make human-computer interaction more natural. However, given an image and a question about it, there has been no quantitative study of whether the question or the image matters more in producing the answer. This thesis investigates that issue in detail. The main work is as follows:

(1) The current state of research is summarized and possible future research directions are identified. Early VQA models mainly performed a simple joint embedding of image and question features. Later work addressed the feature dimensions of the image and the question. More recently, many studies have introduced attention mechanisms so that the question attends both to its own content and to the related content of the image. There are now also models based on relations and reasoning. This thesis summarizes the advantages and disadvantages of these research directions and analyzes future ones.

(2) To measure the key information in VQA, this thesis compares three settings: a question-only model, a model in which the key regions of the image are masked, and a model in which key words are masked before fusion with the image features. Experiments are carried out on a public VQA dataset. The results show that, under the accuracy metric commonly used in VQA research, the question features are more important than the image features.
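The comparison in point (2) can be illustrated with a minimal Python sketch. It assumes that the "accuracy index commonly used" is the standard VQA consensus accuracy, min(number of annotators who gave the answer / 3, 1). The abstract does not name the metric, so this is an assumption. The function names and the condition labels (`full`, `question_only`, and so on) are hypothetical, and no real model or dataset is involved.

```python
from collections import Counter


def vqa_accuracy(predicted, human_answers):
    """Consensus accuracy for one question (assumed metric):
    min(#annotators who gave the predicted answer / 3, 1)."""
    matches = Counter(human_answers)[predicted]
    return min(matches / 3.0, 1.0)


def mean_accuracy(predictions, annotations):
    """Average the per-question accuracy over a set of questions."""
    scores = [vqa_accuracy(p, a) for p, a in zip(predictions, annotations)]
    return sum(scores) / len(scores)


def compare_conditions(predictions_by_condition, annotations):
    """Map each (hypothetical) ablation condition to its mean accuracy.

    predictions_by_condition: dict such as
        {"full": [...], "question_only": [...], "image_masked": [...]}
    where each list holds one predicted answer per question.
    """
    return {
        name: mean_accuracy(preds, annotations)
        for name, preds in predictions_by_condition.items()
    }


def accuracy_drop(results, baseline="full"):
    """Accuracy lost by each condition relative to the baseline.
    A larger drop suggests the removed information mattered more."""
    base = results[baseline]
    return {name: base - acc for name, acc in results.items() if name != baseline}
```

Under this sketch, a small drop for the question-only condition together with a large drop for the masked-question condition would be consistent with the reported finding that question features matter more than image features.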
Keywords/Search Tags: Computer vision, Natural language processing, Information measurement, Visual question answering