
Research On Visual Question Answering Method With Visual Content Understanding And Text Information Analysis

Posted on: 2021-01-26
Degree: Master
Type: Thesis
Country: China
Candidate: L Y Zhang
Full Text: PDF
GTID: 2428330623967786
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of artificial intelligence and machine learning, computer vision and natural language processing have attracted increasing attention from researchers, especially at the intersection of images and text. Visual Question Answering (VQA), which requires understanding an image together with a paired natural-language question, is one of the most compelling research areas at this intersection. Research on traditional vision tasks such as object detection and image segmentation has surpassed human recognition ability, and classic language tasks such as machine translation have likewise approached human cognitive standards, so progress in those directions has hit a bottleneck. Researchers have therefore shifted their focus from these traditional directions toward joint image-and-text problems, and VQA has developed rapidly with the boost of deep learning in both fields.

Existing VQA methods rely heavily on the knowledge contained in the training dataset. However, some questions require more specialized cues than the dataset alone provides. To address this issue, we propose a novel framework named Knowledge-based Augmentation Network (KAN), which introduces object-related open-domain knowledge to assist question answering. Concretely, we extract richer visual information from images and incorporate a knowledge graph to supply the common sense or experience needed by the reasoning process. For these two augmented inputs, we design an attention module that adjusts itself according to the specific question, so that the importance of external knowledge relative to detected objects is balanced adaptively. Extensive experiments on two challenging VQA datasets, VQA v2 and VQA-CP v2, show that our model achieves state-of-the-art performance. In addition, the open-domain knowledge is also beneficial to VQA baselines on knowledge-related questions.
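The abstract does not give KAN's exact formulation, but the question-adaptive balancing it describes can be sketched in a minimal form: the question embedding attends separately over detected-object features and knowledge-graph entity embeddings, and a question-derived scalar gate mixes the two pooled summaries. All names, shapes, and the sigmoid gate below are illustrative assumptions, not the thesis's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def question_guided_fusion(q, obj_feats, kg_feats):
    """Hypothetical sketch of question-adaptive fusion of two inputs.

    q         : (d,)   question embedding
    obj_feats : (n, d) detected-object features
    kg_feats  : (m, d) knowledge-graph entity embeddings
    Returns a (d,) fused representation.
    """
    # Question-conditioned attention within each modality
    obj_att = softmax(obj_feats @ q)      # (n,) weights over objects
    kg_att = softmax(kg_feats @ q)        # (m,) weights over knowledge
    obj_sum = obj_att @ obj_feats         # (d,) pooled object summary
    kg_sum = kg_att @ kg_feats            # (d,) pooled knowledge summary

    # A scalar gate computed from the question decides how much
    # external knowledge matters for this particular question.
    gate = 1.0 / (1.0 + np.exp(-(q @ (obj_sum - kg_sum))))
    return gate * obj_sum + (1.0 - gate) * kg_sum
```

In a trained model the attention scores and the gate would come from learned projections rather than raw dot products; the sketch only shows how a single mechanism can weight knowledge against visual evidence per question.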
Keywords/Search Tags:visual question answering, computer vision, natural language processing, knowledge base, attention mechanism