Research On Visual Question Answering Technology Based On Knowledge Graph

Posted on:2022-10-03

Degree:Master

Type:Thesis

Country:China

Candidate:J F Li

Full Text:PDF

GTID:2518306335458414

Subject:Computer Software and Application of Computer

Abstract/Summary:

The visual question answering task requires a model to understand the content of an input image and a text question, and then to produce the corresponding answer. Unlike text-based question answering, which only needs to process information from a single modality, visual question answering must fuse information from the visual modality and the text modality. This setting is closer to the real situations in which humans face problems and to a form of artificial intelligence with reasoning ability. It therefore has high research value and broad application prospects in fields such as medical assistive equipment, security, and early childhood education.

At present, the visual question answering task still faces the following problems and challenges. First, when the model receives inputs from two different modalities, image and natural-language text, it is challenging to process the multi-modal information efficiently and to obtain accurate visual feature representations, text feature representations, or joint image-text representations. Second, high-dimensional image features and text features are difficult to align semantically. Third, the model must extract the object attributes and object-relationship features in the image that the text question refers to, and then reason over them. These problems hinder the further development of visual question answering.

To address these problems, this thesis proposes improvements to the visual question answering model that simulate the perception and cognitive reasoning process humans follow when facing real-world problems. The main research contents are as follows:

(1) This thesis constructs an image-related knowledge graph from the annotation data in the dataset by extracting the objects, attributes, and object relationships in the dataset images, and designs the relationship weights between the entities of this knowledge graph by combining different semantic similarity measures from WordNet. On this basis, a visual question answering framework based on knowledge graph feature embedding and attention enhancement is proposed. It combines the structured knowledge features of the image scene stored in the knowledge graph with the text question features and the image features, which effectively addresses the problem of image-text semantic alignment.

(2) This thesis proposes a visual question answering framework based on cross-modal pre-training and knowledge graph feature alignment. It introduces a Transformer structure to encode the image-modality and text-modality information, and designs multiple pre-training tasks, such as knowledge graph entity prediction, knowledge graph relationship prediction, knowledge graph attribute prediction, masked category prediction for image ROI regions, and image-text matching. These tasks allow the model to learn joint features of images, texts, and knowledge graphs, effectively addressing multi-modal feature fusion and finer-grained image-text semantic features.

Experimental results show that adding knowledge graph features containing image scene information to a visual question answering model or framework can significantly improve performance on the visual question answering task.

Keywords/Search Tags:

Visual question answering, Knowledge graph, Image understanding, Multi-modal fusion

