Visual Semantic Understanding And Question Answering Research Based On Knowledge Graph

Posted on:2024-03-17

Degree:Master

Type:Thesis

Country:China

Candidate:X Y Zhang

GTID:2568307079476354

Subject:Electronic information

Abstract/Summary:

Visual semantic understanding and question answering is a multimodal task: the model must analyze both the input image and the input text in depth and produce an answer based on them. Such models rely on three core techniques: image feature extraction, text feature extraction, and multimodal feature fusion. The most important challenge today is how to process visual image features and natural-language text features effectively so as to obtain fused features that carry deep semantic information. To achieve this, the model must extract object attributes and the relationships between objects in the image, and analyze the entities, properties, and relationships contained in the question. In many cases, however, simple reasoning over the fused image-text features is not enough to reach the correct answer, and introducing external knowledge can help the model reason effectively. In addition, because most existing models are trained end to end, they scale poorly and are hard to adapt, so more flexible methods are needed to improve their scalability and operability in practical applications. Building on knowledge graphs, this thesis studies three aspects (image and text feature enhancement, the introduction of external knowledge, and modular invocation) and makes the following progress:

(1) A visual semantic understanding and question answering model based on knowledge-graph-enhanced features, FVQA, is proposed. The model enhances the high-level semantic information of image features through a global semantic pool, and strengthens the semantic associations among image features with the help of the knowledge graph and image attention. It enhances the high-level semantics of text features through pre-trained knowledge graph embeddings, and uses a multimodal attention mechanism to assist the fusion of multi-source features. Experiments show that enhancing the high-level semantic information in images or text through a knowledge graph does help visual semantic understanding and the completion of question answering tasks.

(2) A visual semantic understanding and question answering model based on knowledge graph updating, KVQA, is proposed. It updates the multimodal knowledge graph constructed in this work with the help of an external knowledge base, building an efficient and accurate semantic network that gives the model an ability to reason and judge similar to that of the human brain. In addition, a general detection module detects the relevant object categories in the image, and a target association discrimination module then selects an efficient and accurate specialized module for further processing. The general module can be reused as-is, while training the corresponding specialized question answering modules and updating the knowledge allow the model to complete different visual semantic understanding and question answering tasks, which improves its scalability.

(3) Motivated by KVQA's good performance on reasoning questions, this thesis proposes classifying the questions in visual semantic understanding and question answering tasks and handling each question type through a different method and route, yielding SKQA, a model assisted by a knowledge graph. SKQA improves not only visual semantic understanding and question answering accuracy but also computational efficiency.
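Contribution (1) states that FVQA uses a multimodal attention mechanism to assist the fusion of image and text features, but the abstract does not specify that mechanism. The following is a minimal sketch, assuming scaled dot-product attention of a question vector over image region vectors, followed by element-wise product fusion; both choices and all function names are illustrative assumptions, not the thesis's FVQA design. It uses only the Python standard library.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def question_guided_attention(regions: List[Vector], question: Vector) -> List[float]:
    """Attention weights over image regions (assumed: scaled dot-product with the question)."""
    d = len(question)
    scores = [
        sum(r_k * q_k for r_k, q_k in zip(region, question)) / math.sqrt(d)
        for region in regions
    ]
    return softmax(scores)


def fuse(regions: List[Vector], question: Vector) -> Vector:
    """Attention-pool the regions, then fuse with the question by element-wise product."""
    weights = question_guided_attention(regions, question)
    d = len(question)
    # Weighted sum of region vectors, one dimension at a time.
    attended = [sum(w * region[k] for w, region in zip(weights, regions)) for k in range(d)]
    # Assumed fusion operator: element-wise (Hadamard) product with the question vector.
    return [a * q for a, q in zip(attended, question)]


if __name__ == "__main__":
    # Toy 2-D features: three image regions and one question vector.
    regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    question = [2.0, 0.0]
    print(question_guided_attention(regions, question))
    print(fuse(regions, question))
```

In this toy input the region most aligned with the question receives the largest weight, so the fused vector is dominated by that region's features.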
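Contribution (2) describes updating the constructed multimodal knowledge graph with an external knowledge base. The abstract does not give the graph format or the update rule, so the sketch below assumes a plain (head, relation, tail) triple store and a hypothetical policy that merges only those external triples whose head entity already exists in the graph. The `KnowledgeGraph` class and its methods are hypothetical names for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


class KnowledgeGraph:
    """Minimal triple store indexed by head entity (a hypothetical, illustrative structure)."""

    def __init__(self) -> None:
        self._edges: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)

    def add(self, head: str, relation: str, tail: str) -> bool:
        """Insert one triple; return True only if it was not already present."""
        edge = (relation, tail)
        if edge in self._edges[head]:
            return False
        self._edges[head].add(edge)
        return True

    def update_from(self, external: Iterable[Triple]) -> int:
        """Merge external triples whose head is already in the graph; return how many were new.

        Restricting the update to known heads is an assumed policy that keeps the
        graph anchored to entities the model already uses.
        """
        added = 0
        for head, relation, tail in external:
            # Membership test on the dict does not create a new key.
            if head in self._edges and self.add(head, relation, tail):
                added += 1
        return added

    def neighbours(self, head: str) -> Set[Tuple[str, str]]:
        """Return a copy of the (relation, tail) pairs for a head entity."""
        return set(self._edges.get(head, set()))
```

For example, a graph that already knows `("dog", "is_a", "animal")` would accept an external triple such as `("dog", "capable_of", "swimming")` but skip triples about entities it has never seen.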
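Contributions (2) and (3) describe a modular design in which a general module analyzes the input and a discrimination or classification step routes each question to a specialized module. As a rough sketch only, the code below routes questions by type through a handler registry. The keyword rules stand in for a learned question classifier, the detector output is assumed to be a list of category labels, and the question types and handler functions are hypothetical; none of them are taken from the thesis.

```python
from typing import Callable, Dict, List

# Assumed detector output: a list of detected object category labels.
Detections = List[str]
Handler = Callable[[str, Detections], str]


def classify_question(question: str) -> str:
    """Hypothetical keyword rules standing in for a learned question-type classifier."""
    q = question.lower()
    if q.startswith("how many"):
        return "counting"
    if q.startswith(("is there", "are there")):
        return "existence"
    return "knowledge"


def count_handler(question: str, detections: Detections) -> str:
    """Count detections whose label appears in the question text."""
    q = question.lower()
    return str(sum(1 for label in detections if label in q))


def existence_handler(question: str, detections: Detections) -> str:
    """Answer yes if any detected label is mentioned in the question."""
    q = question.lower()
    return "yes" if any(label in q for label in detections) else "no"


def knowledge_handler(question: str, detections: Detections) -> str:
    """Placeholder: a knowledge-graph-backed module would be queried here."""
    return "unknown"


# Registry of specialized modules, keyed by question type (hypothetical types).
HANDLERS: Dict[str, Handler] = {
    "counting": count_handler,
    "existence": existence_handler,
    "knowledge": knowledge_handler,
}


def answer(question: str, detections: Detections) -> str:
    """Route the question to the specialized module registered for its type."""
    return HANDLERS[classify_question(question)](question, detections)
```

Because each question type maps to an independent handler, a new specialized module can be registered without changing the general detection step or the other handlers, which is one way to obtain the scalability that the abstract attributes to modular invocation.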

Keywords/Search Tags:

Image features, text features, multimodal fusion, knowledge graph, knowledge update

Related items

1	Extracting High-level Multimodal Features
2	Research And System Implementation Of Key Technologies For Multimodal Knowledge Graph Construction
3	Research On Recommendation Method Based On Knowledge Graph And Enhanced Features
4	Research On Video Memorability Prediction Based On Multimodal Feature Fusion
5	Research On Evaluation And Verification Of Multimodal Knowledge
6	Research On Personalized Recommendation Method Based On Knowledge Graph
7	Research On Entity Relationship Extraction Technology Based On Deep Learning Fusion Of Text Features
8	Research On Key Technologies Of Knowledge Graph Representation Learning For Multi-dimensional Features
9	Research On Construction And Application Of Multimodal Curriculum Knowledge Graph
10	Research On Key Technologies Of Knowledge Graph Construction For The Knowledge Field Of Ship