
Visual Semantic Understanding And Question Answering Research Based On Knowledge Graph

Posted on: 2024-03-17
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Zhang
GTID: 2568307079476354
Subject: Electronic information
Abstract/Summary:
Visual semantic understanding and question answering are multimodal tasks that require a model to analyze the input image and text in depth and to answer on that basis. Models for these tasks rely mainly on three techniques: image feature extraction, text feature extraction, and multimodal feature fusion. The tasks still face many challenges, the most important of which is how to process visual image features and natural language text features effectively so as to obtain fused features that carry deep semantic information. To achieve this, the model must be able to extract object attributes and the associations between objects in the image, and to analyze the entity properties and relationships contained in the question. In many cases, however, simple reasoning over the fused image and text features is not enough to reach the correct answer, and introducing external knowledge can help the model reason effectively. Moreover, because most such models support only end-to-end training, their scalability and operability are limited, so more flexible methods are needed in practical applications. Based on the knowledge graph, this thesis conducts experiments and research from three aspects: image and text feature enhancement, introduction of external knowledge, and modular invocation. The main results are as follows.

(1) FVQA, a visual semantic understanding and question answering model based on knowledge-graph-enhanced features, is proposed. The model enhances the high-level semantic information of image features through a global semantic pool, and strengthens the semantic association of image features with the help of the knowledge graph and image attention. High-level semantics in text features are enhanced through pre-trained knowledge graph embeddings, and a multimodal attention mechanism assists multi-source feature fusion. Experiments show that enhancing the high-level semantic information in images or text through the knowledge graph helps with visual semantic understanding and question answering tasks.

(2) KVQA, a visual semantic understanding and question answering model based on knowledge graph knowledge update, is proposed. The multimodal knowledge graph constructed in this thesis is updated with the help of an external knowledge base, with the aim of building an efficient and accurate semantic network and giving the model a human-like ability to reason and judge. In addition, the model uses a general detection module to detect the relevant categories in the image, and a target association discrimination module then selects an efficient and accurate specialized module for further processing. The general module can be reused stably, while training the corresponding specialized question answering module and updating the knowledge allow the model to handle different visual semantic understanding and question answering tasks, which increases its scalability.

(3) Building on the good performance of KVQA on reasoning questions, this thesis proposes classifying the questions in visual semantic understanding and question answering tasks and handling different question types with different methods and routes, and presents SKQA, a model assisted by the knowledge graph. This improves both the accuracy of visual semantic understanding and question answering and the computational efficiency.
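As a rough illustration of the multimodal attention fusion mentioned in (1), the sketch below weights image region features by their relevance to a question vector and concatenates the result with the question vector. It is a generic scaled dot-product attention in plain Python, not the thesis's actual FVQA architecture; the function names (`attend`, `fuse`), the toy two-dimensional vectors, and the choice of concatenation as the fusion step are all assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attend(question_vec, region_feats):
    """Question-guided attention over image regions (hypothetical helper).

    Returns the attention weights and the weighted sum of region features.
    """
    scale = math.sqrt(len(question_vec))
    scores = [dot(question_vec, region) / scale for region in region_feats]
    weights = softmax(scores)
    dim = len(region_feats[0])
    attended = [
        sum(w * region[i] for w, region in zip(weights, region_feats))
        for i in range(dim)
    ]
    return weights, attended


def fuse(question_vec, region_feats):
    """Fuse text and image features by concatenation (an assumed choice)."""
    weights, attended = attend(question_vec, region_feats)
    return weights, list(question_vec) + attended


# Toy inputs: one question vector and three image-region vectors.
question = [1.0, 0.0]
regions = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weights, fused = fuse(question, regions)
print(weights)
print(fused)
```

In this toy input the first region is aligned with the question vector, so it receives the largest attention weight. A real model would learn projection matrices for the question and the regions rather than comparing raw features directly.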
Keywords/Search Tags: Image features, text features, multimodal fusion, knowledge graph, knowledge update