Research On Visual Question Answering Technology Based On Knowledge Base

Posted on: 2021-04-16
Degree: Master
Type: Thesis
Country: China
Candidate: X B Chen
Full Text: PDF
GTID: 2428330623967893
Subject: Control Science and Engineering
Abstract/Summary:
Visual Question Answering (VQA) is an artificial intelligence task in which a model is given a picture and a related natural language question and outputs an answer. Compared with other tasks, VQA is closer to General Artificial Intelligence (GAI), so research on VQA models has high research value and promising application scenarios. Depending on whether a knowledge base is introduced, existing models are divided into joint embedding models and knowledge base-based models. Both types perform well on VQA tasks. However, the mainstream joint embedding model suffers from data set dependence, small network capacity, and insufficient text representation ability. The knowledge base-based model, by introducing an external knowledge base, overcomes the network capacity limitation of the joint embedding model and can answer inference questions involving common sense or external knowledge. However, it requires knowledge base query statements to be constructed manually, which greatly limits the generalization ability of the model. This thesis improves the text representation method of the joint embedding model and the generality of the knowledge base-based model. The main contributions are as follows:

1) Introduce dynamic word embeddings to improve the text representation method of the joint embedding model. Current joint embedding models still use static word embeddings for text. Because static word vectors cannot effectively represent polysemy and multi-word expressions, this thesis introduces dynamic word embeddings into the VQA model and, combining them with Faster R-CNN and an attention mechanism, proposes a joint embedding model (N-KBSN) based on dynamic word embeddings. The experimental results show that dynamic word embeddings achieve better text feature representation and thereby improve accuracy.

2) Construct a knowledge base graph embedding module to extend the versatility of knowledge base-based models. The module extracts core entities from the image and the text and maps them to knowledge base entities, then extracts the sub-graphs closely related to the core entities and converts them into low-dimensional vectors to realize sub-graph embedding. To achieve good sub-graph embedding, we first extracted two semantically rich experimental knowledge bases from DBpedia: DBV and DBA. On these two knowledge bases, a series of knowledge base embedding models were selected to perform link prediction. The results show a clear correspondence between the entities of DBV, which allows excellent node embedding. The TransE model also achieves a good knowledge base embedding, so the knowledge base graph embedding module is built on TransE.

3) Merge the knowledge base graph embedding module with the N-KBSN model to construct a VQA model (KBSN) based on knowledge base graph embedding. Experimental results on multiple data sets show that the knowledge base graph embedding module improves the accuracy of VQA. The accuracy improves significantly on complex questions that require common sense or external knowledge.
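As a rough illustration of the TransE idea mentioned in the abstract, the sketch below scores a triple (head, relation, tail) by the distance between head + relation and tail, and ranks candidate tails for link prediction. It is a minimal sketch and not the thesis implementation: the function names `transe_score` and `rank_tails`, the 2-dimensional vectors, and the toy entities are all hypothetical and are not taken from DBV or DBA. In practice the vectors would be learned from the knowledge base rather than written by hand.

```python
import math


def transe_score(head, relation, tail):
    """Return the L2 distance ||h + r - t||; a lower score means a more plausible triple."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def rank_tails(head, relation, entity_vectors):
    """Link prediction: order candidate tail entities by ascending TransE score."""
    return sorted(
        entity_vectors,
        key=lambda name: transe_score(head, relation, entity_vectors[name]),
    )


# Hand-picked toy embeddings (hypothetical; real embeddings are trained).
entities = {
    "Paris": [1.0, 0.0],
    "France": [1.0, 1.0],
    "Berlin": [3.0, 0.0],
    "Germany": [3.0, 1.0],
}
capital_of = [0.0, 1.0]  # hypothetical relation vector

# Rank every entity as a candidate tail for (Paris, capital_of, ?).
ranking = rank_tails(entities["Paris"], capital_of, entities)
print(ranking[0])
```

With these toy vectors, Paris + capital_of lands exactly on France, so France is ranked first. Link prediction metrics are typically derived from the position of the correct entity in such a ranking.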
Keywords/Search Tags: Visual Question Answering, Joint Embedding Model, Knowledge Base, N-KBSN, KBSN