
Research On Semantic Expression Based On Knowledge Source Embedding And Multi-modal Data Fusion

Posted on: 2022-03-10
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Yin
Full Text: PDF
GTID: 2518306311469574
Subject: Computer application technology
Abstract/Summary:
The study of semantic expression is one of the important cornerstones of natural language processing. By applying machine learning methods to fully mine the semantic information contained in words, sentences and images, it provides effective support for many natural language processing tasks. In recent years, semantic similarity algorithms based on knowledge source information have been introduced in text classification, language understanding and information retrieval, while semantic learning methods based on multi-modal data fusion, which extract rich semantic information from sentences and images, are widely used in visual question answering. This thesis therefore analyzes semantic expression from these two perspectives; the research work is as follows.

(1) Enhancing word embeddings using semantic relations

Neural language modeling is one of the basic tasks of natural language understanding. However, studies have shown that word vectors generated by co-occurrence prediction generally have poor semantic expression ability. Most existing research uses external knowledge sources to correct neural-network word vectors, but the semantic relation information and the semantic network hierarchy information contained in lexical semantic networks have not been strictly distinguished and effectively utilized. Starting from the semantic associations in the lexical semantic network, this thesis analyzes the problem of lexical semantic transmission in the network, using the direct hypernym/hyponym sets (word pairs formed by a word and its immediately adjacent vocabulary), which have an asymmetric relationship and closer semantic concepts, as well as synonyms and antonyms, which have symmetric relationships. The concept-hierarchy information of the semantic network is integrated into the neural word vector space, which improves the quality of the semantic constraint set injected into the word vectors and thereby corrects them. On the gold-standard datasets SimLex-999 and SimVerb-3500, the Spearman correlation coefficient of the improved word vector correction model increases by about 7%, which effectively enhances the semantic expression ability of distributed word vectors and improves performance on the lexical semantic similarity task.

(2) Multi-modal data fusion based on semantic learning

Semantic learning that jointly embeds text and images is the basis of most cross-modal applications. This thesis proposes a solution to the problems of semantic feature learning and multi-modal feature fusion between image and text in the visual question answering (VQA) task. The multi-modal semantic representation adopts an encoder-decoder neural network structure with a co-attention mechanism, which interactively learns image semantic feature representations and text semantic feature representations at the same time. The co-attention mechanism comprises a self-attention mechanism and a guided attention mechanism: the self-attention module strengthens information interaction within each modality (between words in the text, between local regions in the image), while the guided attention module enhances information interaction across modalities (between words and image regions) and effectively reduces interference from irrelevant noise. In the feature fusion stage, a multi-modal bilinear fusion method fuses the semantic features extracted from images and text; bilinear fusion captures the complex correlations between multi-modal features that linear fusion methods cannot. Experimental results on the VQA-2.0 dataset show that the improved model raises answer-prediction accuracy by about 0.2.
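The word-vector correction in part (1) can be sketched as a constraint-injection loop: symmetric synonym pairs are pulled together while symmetric antonym pairs are pushed apart. The sketch below is a minimal NumPy illustration of this idea only; the toy vocabulary, the constraint pairs, and the simple update rule are assumptions for the example, not the thesis's actual model (which also exploits asymmetric hypernym/hyponym hierarchy information).

```python
# Minimal sketch: correct word vectors with symmetric semantic
# constraints (synonyms attract, antonyms repel), renormalising
# after each step so the repulsion cannot diverge.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["good", "great", "bad", "terrible"]          # toy vocabulary
vecs = {w: rng.normal(size=8) for w in vocab}         # stand-in embeddings

synonyms = [("good", "great"), ("bad", "terrible")]   # symmetric: attract
antonyms = [("good", "bad"), ("great", "terrible")]   # symmetric: repel

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def correct(vecs, synonyms, antonyms, lr=0.1, steps=100):
    """Nudge each constrained pair along its difference vector."""
    vecs = {w: v.copy() for w, v in vecs.items()}
    for _ in range(steps):
        for a, b in synonyms:                 # move the pair together
            delta = vecs[b] - vecs[a]
            vecs[a] += lr * delta
            vecs[b] -= lr * delta
        for a, b in antonyms:                 # push the pair apart
            delta = vecs[b] - vecs[a]
            vecs[a] -= lr * delta
            vecs[b] += lr * delta
        for w in vecs:                        # keep vectors on the unit sphere
            vecs[w] /= np.linalg.norm(vecs[w])
    return vecs

corrected = correct(vecs, synonyms, antonyms)
```

After correction, cosine similarity of synonym pairs rises toward 1 while antonym pairs become dissimilar; in the real setting one would then re-score SimLex-999/SimVerb-3500 pairs and measure the Spearman correlation against human judgments.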
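The part (2) pipeline — question features attending over image-region features, followed by bilinear fusion of the pooled vectors — can be sketched in NumPy as follows. The feature dimensions, the random stand-in features, and the single-layer attention are toy assumptions for illustration; the thesis's actual model is a deeper co-attention encoder-decoder trained on VQA-2.0.

```python
# Sketch: text-guided attention over image regions, then bilinear
# fusion of the pooled text and visual vectors.
import numpy as np

rng = np.random.default_rng(1)
d = 16                              # shared feature dimension (assumed)
text = rng.normal(size=(5, d))      # 5 question-word features (stand-ins)
image = rng.normal(size=(9, d))     # 9 image-region features (stand-ins)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def guided_attention(query, keys):
    """Each query vector attends over the key set (scaled dot-product),
    returning a key-weighted summary per query."""
    scores = query @ keys.T / np.sqrt(keys.shape[1])
    return softmax(scores, axis=1) @ keys          # shape (n_query, d)

def bilinear_fusion(u, v, W):
    """Bilinear pooling: z_k = u^T W_k v for each output dimension k,
    capturing pairwise feature interactions a linear sum would miss."""
    return np.einsum("i,kij,j->k", u, W, v)

attended = guided_attention(text, image)   # text-guided image features
u = text.mean(axis=0)                      # pooled question vector
v = attended.mean(axis=0)                  # pooled visual vector
W = rng.normal(size=(8, d, d)) / d         # fusion tensor, 8 fused outputs
fused = bilinear_fusion(u, v, W)           # joint representation, shape (8,)
```

The attention weights form a distribution over image regions for each word, which is what lets the guided module suppress irrelevant regions; the bilinear term `u^T W_k v` models every text-feature/image-feature product rather than a simple concatenation.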
Keywords/Search Tags: neural network word embedding, semantic similarity, attention mechanism, multi-modal data fusion