Research On Augmented Semantic Representation Based On External Knowledge

Posted on: 2021-05-18
Degree: Master
Type: Thesis
Country: China
Candidate: Y W Song
Full Text: PDF
GTID: 2518306302476224
Subject: Financial Information Engineering
Abstract/Summary:
The 21st century is an era of knowledge. Knowledge, as an induction and summary of external objective laws, improves people's understanding of the external world. In natural language processing, research on word embeddings has mainly focused on the current text, mining information from it and embedding that information into word vectors through neural networks. However, semantic disambiguation cannot be achieved by relying on the current text alone: when reading, humans draw not only on the context of the current text but also on external knowledge. In recent years, knowledge-enhanced semantic models built on top of traditional models have therefore become a new research direction. A typical representative is the ERNIE model, which introduces an external knowledge graph on top of BERT and achieves better results than BERT on knowledge-intensive downstream tasks, demonstrating that external knowledge can improve the semantics of word embeddings. Inspired by this, this thesis chooses Baidu Encyclopedia, the largest open-source Chinese knowledge base, as the external knowledge source, and studies how to integrate the knowledge in encyclopedia entries into distributed word-vector representations so as to enhance their semantics.

To explore which components of Baidu Encyclopedia constitute usable knowledge, this thesis applies text modeling to the text of encyclopedia entries and graph modeling to the hyperlink relationships between entries, exploring the forms of knowledge that can be mined and modeled from entries. To assess the degree of semantic information embedded in word vectors, a fine-grained downstream task is adopted: classifying word pairs as synonyms or related words. The ability of word vectors to distinguish synonyms from related words is taken as a manifestation of how much semantics they embed. Experiments show that only the keywords in an entry's introduction text improve the semantics of the entry vector itself. Graph embeddings based on the hyperlinks between entries can capture an entry's attribute and structural information within the network, but they are unsuitable for semantic enhancement of entry vectors because the global network graph is difficult to obtain.

To extract external knowledge from encyclopedia text more effectively and integrate it into entry vectors, this thesis designs a twin-network knowledge semantic model based on deep learning and a Key-Attention mechanism, which fuses the knowledge extracted from encyclopedia text with pre-trained BERT word vectors. The model outputs KeE (Knowledge-enhanced Embedding) word vectors that integrate external encyclopedia knowledge. To evaluate the knowledge-enhanced word vectors scientifically, a controlled experiment is also conducted: using the same twin-network framework and the same experimental parameters, BERT word vectors are also trained without external knowledge, removing the influence of model training itself on the experimental results. By comparing the pre-trained BERT vectors, the BERT vectors trained without external knowledge, and the knowledge-enhanced word vectors that integrate encyclopedia knowledge, the influence of external knowledge on word vectors can be analyzed more objectively. In the synonym/related-word classification experiment, the trained BERT vectors are 4.67% more accurate than the pre-trained BERT vectors, demonstrating the validity of the twin-network model. Compared with the vectors trained without external knowledge, the vectors with external knowledge are 6.87% more accurate, showing that external knowledge embeds more semantics into word vectors and performs better on the fine-grained synonym task.

Building on the knowledge semantic model, this thesis also explores how to encode paragraph vectors when extracting knowledge from encyclopedia text. Four network structures are used to encode paragraph text: word-vector averaging, a bidirectional RNN, a bidirectional LSTM, and a 6-layer Transformer. With all other conditions held constant, the knowledge-enhanced word vectors produced by the four encoders are compared. In the final synonym/related-word classification, the knowledge-enhanced vectors obtained through the Transformer have the strongest discriminative ability, being 4.88%, 6%, and 3.78% more accurate than those obtained through word-vector averaging, the RNN, and the LSTM respectively. For the synonym task in this thesis, the Transformer structure is better suited to encoding contextual information from the long paragraph text of encyclopedia entries.
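The fusion step described above can be illustrated with a minimal sketch. The thesis does not spell out the exact form of its Key-Attention mechanism, so the snippet below uses generic scaled dot-product attention as a stand-in: the entry's pre-trained BERT vector acts as the query, keyword vectors from the entry's encyclopedia text act as keys and values, and the attended knowledge summary is combined residually with the BERT vector to give a knowledge-enhanced embedding. All dimensions and names are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def knowledge_enhanced_embedding(bert_vec, keyword_vecs):
    """Fuse a pre-trained BERT entry vector with keyword vectors from the
    entry's encyclopedia text via scaled dot-product attention — a generic
    stand-in for the thesis's Key-Attention mechanism, whose exact form
    is not specified in the abstract."""
    d = bert_vec.shape[0]
    # relevance of each keyword to the entry (scaled dot product)
    scores = keyword_vecs @ bert_vec / np.sqrt(d)
    weights = softmax(scores)
    # knowledge summary: attention-weighted sum of keyword vectors
    knowledge = weights @ keyword_vecs
    # residual combination of the textual and knowledge signals
    return bert_vec + knowledge

rng = np.random.default_rng(0)
entry = rng.normal(size=768)          # stand-in for a pre-trained BERT entry vector
keywords = rng.normal(size=(5, 768))  # stand-in vectors for 5 extracted keywords
kee = knowledge_enhanced_embedding(entry, keywords)
```

The residual form keeps the original BERT semantics intact while letting the attended knowledge shift the vector; a learned gate or projection could replace the plain addition.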
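The downstream evaluation task, deciding whether a word pair is a synonym pair or merely a related pair, can be sketched in its simplest form. The thesis trains a classifier on top of the word vectors; the fixed cosine-similarity threshold below is a simplified, hypothetical stand-in, not the thesis's method.

```python
import numpy as np

def cosine(u, v):
    # cosine similarity between two word vectors
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def classify_pair(vec_a, vec_b, threshold=0.8):
    # label a word pair from the similarity of its vectors;
    # the threshold value is illustrative, not taken from the thesis
    return "synonym" if cosine(vec_a, vec_b) >= threshold else "related"

a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 1.0, 0.0])
label = classify_pair(a, b)  # orthogonal vectors: low similarity
```

The harder the task is at a fixed decision rule, the more the accuracy gap between embeddings reflects how much fine-grained semantics each embedding actually carries.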
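Of the four paragraph encoders compared, word-vector averaging is the simplest and makes the comparison concrete. The sketch below shows that baseline; the word count and vector dimension are illustrative. Averaging discards word order entirely, which helps explain why the order-aware encoders (RNN, LSTM, Transformer) fare better on long encyclopedia paragraphs.

```python
import numpy as np

def average_paragraph_encoder(word_vecs):
    """Word-vector averaging: the weakest of the four paragraph encoders
    compared in the thesis. It ignores word order and long-range context,
    unlike the bidirectional RNN/LSTM and the 6-layer Transformer."""
    return np.mean(word_vecs, axis=0)

rng = np.random.default_rng(1)
paragraph = rng.normal(size=(30, 300))  # 30 words, 300-dim vectors (illustrative)
para_vec = average_paragraph_encoder(paragraph)
```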
Keywords/Search Tags: word embedding, knowledge semantic model, Baidu Encyclopedia, synonym