
Research on Representation Learning Methods Fusing Words and Topics

Posted on: 2021-03-12 | Degree: Master | Type: Thesis
Country: China | Candidate: Y X Zhu
GTID: 2518306458492874 | Subject: Computer software and theory
Abstract/Summary:
The rapid development of the Internet has produced a large amount of text data, and how to learn effective representations of this rich textual information has become a central problem in natural language processing. Word representation learning is the basic problem of text representation: it represents words effectively through parameterization. Researchers have proposed many word representation learning methods, most of which learn a word's local semantic representation from its context. However, such methods cannot capture a word's global semantic information, which makes the resulting word representations less accurate. Topic discovery models such as LDA can learn the global topical semantics of words, but because they rely on the bag-of-words model, they ignore word order and thus lose the local semantic information and other grammatical features of words. Since 2014, researchers have proposed methods that combine word representation learning with topic discovery, using topics to enrich the semantic information in word representations and using word representations to improve the accuracy of topic discovery.
With the rapid development of deep learning, graph-structured network data has attracted attention from both academia and industry, and recent studies have used network representation learning methods to learn semantic representations of documents. However, existing methods mainly learn document-level representations and cannot learn representations at word granularity, even though much of the semantic information of words is reflected in the various networks that connect them. It is therefore necessary to study how to construct word networks and to develop word-network-based methods for word representation learning and topic discovery.
Drawing on existing word representation learning, topic discovery, and network representation learning methods, this thesis makes the following contributions:
(1) Based on the similarity between text and networks, and taking into account the linguistic characteristics of natural-language text, the text is analyzed from three perspectives: word co-occurrence, part of speech, and syntactic structure. Treating words as nodes and the relationships between words as edge weights, we design a word network construction method that integrates multiple types of relationships between words, and we obtain word networks by analyzing real data.
(2) We propose a probabilistic generative model that combines word representation learning and topic discovery and performs both tasks simultaneously. Built on the word network above, the method uses network representation learning to train on the structure of the word network, integrating each word's local neighboring words with its global topic information to learn word representations and discover topics. Experimental results show that the proposed method improves on classic NLP methods.
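The word network construction in contribution (1) treats words as nodes and sums several relation types into edge weights. The sketch below is a minimal illustration under stated assumptions, not the thesis's actual procedure: co-occurrence is counted within a fixed sliding window, part-of-speech and syntactic relations are represented only as generic word pairs supplied by the caller (for example from a tagger or dependency parser, not shown), and the per-relation weights `alpha` are hypothetical. All function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# An undirected edge is stored as an alphabetically ordered word pair.
Edge = Tuple[str, str]


def edge_key(u: str, v: str) -> Edge:
    """Normalize an undirected word pair so (u, v) and (v, u) share one key."""
    return (u, v) if u < v else (v, u)


def cooccurrence_edges(sentences: Iterable[List[str]], window: int = 2) -> Dict[Edge, float]:
    """Count how often two distinct words appear within `window` tokens of each other."""
    weights: Dict[Edge, float] = defaultdict(float)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            # Look only forward so each pair of positions is counted once.
            for j in range(i + 1, min(i + window + 1, len(tokens))):
                other = tokens[j]
                if other != word:  # skip self-loops
                    weights[edge_key(word, other)] += 1.0
    return dict(weights)


def pair_edges(pairs: Iterable[Tuple[str, str]]) -> Dict[Edge, float]:
    """Turn relation pairs (hypothetical parser or POS-derived pairs) into weighted edges."""
    weights: Dict[Edge, float] = defaultdict(float)
    for u, v in pairs:
        if u != v:
            weights[edge_key(u, v)] += 1.0
    return dict(weights)


def merge_relations(relations: Dict[str, Dict[Edge, float]],
                    alpha: Dict[str, float]) -> Dict[Edge, float]:
    """Combine several relation types into one weighted word network.

    `alpha` maps a relation name to its weight; unlisted relations default to 1.0.
    """
    merged: Dict[Edge, float] = defaultdict(float)
    for name, edges in relations.items():
        for edge, w in edges.items():
            merged[edge] += alpha.get(name, 1.0) * w
    return dict(merged)


if __name__ == "__main__":
    cooc = cooccurrence_edges([["a", "b", "c", "a"]], window=2)
    dep = pair_edges([("c", "a")])  # stand-in for a parser-derived relation
    net = merge_relations({"cooc": cooc, "dep": dep}, {"cooc": 1.0, "dep": 0.5})
    print(net)  # {('a', 'b'): 2.0, ('a', 'c'): 2.5, ('b', 'c'): 1.0}
```

Summing weighted relation counts keeps every relation type on one graph, which is the simplest way to "integrate" them; a real system might instead normalize each relation type before merging.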
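Contribution (2) trains on the word network with a network representation learning method, but the abstract does not say which one. As one hedged illustration, the sketch below uses DeepWalk-style truncated random walks, biased here by edge weight, to turn a weighted word network into (center, context) pairs that a skip-gram model could be trained on; each word's walk neighbors play the role of its "local neighboring words". The function names and parameters are hypothetical, and the global topic component of the proposed generative model is deliberately not modeled.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str]
Adjacency = Dict[str, List[Tuple[str, float]]]


def to_adjacency(edges: Dict[Edge, float]) -> Adjacency:
    """Build an undirected adjacency list from weighted edges."""
    adj: Adjacency = defaultdict(list)
    for (u, v), w in edges.items():
        adj[u].append((v, w))
        adj[v].append((u, w))
    return dict(adj)


def weighted_walk(adj: Adjacency, start: str, length: int, rng: random.Random) -> List[str]:
    """Random walk where each step picks a neighbor with probability proportional to edge weight."""
    walk = [start]
    while len(walk) < length:
        nbrs = adj.get(walk[-1])
        if not nbrs:  # dead end: stop the walk early
            break
        nodes, weights = zip(*nbrs)
        walk.append(rng.choices(nodes, weights=weights, k=1)[0])
    return walk


def skipgram_pairs(walk: List[str], window: int) -> List[Tuple[str, str]]:
    """Emit (center, context) pairs for every position within `window` steps."""
    pairs = []
    for i, center in enumerate(walk):
        lo, hi = max(0, i - window), min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs


def generate_pairs(edges: Dict[Edge, float], walks_per_node: int = 2,
                   walk_length: int = 5, window: int = 2, seed: int = 0) -> List[Tuple[str, str]]:
    """Run several walks from every node and collect skip-gram training pairs."""
    rng = random.Random(seed)  # seeded for reproducibility
    adj = to_adjacency(edges)
    pairs: List[Tuple[str, str]] = []
    for _ in range(walks_per_node):
        for node in sorted(adj):
            pairs.extend(skipgram_pairs(weighted_walk(adj, node, walk_length, rng), window))
    return pairs


if __name__ == "__main__":
    edges = {("a", "b"): 1.0, ("b", "c"): 2.0}
    pairs = generate_pairs(edges, walks_per_node=1, walk_length=4, window=1, seed=42)
    print(len(pairs))  # 18: three walks of length 4, six pairs each
```

Weighting the walk by edge weight means strongly related words (frequent co-occurrence or syntactic links) appear together in training pairs more often, which is how network structure is carried into the word vectors.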
Keywords/Search Tags: Word Representation Learning, Topic Discovery, Representation Learning, Probabilistic Generative Model