
Research On Representation Learning Of Word Semantics And Topic Discovery On Document Link Networks

Posted on: 2022-12-14; Degree: Master; Type: Thesis
Country: China; Candidate: J L Guo; Full Text: PDF
GTID: 2518306728471074; Subject: Computer application technology
Abstract/Summary:
In the era of big data, the Internet generates new data continuously. Using automatic natural language processing to extract useful information from large volumes of text remains an active and difficult research problem. Word representation learning underlies most natural language processing tasks. Mainstream methods learn low-dimensional word vectors from context, but these distributed representations retain only the local contextual semantics of a word, which is not enough to capture its global semantic and grammatical information. Topic discovery draws on more global, document-level information and gives words and documents an interpretable representation in a coarse-grained topic space; however, it is built on one-hot representations, so the resulting word representations are of poor quality. Building on existing work in word-sense representation learning and topic discovery, and using document link networks as data, this thesis first learns distributed representations of words, documents and topics, and then jointly models distributed representations of polysemous words and document topics to carry out both word-sense representation learning and topic discovery. The main research contents are as follows:

(1) The community discovery model vGraph learns document-node and community representations from the document link network, but it does not make full use of the documents' text features, so the learned document representations are inaccurate. To obtain more accurate document and topic representations, an attributed network semantic representation learning model, RColc (Representation learning and Community discovery on links and contents), is proposed that integrates document text features with document topology features. The model takes semantic community discovery as its task, models distributed representations of documents and communities, and uses the learned document representations to improve the accuracy of topic discovery. Experiments on real attributed networks show that the algorithm achieves a better modularity index than LDA and the vGraph model.

(2) The RColc model performs distributed semantic learning on document content and links, but it does not model the polysemy contained in the documents, which degrades the semantic representation of the original documents and, in turn, topic discovery. The STE (Skip-Gram Topical word Embedding) model does model polysemy, but it ignores the rich link relationships between documents. An attributed network representation learning model, steoLC (Skip-Gram topical word embedding on document link networks), is therefore proposed that integrates document content and document links. The model describes the generation of document links and document content on the basis of distributed semantic word representations, and uses both links and content to discover topics and learn word representations more accurately. Experiments show that steoLC learns topic-specific word embeddings and coherent latent topics.

(3) The performance of the RColc and steoLC models in word representation learning is compared to study whether a fine-grained word representation method that accounts for the generative relationship between a word and its context outperforms coarse-grained representation learning. The approach is then applied to a classification task on railway scientific documents.
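The modularity index used to compare community discovery models under (1) can be illustrated with a small sketch. The abstract does not say which variant the thesis uses; the code below assumes the standard Newman modularity of a hard partition of an undirected, unweighted link network, Q = sum over communities c of [L_c / m - (d_c / 2m)^2], where m is the number of links, L_c the number of links inside c, and d_c the total degree of the nodes in c. The function name, the toy link list and the partition are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity of a hard partition (assumed variant, see lead-in).

    edges:     list of (u, v) pairs, one per undirected link
    community: dict mapping each node to its community label
    """
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")

    intra = defaultdict(int)       # L_c: links with both ends in community c
    degree_sum = defaultdict(int)  # d_c: summed degree of nodes in community c

    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1

    return sum(
        intra[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Hypothetical toy document link network: two triangles joined by one link.
toy_links = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
toy_partition = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}

score = modularity(toy_links, toy_partition)
print(f"modularity = {score:.4f}")
```

A higher value means more links fall inside communities than a random graph with the same degrees would produce; placing every document in a single community gives a modularity of zero, which is why the index is useful for comparing partitions produced by different models.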
Keywords/Search Tags: word semantic representation learning, topic discovery, attributed network, community discovery