Research On Topic Evolution Of Short Text Based On Self-Aggregation Strategy

Posted on: 2020-10-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y N Zhang
GTID: 2428330578965996
Subject: Management Science and Engineering
Abstract/Summary:
With the rapid development of the mobile internet, especially social media, instant messaging and similar platforms, people express opinions, share information and publish news online, which rapidly generates large volumes of short text data. Recognizing and tracking short text topics across platforms helps us better understand the key information in this data and the hot topics of the moment. Topic evolution analysis examines how the content of a topic changes across consecutive time slices. Traditional topic evolution techniques are mainly based on topic models, represented by LDA (Latent Dirichlet Allocation). Short text is characterized by sparse features and a strong dependence on semantic context, which makes it difficult to apply traditional topic models to it directly. In addition, traditional analysis of topic content evolution mainly relies on manually observing changes in the topic words, and lacks a systematic understanding of how a topic evolves between time slices.

To address these two problems, this paper proposes the following methods.

First, we propose a new topic modeling method based on a short text aggregation strategy. Word embedding techniques such as Word2Vec, together with Word Mover's Distance (WMD), are introduced to make full use of the semantic information in short text. Short texts are then aggregated into long pseudo-texts by text clustering, which expands their features and overcomes feature sparsity. Experiments on a real data set show that the proposed topic modeling method achieves good accuracy in topic extraction.

Second, we present a new form of topic content evolution. We propose a framework for the phased evolution of topics, which divides the evolution process into emergence, inheritance, splitting, merging and extinction. We also propose a new semantic-based method for calculating topic similarity, which assigns topics to the stages of this framework according to a threshold and the similarity between topics in different time slices. At the same time, the evolution of the emotional polarity of topic content is also considered. Experiments show that the proposed topic content evolution framework yields a better topic representation.
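The five-stage framework can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the threshold value or the exact assignment rules, so everything below is an assumption made for illustration: topics are represented as word-weight dictionaries, plain cosine similarity stands in for the thesis's semantic similarity, the threshold of 0.5 is arbitrary, and the names `cosine` and `classify_evolution` are hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse {word: weight} dicts.

    Assumption: a stand-in for the semantic topic similarity
    described in the abstract, which is not specified there.
    """
    dot = sum(w * b.get(word, 0.0) for word, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify_evolution(prev_topics, curr_topics, threshold=0.5):
    """Label topic transitions between two adjacent time slices.

    prev_topics and curr_topics map a topic id to a {word: weight} dict.
    Returns a list of (event, prev_ids, curr_ids) tuples.
    The rules below are an assumed reading of the five stages.
    """
    # Link every pair of topics whose similarity reaches the threshold.
    links = {
        (p, c)
        for p, p_words in prev_topics.items()
        for c, c_words in curr_topics.items()
        if cosine(p_words, c_words) >= threshold
    }
    successors = {p: sorted(c for q, c in links if q == p) for p in prev_topics}
    predecessors = {c: sorted(p for p, d in links if d == c) for c in curr_topics}

    events = []
    for p, succ in successors.items():
        if not succ:
            events.append(("extinction", [p], []))   # topic has no continuation
        elif len(succ) > 1:
            events.append(("splitting", [p], succ))  # one topic feeds several
    for c, pred in predecessors.items():
        if not pred:
            events.append(("emergence", [], [c]))    # topic has no ancestor
        elif len(pred) > 1:
            events.append(("merging", pred, [c]))    # several topics feed one
        elif len(successors[pred[0]]) == 1:
            events.append(("inheritance", pred, [c]))  # one-to-one link
    return events


if __name__ == "__main__":
    slice_1 = {
        "A": {"phone": 0.6, "battery": 0.4},
        "B": {"flood": 1.0},
    }
    slice_2 = {
        "X": {"phone": 0.5, "battery": 0.5},
        "Y": {"concert": 1.0},
    }
    for event in classify_evolution(slice_1, slice_2):
        print(event)
```

In this sketch a one-to-one link is reported as inheritance only when the earlier topic has exactly one successor, so a topic that splits is not also counted as inherited. A real implementation would replace `cosine` with an embedding-based similarity and tune the threshold on the data.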
Keywords/Search Tags: topic evolution, text cluster, topic model, short text analysis