
Research On Topic Detection Method Of Complex Short Text Based On Topic Model

Posted on: 2022-12-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Yuan
Full Text: PDF
GTID: 2518306764471764
Subject: Internet Technology
Abstract/Summary:
With the rapid development of network technology, a large volume of complex short-text data is being generated on social media platforms. Compared with traditional news data, such text is massive, high-dimensional, and rapidly updated, and it carries little semantic information, heavy noise, and low text quality. Traditional clustering algorithms that rely on multiple scans and iterative clustering can no longer meet the needs of real-time information mining in social media. How to quickly and accurately mine the hidden topics in social media short-text data is a key problem in natural language processing and computational linguistics. To address the semantic sparsity of short texts, this thesis extracts deeper semantic association information from short texts through word co-occurrence and, building on existing research, studies online clustering algorithms for social media short texts. In addition, to address the overly fine clustering granularity and scattered clustering results of existing online clustering algorithms, it proposes a short-text-based topic detection method that clusters events and fuses them into topics according to the relevance of social media events in text and topic. The specific research contents of this thesis are as follows:

(1) A semantically enhanced online short-text clustering algorithm based on word co-occurrence is proposed. The algorithm is based on the Dirichlet process multinomial mixture model, which does not require the number of clusters to be specified manually in advance; instead, the model determines it during clustering. To enrich the semantic information of short texts, this thesis treats each short text as a co-occurrence window and extracts word co-occurrence relations between short texts. To account for changes in topic centers during clustering, this thesis selects the clusters to be fused according to the similarity between the short text and the existing clusters, and finally clusters the selected clusters together. Comparative experiments show that the proposed algorithm achieves better clustering performance.

(2) A short-text-based topic detection method for social media is proposed. To address the overly fine clustering granularity and scattered clustering results of existing online clustering algorithms for social media short texts, this thesis proposes a social media topic detection method that consists of three stages. In the text preprocessing stage, the social media short texts are preprocessed. In the online clustering stage, the short texts are clustered online to identify the events in the social media short-text stream in the form of small clusters, generate the social media event set, and construct the event-topic association network. In the topic detection stage, the similarity between events is calculated from the event set and the event-topic association network generated in the online clustering stage, and the event similarity network is constructed. Finally, a label propagation algorithm is used to partition the event similarity network into communities and generate the social media topic set. Experiments show that the proposed method is more consistent with the real topic distribution of the text data and that the clustering results are more compact.
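
The co-occurrence step in (1), where each short text serves as one co-occurrence window, can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `cooccurrence_counts`, the whitespace tokenization, the lowercasing, and the sample texts are all assumptions made for illustration, and the thesis may weight or filter pairs differently.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(texts):
    """Count unordered word pairs that appear together in the same short text.

    Hypothetical helper: each short text is treated as a single
    co-occurrence window, and tokens are split on whitespace (an assumption).
    """
    counts = Counter()
    for text in texts:
        # Deduplicate and sort so each pair is counted once per text
        # and always appears in the same (alphabetical) order.
        words = sorted(set(text.lower().split()))
        counts.update(combinations(words, 2))
    return counts


# Made-up sample stream of short texts.
texts = [
    "flood warning river",
    "river flood rescue",
    "concert tickets sale",
]
pairs = cooccurrence_counts(texts)
print(pairs.most_common(3))
```

Pairs that recur across many short texts, such as ("flood", "river") here, give a signal of semantic association that a single sparse short text does not provide on its own.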
Keywords/Search Tags: social media, online clustering of short texts, word co-occurrence, topic detection