
Research On Text Topic Discovery Technology Based On Group Chat

Posted on: 2022-11-09
Degree: Master
Type: Thesis
Country: China
Candidate: Z Zhang
GTID: 2518306749972009
Subject: Automation Technology
Abstract/Summary:
With the rapid development of the mobile Internet, social software such as QQ and WeChat has gradually become the main communication tool in people's daily lives. Group chat is one of its important functions and produces huge amounts of data. Topic detection technology can analyze the subjects discussed in a group chat and lets users quickly learn a group's hot topics, which is of great significance for improving user experience. Topic models are an important method for text topic discovery. However, when the traditional LDA topic model is applied directly to group chat topic mining, the sparsity of short group chat texts leads to poor topic modeling. Meanwhile, LDA represents documents with the bag-of-words model and considers only word frequency in topic mining, ignoring the semantic relationships between words, which results in poor semantic coherence of the final topics. Therefore, this thesis improves the LDA model in two respects, short-text features and semantic connections between words, so that it performs better in group chat topic mining.

Based on the LDA model, this thesis proposes a group chat topic mining model based on BERT and LDA (Bert-LDA). The Bert-LDA model first combines the context of messages so that the original context is preserved. Then, a pre-trained BERT model is used to extract semantic features of the group chat text, which serve as the input to a clustering model. Through text clustering, the words contained in each document are expanded, so that words are distributed widely across topics rather than concentrating in a few.

Furthermore, LDA does not consider the semantic relationships between words in topic mining, which results in poor semantic coherence of the obtained topics. Therefore, this thesis proposes a Semantic Enhancement and Bert-LDA based Group Chat Topic Mining (SEBL) model. Building on the Bert-LDA model, SEBL introduces semantic relations between words into topic modeling. Keywords that can represent documents are selected as the candidate word set for semantic enhancement using part-of-speech characteristics and TF-IDF. Then, a Generalized Polya Urn (GPU) model is used to increase the probability that semantically similar candidate words are assigned to the same topic, so that the generated topic descriptors are more strongly related semantically and the semantic coherence of the topics improves.

Finally, this thesis conducts experiments on real QQ group chat data, using perplexity and topic semantic coherence as evaluation metrics, and compares the SEBL, Bert-LDA, LDA, and BTM models. The experimental results show that, compared with LDA and BTM, the Bert-LDA and SEBL models proposed in this thesis achieve lower perplexity and better topic semantic coherence. Meanwhile, SEBL outperforms Bert-LDA, which indicates that semantic enhancement and candidate word selection improve the model.
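
The Generalized Polya Urn step can be illustrated with a small sketch. This is not the thesis's implementation, only a minimal illustration of the general GPU idea: when a word is assigned to a topic, its semantically related words also receive a fractional count in that topic. The function names, the `related` mapping, the promotion weight of 0.3, and the smoothing value `beta` are all assumptions made for the example. The abstract does not state how word similarity is computed or weighted.

```python
from collections import defaultdict


def gpu_update(topic_word_counts, topic, word, related, promotion=0.3, sign=1):
    """Add (sign=1) or remove (sign=-1) one assignment of `word` to `topic`.

    A standard Polya urn would change only the count of `word` itself.
    The generalized urn also changes the count of each related word by a
    fractional amount (`promotion`, a hypothetical weight chosen here).
    """
    topic_word_counts[topic][word] += sign * 1.0
    for other in related.get(word, ()):
        topic_word_counts[topic][other] += sign * promotion


def word_probability(topic_word_counts, topic, word, vocab_size, beta=0.01):
    """Smoothed estimate of P(word | topic) from the (possibly fractional) counts."""
    counts = topic_word_counts[topic]
    total = sum(counts.values())
    return (counts.get(word, 0.0) + beta) / (total + vocab_size * beta)


if __name__ == "__main__":
    # Hypothetical similarity pairs; in practice these would come from
    # word embeddings restricted to the selected candidate words.
    related = {"exam": ["test"], "test": ["exam"]}
    counts = defaultdict(lambda: defaultdict(float))

    # Assign "exam" to topic 0 twice; "test" is promoted without being observed.
    gpu_update(counts, topic=0, word="exam", related=related)
    gpu_update(counts, topic=0, word="exam", related=related)
    print(word_probability(counts, 0, "test", vocab_size=4))
```

In a Gibbs sampler, the same function would be called with `sign=-1` before resampling a word's topic and with `sign=1` afterwards, so that promoted counts are removed and re-added consistently.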
Keywords/Search Tags: LDA, BERT, Word2Vec, TF-IDF, Semantic Enhancement