Research On LDA Short Text Clustering Algorithm For Microblog Comments

Posted on: 2021-01-18  Degree: Master  Type: Thesis
Country: China  Candidate: R X Yang  Full Text: PDF
GTID: 2428330629950523  Subject: Computer Science and Technology
Abstract/Summary:
Microblogging has become one of the most popular forms of social media in China, in part because commenting is so convenient. Microblog comments often carry strong emotional overtones, and sentiment analysis of these comments is an important way to learn users' views and attitudes. The LDA topic model has become a research hotspot in microblog comment analysis, but traditional LDA is not accurate enough for sentiment analysis of such comments. To address this problem, this thesis studies LDA-based short text clustering algorithms for microblog comments, using feature extraction, word co-occurrence, and the weighting of sentiment topic feature words to improve the quality of the extracted semantic information and the clustering results of sentiment analysis. The main research contents are as follows.

First, the key technologies of short text clustering with the LDA topic model are analyzed, including the basic principle of the LDA topic model, feature extraction technology, and the word co-occurrence model.

Second, to address traditional LDA's weak performance in topic sentiment analysis and semantic extraction, an LDA short text clustering algorithm based on sentiment word co-occurrence and knowledge-pair feature extraction is proposed. A word bag based on sentiment word co-occurrence is defined; it takes into account the co-occurrence of sentiment words across different short texts and assigns a sentiment polarity to each microblog short text. Algorithms are then designed to construct a topic special word set and a topic relation word set. Knowledge pairs of topic special words and topic relation words are extracted and injected into the LDA topic model for clustering, which yields more accurate semantic information. Finally, the K-means algorithm is used to cluster the Top30 topic feature word set obtained from the primary clustering by the LDA topic model, and the cluster centers are optimized iteratively in this secondary K-means clustering.

Third, to address the low accuracy of sentiment analysis of microblog comments, a microblog comment clustering algorithm based on the weighting of sentiment topic feature words is proposed. Sentiment topic words are first extracted by defining a sentiment topic word bag. Sentiment topic feature words are then obtained by semantic similarity calculation and weighted using two defined parameters, the importance and the distribution of each sentiment topic feature word, so that feature words with strong expressive power receive higher weights. Finally, the weighted sentiment topic feature words are clustered by LDA.

The experimental results show that the LDA short text clustering algorithm based on sentiment word co-occurrence and knowledge-pair feature extraction has better semantic analysis ability and a better sentiment topic clustering effect. The microblog comment clustering algorithm based on the weighting of sentiment topic feature words also shows a better clustering effect for sentiment analysis and improves the accuracy of sentiment analysis of microblog comments.
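
As a rough illustration of the co-occurrence counting and polarity assignment described in the abstract, the following minimal sketch counts how often two sentiment words appear in the same short text and labels each text by the sign of its summed word scores. It is not the thesis's algorithm: the lexicon `LEXICON`, the helper names `sentiment_cooccurrence` and `polarity`, the whitespace tokenization, and the sample comments are all hypothetical and chosen only to keep the example self-contained.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy sentiment lexicon (word -> polarity score); not from the thesis.
LEXICON = {"love": 1, "great": 1, "happy": 1, "hate": -1, "awful": -1, "sad": -1}


def sentiment_cooccurrence(comments):
    """Count how often two lexicon words appear in the same comment."""
    pairs = Counter()
    for text in comments:
        # Unique lexicon words in this comment, sorted so each pair has one key.
        words = sorted({w for w in text.lower().split() if w in LEXICON})
        pairs.update(combinations(words, 2))
    return pairs


def polarity(text):
    """Label a comment by the sign of its summed lexicon scores (an assumption)."""
    score = sum(LEXICON.get(w, 0) for w in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Made-up sample comments, used only to exercise the two helpers.
comments = [
    "love this great phone",
    "great camera and happy with it",
    "hate the awful battery",
    "awful service so sad",
    "love it great value",
]

cooc = sentiment_cooccurrence(comments)
labels = [polarity(c) for c in comments]
```

In a fuller pipeline, counts like `cooc` could feed the construction of a sentiment word bag before topic modeling, while the real method would need a proper Chinese tokenizer and sentiment lexicon in place of the toy ones used here.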
Keywords/Search Tags: Microblog comments, LDA, clustering, sentiment analysis, feature weighting, topic special word