Research On Hot Topic Discovery Of Sina Microblog

Posted on: 2020-12-07 | Degree: Master | Type: Thesis
Country: China | Candidate: B Yang | Full Text: PDF
GTID: 2428330590954867 | Subject: Software engineering
Abstract/Summary:
In recent years, the popularity of social networking media such as microblogs has increased rapidly. As more and more people use microblogs to obtain and publish information, microblogs have come to hold a large volume of information that can affect people's lives and social development, making them an important source of Internet public opinion. Mining hot topics from massive microblog data may help the government identify current trends in public opinion and provide timely warnings and guidance. It can also provide users with hot search topics and enable enterprises to understand user needs more accurately and improve product recommendations. When traditional topic detection methods are applied to microblog text, they suffer from inaccurate text representation and poor clustering performance. This thesis improves the text representation, the clustering algorithm, and the heat calculation, and proposes a simple and effective method for discovering hot topics on microblogs. The method comprises the following three aspects.

First, traditional text representation suffers from high-dimensional sparsity. A Text Convolution Auto-encoder (TCAE) is built by combining the advantages of CNNs and auto-encoders. It performs unsupervised learning on the word vector matrix of a text and obtains a high-level feature representation of the text.

Second, an improved multi-threshold Single-Pass algorithm (MTSP) is proposed to overcome the traditional Single-Pass algorithm's sensitivity to input order. It reduces the impact of a single threshold on the results, merges similar clusters in the results to avoid false clustering caused by the order of data entry, and reduces the errors that isolated-point handling may introduce.

Third, microblog topics are analyzed according to the characteristics of microblog data, and the representative microblog of each topic is obtained. A topic heat calculation formula is designed using comment and forwarding information and users' fan (follower) counts. Hot topics are obtained by comparing the calculated heat of the topics.

The experimental results show that TCAE improves the accuracy of microblog text representation, that MTSP is effective for topic detection, and that the heat calculation results reflect the actual state of hot topics.
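As background for the clustering step, the following is a minimal sketch of the traditional Single-Pass baseline that the thesis sets out to improve. It is not MTSP itself, whose multi-threshold and cluster-merging rules are not specified in this abstract. The function names `cosine` and `single_pass`, the use of cosine similarity, and the running-mean centroid update are assumptions made for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def single_pass(vectors, threshold):
    """Baseline Single-Pass clustering (illustrative, not the thesis's MTSP).

    Each vector is compared with the centroids of the clusters seen so far.
    It joins the most similar cluster if that similarity reaches `threshold`;
    otherwise it starts a new cluster. Returns a list of clusters, each a
    list of input indices.
    """
    centroids = []  # running mean vector of each cluster
    members = []    # input indices assigned to each cluster
    for i, vec in enumerate(vectors):
        best, best_sim = -1, -1.0
        for c, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = c, sim
        if best >= 0 and best_sim >= threshold:
            members[best].append(i)
            n = len(members[best])
            # Update the running mean with the newly added vector.
            centroids[best] = [
                (m * (n - 1) + x) / n for m, x in zip(centroids[best], vec)
            ]
        else:
            centroids.append(list(vec))
            members.append([i])
    return members


if __name__ == "__main__":
    docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(single_pass(docs, threshold=0.8))  # [[0, 1], [2, 3]]
```

Because every assignment depends only on the clusters formed by earlier inputs and on one fixed threshold, the outcome of this baseline can change with the order of the data and with the threshold chosen, which are the weaknesses the abstract says MTSP addresses.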
Keywords/Search Tags: Topic detection, Hot topics, Text representation, Text clustering, Convolution Auto-Encoder