
Research On Hot Topic Extraction And Trend Prediction Algorithm Based On Chinese Microblog

Posted on: 2018-02-23
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Shang
Full Text: PDF
GTID: 2348330542984891
Subject: Software engineering
Abstract/Summary:
Microblogs have become an important channel for publishing and obtaining information, and microblog topics reflect real social conditions. How to extract useful information from massive volumes of trending microblog data, and how to track key hot-topic information accurately, have become central questions in microblog data mining. The key factor that ties a microblog to a hot topic is its content, so it is reasonable to extract hot topics based on microblog content. By studying the dissemination and text features of microblogs, namely short text, low word frequency, and frequent use of interactive functions, this thesis designs an LDA-driven Content-based Hot topic detection Algorithm (LDA-CHA) and then proposes a Content Participation-based Hidden Markov Model (CPHMM). The main work and contributions are as follows:

(1) Building on a review of research in China and abroad, the LDA-CHA algorithm, which considers both the semantic and textual features of microblogs, is proposed to extract hot topics. Interaction features such as forwards, comments, and likes also contribute to the attention a microblog receives. By learning the numerical relationship among them, a function is constructed to compute the heat value of a single microblog. A formula for the heat value of a topic is then proposed that combines semantic weight and word-frequency weight, which effectively improves the accuracy of hot topic detection.

(2) After defining and identifying the Microblogging Content Probability and the Topic Heat State, the thesis constructs CPHMM to forecast the trend of a hot topic. A locally optimal solution for the model parameters is learned through training. The evaluation shows that the prediction model is relatively reliable, that its complexity and input scale are within an acceptable range, and that the prediction results have a certain degree of credibility.

(3) The work is based on a real microblog data set. A series of experiments is designed to verify the accuracy of the hot topic detection algorithm and the reliability of the trend prediction algorithm, and the experimental results confirm the validity of both.
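
The abstract does not give the actual heat formulas or model parameters, so the following is only a minimal sketch of the two ideas it describes: a per-microblog heat value built from forwards, comments, and likes (praise), a topic heat that mixes semantic weight with word-frequency weight, and a single generic hidden Markov prediction step over heat states. All names, the weights in `WEIGHTS`, the mixing factor `alpha`, the log damping, and the example transition matrix are illustrative assumptions, not values or definitions from the thesis.

```python
import math

# Hypothetical interaction weights. The thesis learns the numerical
# relationship among forwards, comments and likes from data; these
# numbers are placeholders for illustration only.
WEIGHTS = {"forwards": 0.5, "comments": 0.3, "likes": 0.2}


def post_heat(forwards, comments, likes, weights=WEIGHTS):
    """Heat of one microblog: weighted sum of log-damped interaction counts.

    The log damping (log1p) is an assumption that keeps a few viral posts
    from dominating; it is not taken from the thesis.
    """
    return (weights["forwards"] * math.log1p(forwards)
            + weights["comments"] * math.log1p(comments)
            + weights["likes"] * math.log1p(likes))


def topic_heat(posts, alpha=0.5):
    """Heat of a topic: sum of post heats scaled by a content weight.

    Each post is a dict with keys forwards, comments, likes,
    semantic_weight (e.g. an LDA topic probability) and freq_weight
    (e.g. a normalized keyword frequency). alpha is a hypothetical
    mixing factor between the two weights.
    """
    total = 0.0
    for p in posts:
        content_weight = (alpha * p["semantic_weight"]
                          + (1 - alpha) * p["freq_weight"])
        total += content_weight * post_heat(
            p["forwards"], p["comments"], p["likes"])
    return total


def predict_next_state(belief, transition):
    """One generic HMM prediction step over hidden heat states.

    belief[i] is the current probability of heat state i and
    transition[i][j] is the probability of moving from state i to j.
    Returns the predicted distribution over states at the next step.
    """
    n = len(transition)
    return [sum(belief[i] * transition[i][j] for i in range(n))
            for j in range(n)]


if __name__ == "__main__":
    posts = [
        {"forwards": 120, "comments": 40, "likes": 300,
         "semantic_weight": 0.8, "freq_weight": 0.6},
        {"forwards": 3, "comments": 1, "likes": 10,
         "semantic_weight": 0.4, "freq_weight": 0.2},
    ]
    print("topic heat:", topic_heat(posts))

    # Two hypothetical heat states: index 0 = rising, index 1 = falling.
    transition = [[0.7, 0.3], [0.4, 0.6]]
    print("next-state distribution:",
          predict_next_state([0.5, 0.5], transition))
```

In a full hidden Markov model the transition and emission probabilities would be estimated from training data (for example with an iterative procedure that reaches a local optimum, as the abstract mentions) rather than fixed by hand as they are here.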
Keywords/Search Tags:Microblog Hot Topic, Topic Detection, Trend Prediction, LDA, Hidden Markov Model