Research And System Design Of Hot Topic Discovery Method Based On Microblog Data Flow

Posted on: 2019-11-30
Degree: Master
Type: Thesis
Country: China
Candidate: W Wei
GTID: 2428330545952118
Subject: Electronic and communication engineering
Abstract/Summary:
With the development of Internet technology and the spread of mobile networks, communication among the public has become more frequent and its forms more varied. Microblogs provide functions such as publishing, subscribing, commenting, and forwarding, so users can view or post messages whenever and wherever they want, which has made microblogging increasingly popular among Internet users. Meanwhile, the participation of news media and public figures has gradually made microblogs an influential platform for public opinion, and public opinion monitoring of microblogs is receiving growing attention. Topic discovery is an important part of opinion monitoring: it identifies the events people are focusing on from a large volume of data, so it is of great significance for public opinion monitoring.

Microblog posts vary in length, and most of them are short texts. Microblog information is also updated in real time, and hot spots change quickly. These characteristics make text clustering and topic screening difficult. To address this problem, this paper proposes a hot topic discovery model based on LDA, and implements the model and designs the system on the Spark platform. The main work includes:

1. For topic analysis, this paper proposes a model based on time windows: the data from the previous time window is used to train the LDA model offline, and the trained model is then applied online on the Spark Streaming platform for real-time topic analysis of documents. This process yields the topic distribution of each document, from which the heat of each topic is calculated with the topic heat formula. At regular intervals, the model is retrained and updated with newly acquired data.

2. For calculating topic heat, after analyzing the structure and characteristics of microblogs in depth and combining the characteristics of microblog users and posts, this paper proposes a topic heat calculation method based on the output of the LDA model. The influence of each post is calculated from the number of followers, the number of comments on the post, and similar indicators. This influence value is used as a weight to compute the expectation of each topic, which serves as the topic's heat. Hot topics are then obtained by sorting topics by heat.

3. This paper implements the model as a system. The information acquisition module continuously acquires data through the API and stores it in a local database. After the data is preprocessed by word segmentation and stop-word removal, the semantic analysis module performs online topic analysis of the documents, calculates topic heat, and updates the model. Hot topics are obtained from the output of the semantic analysis module and are displayed in the result display module.
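The abstract does not give the topic heat formula itself, so the following is only a minimal sketch of the idea it describes: each post's influence acts as a weight, and a topic's heat is the influence-weighted expectation of that topic over the per-document topic distributions produced by LDA. The `Post` structure, the `influence` function (a log-damped sum of followers and comments), and the function names are all assumptions made for illustration, not the thesis's actual definitions.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Post:
    followers: int               # follower count of the posting user
    comments: int                # comment count on the post
    topic_dist: Sequence[float]  # per-document topic distribution from LDA


def influence(post: Post) -> float:
    # Hypothetical influence score (an assumption, not the thesis formula):
    # log-damped followers plus log-damped comments.
    return math.log1p(post.followers) + math.log1p(post.comments)


def topic_heat(posts: List[Post], num_topics: int) -> List[float]:
    # Heat of topic k = influence-weighted mean of P(topic k | post).
    weights = [influence(p) for p in posts]
    total = sum(weights)
    if total == 0:
        return [0.0] * num_topics
    heat = [0.0] * num_topics
    for post, w in zip(posts, weights):
        for k in range(num_topics):
            heat[k] += w * post.topic_dist[k]
    return [h / total for h in heat]


def hot_topics(posts: List[Post], num_topics: int, top_n: int) -> List[int]:
    # Sort topic indices by heat, hottest first, and keep the top_n.
    heat = topic_heat(posts, num_topics)
    return sorted(range(num_topics), key=lambda k: heat[k], reverse=True)[:top_n]


if __name__ == "__main__":
    sample = [
        Post(followers=1000, comments=50, topic_dist=[0.9, 0.1]),
        Post(followers=10, comments=0, topic_dist=[0.2, 0.8]),
    ]
    print(topic_heat(sample, num_topics=2))
    print(hot_topics(sample, num_topics=2, top_n=1))
```

Because the weights are normalized by their sum, the heat values of all topics add up to 1 whenever each post's topic distribution does, which makes heat comparable across time windows of different sizes. In a streaming setting, the same computation could be applied to each batch of posts scored by the most recently trained model.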
Keywords/Search Tags:microblog, topic discovery, hot topic, LDA, Spark, Spark Streaming