The Design And Implementation Of Hot Topic Detection System Of Tweets Based On Spark On Yarn

Posted on: 2016-09-21
Degree: Master
Type: Thesis
Country: China
Candidate: H X Xing
Full Text: PDF
GTID: 2348330512970846
Subject: Software engineering
Abstract/Summary:
With the rapid development of technology, the explosive growth of data volume has become a pressing problem, and cloud computing has evolved rapidly to deal with it. Twitter, as one of the most important social media platforms, contains a large amount of information to be analyzed. Existing systems in this field have two shortcomings. First, they usually mine hot topics based only on the explosive growth of independent words. Second, they cannot find topics related to the hot topics. To address these two shortcomings, this dissertation designs a hot topic detection system for tweets based on Spark on Yarn. For the first problem, the dissertation designs a real-time hot topic mining module. This module adapts the FP-Growth algorithm to design the SFP-Growth algorithm, which can detect hot topics in a data stream in parallel. During detection, the module uses the Flume data-receiving component to obtain the streaming data and detects hot topics based on the explosive growth of keyword combinations. For the second problem, the system implements the Word2vec algorithm on the Spark architecture; Word2vec is used to detect topics related to the current hot topics. With these two improvements, the dissertation designs and implements a hot topic detection system for tweets and deploys it on a cluster. Finally, the dissertation tests the quality of the SFP-Growth and Word2vec algorithms. The results show that the system meets its requirements.
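The idea of detecting hot topics from the explosive growth of keyword combinations, rather than of single words, can be illustrated with a small sketch. This is not the SFP-Growth algorithm from the dissertation, and it does not use FP-Growth or Spark: it enumerates keyword combinations by brute force in plain Python, which only works for tiny inputs. The function names, the two-window comparison, the add-one smoothing, and the thresholds `min_support` and `min_growth` are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def count_keyword_sets(tweets, max_size=2):
    """Count how many tweets contain each keyword combination."""
    counts = Counter()
    for keywords in tweets:
        unique = sorted(set(keywords))  # canonical order, no duplicates
        for size in range(1, max_size + 1):
            for combo in combinations(unique, size):
                counts[combo] += 1
    return counts


def bursting_sets(previous_window, current_window,
                  min_support=2, min_growth=2.0, max_size=2):
    """Return multi-keyword combinations whose support grew sharply.

    Hypothetical burst rule: current support divided by
    (previous support + 1) must reach min_growth.
    """
    prev = count_keyword_sets(previous_window, max_size)
    curr = count_keyword_sets(current_window, max_size)
    hot = {}
    for combo, support in curr.items():
        # Keep only combinations (size >= 2) that are frequent enough now.
        if len(combo) < 2 or support < min_support:
            continue
        # Add-one smoothing avoids division by zero for unseen combinations.
        growth = support / (prev[combo] + 1)
        if growth >= min_growth:
            hot[combo] = growth
    return hot


# Toy data: each tweet is already reduced to a list of keywords.
previous = [["spark", "yarn"], ["coffee", "morning"], ["spark", "cluster"]]
current = [
    ["earthquake", "tokyo"],
    ["earthquake", "tokyo", "alert"],
    ["tokyo", "earthquake"],
    ["spark", "yarn"],
]
hot = bursting_sets(previous, current)
print(hot)
```

In this toy input, the pair `("earthquake", "tokyo")` appears in three tweets of the current window and none of the previous one, so it is the only combination flagged. A frequent-pattern miner such as FP-Growth serves the same purpose of finding frequent combinations while avoiding the exhaustive enumeration used here.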
Keywords/Search Tags: Public sentiment analysis, Spark, Tweets, Cloud computing