Topic Mining For Chinese Microblog Based On LDA Model

Posted on: 2017-07-24
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Yi
Full Text: PDF
GTID: 2348330518995528
Subject: Computer Science and Technology
Abstract/Summary:
As microblogging has grown in popularity, it has become an important information-sharing platform with a large number of Internet users, and a growing body of research focuses on mining its content. Mining topic information effectively from microblogs is of great significance: it supports the government's monitoring of public opinion, helps enterprises optimize their products, and helps users find the information they need. At the same time, as the number of users increases, microblogs generate huge amounts of information. Topic mining over large-scale microblog data must therefore be fast enough to give users timely access to the topics they are interested in.

Against this background, this paper takes Sina microblog as its research object. Based on the data characteristics of Chinese microblogs and the traditional topic model LDA (Latent Dirichlet Allocation), it proposes a new topic model, MB-WLDA (MicroBlog-Weighted Latent Dirichlet Allocation), for topic mining of Chinese microblogs. First, the model introduces a user dimension, extending LDA into a four-layer "document-user-topic-word" model, which addresses the problem that microblog posts are too short and contain little semantic information. Second, a weighting strategy incorporates word weights into Gibbs sampling to reduce the probability of weakly expressive words in the topic distributions, which improves the readability and independence of the topics. Experimental results on real datasets show that MB-WLDA outperforms the LDA baseline.

Finally, to handle massive microblog data, this paper proposes a parallel MB-WLDA built on Hadoop's MapReduce programming model and describes in detail the full implementation of parallel topic mining for Chinese microblogs. Experiments show that the parallel algorithm achieves good speedup and scalability in a big-data environment.
Keywords/Search Tags:microblog, topic mining, Gibbs sampling, LDA, Hadoop
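
The abstract does not give the MB-WLDA sampling formula, so the following is only a minimal sketch of one common way to bring word weights into collapsed Gibbs sampling for LDA: each token contributes its word's weight to the count tables instead of 1. The function name `weighted_lda_gibbs`, the `weights` list, and all hyperparameter defaults are assumptions made for illustration. The user layer and the MapReduce parallelization are not modeled here.

```python
import random


def weighted_lda_gibbs(docs, weights, num_topics, vocab_size,
                       alpha=0.1, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA with per-word weights (illustrative).

    docs:    list of documents, each a list of word ids in [0, vocab_size).
    weights: weights[w] is a hypothetical positive weight for word w;
             low weights stand in for weakly expressive words.
    Returns the weighted (doc_topic, topic_word) count tables.
    """
    rng = random.Random(seed)
    doc_topic = [[0.0] * num_topics for _ in docs]
    topic_word = [[0.0] * vocab_size for _ in range(num_topics)]
    topic_total = [0.0] * num_topics
    assign = []

    # Random initialization: each token adds its weight rather than 1.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += weights[w]
            topic_word[k][w] += weights[w]
            topic_total[k] += weights[w]
        assign.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wt = weights[w]
                k_old = assign[d][i]
                # Remove the current token's weighted contribution.
                doc_topic[d][k_old] -= wt
                topic_word[k_old][w] -= wt
                topic_total[k_old] -= wt

                # Unnormalized conditional over topics, using weighted counts.
                probs = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                k_new = rng.choices(range(num_topics), weights=probs)[0]

                # Add the token back under its newly sampled topic.
                assign[d][i] = k_new
                doc_topic[d][k_new] += wt
                topic_word[k_new][w] += wt
                topic_total[k_new] += wt

    return doc_topic, topic_word


if __name__ == "__main__":
    # Toy corpus: word 3 is given a low weight as a "weakly expressive" word.
    docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 1, 0]]
    weights = [1.0, 1.0, 1.0, 0.2]
    doc_topic, topic_word = weighted_lda_gibbs(docs, weights, 2, 4)
    for k, row in enumerate(topic_word):
        print("topic", k, [round(c, 2) for c in row])
```

Because a down-weighted word adds less mass to `topic_word`, it ends up with a smaller share of any topic's word distribution, which is the effect the abstract attributes to the weighting strategy. How the thesis actually computes the word weights is not stated in the abstract.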