
Belief Propagation For Hierarchical Dirichlet Process

Posted on: 2016-01-16    Degree: Master    Type: Thesis
Country: China    Candidate: J Wang    Full Text: PDF
GTID: 2308330464953274    Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet, network data from microblogs, news, blogs, and other sources is exploding, and how to manage and make full use of it has become a difficult problem. Topic modeling is one of the most effective methods for addressing it: by clustering documents, a topic model summarizes each document as a distribution over several topics, allowing people to find the information they need quickly. Traditional topic models, however, require the number of clusters to be specified in advance; that is, the number of topics must be set according to the data set, and in practice one must try various values and select the one that best fits the data under some evaluation criterion.

The hierarchical Dirichlet process (HDP), derived as a non-parametric extension of latent Dirichlet allocation, is a probabilistic topic model for analysing large amounts of text data. As a Bayesian non-parametric model, it solves the problem of determining the number of clusters. The HDP model builds on the infinite-dimensional nature of the Dirichlet process, which lets the model discover latent topics in the documents, whether previously seen or not, and thereby achieves the dynamic clustering goal of a non-parametric topic model.

Gibbs sampling is the standard inference algorithm for the traditional HDP model constructed through the Chinese restaurant process representation of the Dirichlet process. The GS algorithm is a particular Markov chain Monte Carlo method that infers the posterior through a large number of samples. However, GS is stochastic and cannot guarantee the integrity of the messages passed between variables. In this thesis, we propose applying the belief propagation algorithm to the hierarchical Dirichlet process model to overcome this shortcoming of the traditional GS algorithm.
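The Chinese restaurant process mentioned above is what lets the HDP grow its topic count with the data. The following sketch (not the thesis's inference algorithm, just an illustration of the underlying CRP prior; the function name and parameters are our own) shows how each new customer either joins an existing table in proportion to its occupancy or opens a new one in proportion to the concentration parameter `alpha`:

```python
import random

def chinese_restaurant_process(n_customers, alpha, seed=0):
    """Simulate table (cluster) assignments under a CRP prior.

    Customer i sits at an existing table with probability proportional
    to its occupancy, or opens a new table with probability proportional
    to alpha, so the number of clusters is not fixed in advance.
    """
    rng = random.Random(seed)
    tables = []        # tables[k] = number of customers at table k
    assignments = []   # assignments[i] = table index of customer i
    for _ in range(n_customers):
        # Unnormalized weights: existing tables ~ counts, new table ~ alpha.
        weights = tables + [alpha]
        r = rng.random() * sum(weights)
        k, acc = 0, weights[0]
        while r > acc:
            k += 1
            acc += weights[k]
        if k == len(tables):
            tables.append(1)   # open a new table (a new topic appears)
        else:
            tables[k] += 1
        assignments.append(k)
    return assignments, tables
```

Larger `alpha` yields more tables on average; with the data-driven occupancy term, popular topics attract more words, which is the rich-get-richer behaviour the HDP relies on.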
By analysing the factor graph of the HDP model, we propose two belief propagation algorithms based on the links between the variables: mapping belief propagation and sampling belief propagation. In the mapping belief propagation algorithm, we build a shared level using the stick-breaking construction of the Dirichlet process to realize a topic mapping between the document level and the shared level; under this mapping, the number of topics at the shared level is much larger than the number of topics at any document level. The sampling belief propagation algorithm draws on the Gibbs sampling algorithm for the HDP model and combines the ideas of Gibbs sampling and belief propagation: we construct the document level with the Chinese restaurant process, so that topics are updated dynamically and the posterior of each word in a document can be inferred, and we construct the top level with the stick-breaking process, so that a word's topic message can be passed conveniently to the top-level factors. Experimental results show that, compared with other inference algorithms for the HDP model, belief propagation achieves better performance in terms of perplexity.
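The stick-breaking construction used for the shared and top levels above can be sketched as follows (a truncated illustration of the standard GEM construction, not the thesis's implementation; the function name and truncation parameter are our own):

```python
import random

def stick_breaking_weights(alpha, n_topics, seed=0):
    """Truncated stick-breaking construction of DP mixing weights.

    Repeatedly break off a Beta(1, alpha)-distributed fraction of the
    remaining stick length; the resulting weights sum to 1 and tend to
    decay, so a finite truncation can stand in for the infinite
    top-level topic set during inference.
    """
    rng = random.Random(seed)
    remaining = 1.0
    weights = []
    for _ in range(n_topics - 1):
        b = rng.betavariate(1.0, alpha)   # fraction of the stick to break off
        weights.append(b * remaining)
        remaining *= (1.0 - b)
    weights.append(remaining)  # assign the leftover mass to the last atom
    return weights
```

Smaller `alpha` concentrates mass on the first few sticks (few dominant topics); larger `alpha` spreads mass more evenly, which is why the shared level can hold far more topics than any single document uses.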
Keywords/Search Tags:Hierarchical Dirichlet process, factor graph, mapping belief propagation, sampling belief propagation, expectation maximization