Latent Dirichlet Allocation Based On MCEM Algorithm

Posted on: 2018-04-15
Degree: Master
Type: Thesis
Country: China
Candidate: M L Li
Full Text: PDF
GTID: 2428330512492156
Subject: Probability theory and mathematical statistics
Abstract/Summary:
Latent Dirichlet Allocation (LDA) is one of the most popular methods for analysing text data. It treats a document as a collection of words and explores topic meaning to uncover the latent information in the document. In recent years, LDA has been widely used in text analysis, recommendation systems, information retrieval and other fields. The key problem in LDA is to assign topics to the latent variables and to estimate the unknown parameters. In this paper, the posterior distributions of the unknown parameters are first obtained from the Dirichlet-Multinomial conjugate structure. Then, using Bayes' formula, the posterior of the latent variable is inferred and the collapsed Gibbs sampling (CGS) formula is obtained. Subsequently, we apply variational Bayes and maximize the Evidence Lower Bound (ELBO) to find the optimal solutions for the unknown parameters. During optimization, Monte Carlo simulation and the EM algorithm are combined, forming the MCEM algorithm. In the E step, topic proposals z_dn are sampled with the Metropolis-Hastings algorithm. During sampling, we draw samples alternately from a word-proposal and a doc-proposal, visiting tokens document by document. Based on the multinomial distribution, Alias Sampling and Random Positioning are adopted so that each single topic proposal is drawn with O(1) complexity. The innovation of this paper is that samples for the latent variable are drawn alternately from two simple proposal distributions, both mixtures of multinomials, which not only avoids autocorrelation but also improves the generalization ability of the model.
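
The O(1) draw mentioned in the abstract can be illustrated with a minimal sketch of the alias method. This is not the thesis's implementation: it assumes Vose's variant of the alias method, and the names build_alias_table, alias_draw and implied_probs are hypothetical helpers introduced only for this illustration. The table costs O(K) to build for K topics, after which every draw needs one random index and one random comparison.

```python
import random


def build_alias_table(probs):
    """Build an alias table (Vose's variant, assumed here) in O(K) time.

    probs: non-negative weights over K outcomes (need not sum to 1).
    Returns (accept, alias): accept[i] is the probability of keeping
    column i, alias[i] is the outcome returned otherwise.
    """
    k = len(probs)
    total = sum(probs)
    # Rescale so that the average column height is exactly 1.
    scaled = [p * k / total for p in probs]
    accept = [0.0] * k
    alias = [0] * k
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        accept[s] = scaled[s]   # column s keeps its own mass ...
        alias[s] = l            # ... and is topped up from column l
        scaled[l] = scaled[l] + scaled[s] - 1.0
        (small if scaled[l] < 1.0 else large).append(l)
    # Leftover columns have height 1 up to rounding error.
    for i in small + large:
        accept[i] = 1.0
        alias[i] = i
    return accept, alias


def alias_draw(accept, alias, rng=random):
    """Draw one outcome in O(1): pick a column, then keep it or take its alias."""
    i = rng.randrange(len(accept))
    return i if rng.random() < accept[i] else alias[i]


def implied_probs(accept, alias):
    """Recover the distribution that the table encodes (for checking only)."""
    k = len(accept)
    out = [0.0] * k
    for i in range(k):
        out[i] += accept[i] / k
        out[alias[i]] += (1.0 - accept[i]) / k
    return out
```

For example, build_alias_table([0.5, 0.3, 0.2]) followed by repeated calls to alias_draw yields topic indices 0, 1 and 2 with those probabilities. In an LDA sampler the table would presumably be built from a word-proposal distribution and reused across many tokens, which is what makes the amortized cost per draw constant.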
Keywords/Search Tags: LDA, Latent Variable, Variational Bayes, ELBO, MCEM, Metropolis-Hastings, Alias Sampling, Random Positioning