
Study On Variational Inference And Application Of Bayesian Method In Topic Model

Posted on: 2020-06-10    Degree: Doctor    Type: Dissertation
Country: China    Candidate: J J Chi    Full Text: PDF
GTID: 1368330575481200    Subject: Computer software and theory
Abstract/Summary:
Bayesian methods, as one of the mainstream approaches in machine learning, have made important contributions to many fields such as artificial intelligence. Inference is the core of the Bayesian method: a central problem is computing the posterior distribution, which in many practical applications is intractable to calculate exactly. Hence, a host of approximation methods have been proposed, of which the variational method is currently the most commonly used. An important application of Bayesian methods is text mining. With the rapid development of big data, discovering the topics of texts has received great attention from both industry and the research community. The topic model, an important Bayesian model, is the most commonly used model for discovering the topics of documents. A topic is a multinomial distribution over the vocabulary, and the top word list, i.e., the top-M words with the highest marginal probabilities in a given topic, is the standard topic representation in topic models.

We focus on Bayesian methods and their applications in two parts: the first studies a class of mainstream Bayesian methods, namely variational inference; the second studies the application of Bayesian methods in the topic model. The main contributions are outlined as follows:

1. Traditional variational inference mainly includes standard mean-field variational inference (MFVI), collapsed variational Bayesian inference (CVB), hybrid variational-Gibbs (HVG) and expectation propagation (EP). These methods are widely used in many problems, yet whether any one of them is the best overall, or best suited to certain settings, is still unclear, which makes it difficult in practice to select an appropriate variational method without implementing them all. We analyze the four methods on LDA (latent Dirichlet allocation) and systematically compare them in theory from two perspectives: the variational distribution and the type of α-divergence. The predicted performance order is CVB > EP ≈ HVG > MFVI; the time complexity order is MFVI > EP = HVG > CVB (from high to low) and the space complexity order is CVB > EP = HVG = MFVI (from high to low). We present experimental results of the four inference methods on the LDA model, using two synthetic datasets and five real-world datasets, with two popular metrics: the perplexity of held-out test sets, computed by convention, and the pointwise mutual information (PMI) score, which measures how coherent the inferred topics are. The empirical results largely match our theoretical analysis: we find that CVB is more effective and efficient than the other inference methods, and we suggest it for practical topic modelling applications when space complexity is not a concern.

2. Most traditional variational inference methods are based on the mean-field assumption and ignore the dependencies between latent variables, resulting in lower performance. To address this, the copula variational inference (CVI) method uses well-established copulas to capture posterior dependencies, leading to better approximations. However, CVI suffers from a computational issue: optimization for big models with massive numbers of latent variables is quite time-consuming, mainly because of the expensive sampling required to form noisy Monte Carlo gradients. To speed up CVI, we propose a novel fast CVI (abbr. FCVI). In FCVI, we derive the gradient of the CVI objective as an expectation under the mean-field factorization, so we can sample much more efficiently from the mean-field factorization instead of the copula-augmented distribution, reducing the sampling complexity from O(D²) to O(D). We perform inference on the Gaussian mixture model and the latent space model and, for convincing evaluation, use both synthetic and real-world datasets, comparing FCVI against baselines on performance and runtime. Experimental results demonstrate that FCVI is on a par with CVI but runs much faster.

3. The top word list is the standard topic representation, but we find empirically that its words are not always representative and may even be meaningless noise words. To address this, we rerank the words in a given topic by further considering their marginal probabilities in every other topic. We investigate three reranking methodologies, using (1) standard deviation weight, (2) standard deviation weight with topic size, and (3) chi-square (χ²) statistic selection. We evaluate the resulting topic representations on two real-world collections based on LDA. The first evaluation tests whether our reranked representations filter out stopword-like words from the top list; the second tests whether their top words are more representative, using the word intrusion task. Experimental results indicate that our representations extract more representative words for topics, agreeing with human judgements.
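The PMI coherence metric mentioned in contribution 1 can be sketched as follows. The abstract does not give the exact formula, so this is a minimal illustration of one common formulation: average the pointwise mutual information, estimated from document co-occurrence counts, over all pairs of a topic's top words (the function name and smoothing constant are illustrative, not from the dissertation).

```python
import math
from itertools import combinations

def topic_pmi(top_words, doc_sets, eps=1e-12):
    """Average PMI over all pairs of a topic's top words.

    top_words: list of words representing one topic (e.g. its top-10 list).
    doc_sets:  corpus as a list of documents, each a set of unique words.
    Higher scores mean the top words co-occur more than chance,
    i.e. the topic is more coherent.
    """
    n_docs = len(doc_sets)

    def p(*ws):
        # Fraction of documents containing all words in ws.
        return sum(all(w in d for w in ws) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        joint = p(w1, w2)
        # PMI(w1, w2) = log P(w1, w2) / (P(w1) P(w2)), smoothed by eps.
        scores.append(math.log((joint + eps) / (p(w1) * p(w2) + eps)))
    return sum(scores) / len(scores)
```

Words that frequently appear together score above zero, while pairs that never co-occur are pushed strongly negative by the smoothing term.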
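The standard-deviation-weight reranking of contribution 3 can be sketched in a few lines. The abstract does not state the exact weighting formula, so the scheme below is a plausible interpretation, not the dissertation's definition: weight each word's in-topic probability by the standard deviation of its probability across all topics, so stopword-like words that are roughly equally probable everywhere get a near-zero weight and drop out of the top list.

```python
import math

def rerank_topic(phi, k, top_m=5):
    """Rerank words of topic k by probability times cross-topic std deviation.

    phi: K x V nested lists with phi[k][v] = P(word v | topic k).
    Returns the indices of the top_m words under the reranked score.
    Near-uniform (stopword-like) words have std ~ 0 and are pushed down.
    """
    K, V = len(phi), len(phi[0])
    scores = []
    for v in range(V):
        col = [phi[j][v] for j in range(K)]          # word v across topics
        mean = sum(col) / K
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / K)
        scores.append(phi[k][v] * std)               # weighted marginal
    return sorted(range(V), key=lambda v: scores[v], reverse=True)[:top_m]
```

For example, a word with probability 0.5 in every topic scores zero and is excluded from the top list even though its raw marginal probability is the highest, which mirrors the stopword-filtering behaviour evaluated in the dissertation.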
Keywords/Search Tags:Bayesian method, Variational inference, Mean-field, Dependency, Copula, Topic representation