
Comparison And Improvement Studies Of Topic Model LDA Inference Algorithms

Posted on: 2018-10-26
Degree: Master
Type: Thesis
Country: China
Candidate: J W Zhang
Full Text: PDF
GTID: 2348330542965248
Subject: Computer Science and Technology
Abstract:
As a popular topic model, Latent Dirichlet Allocation (LDA) clusters documents and words at the topic layer, decomposing the high-dimensional, sparse document-word matrix into two relatively dense matrices: a document-topic matrix and a topic-word matrix. Since David Blei proposed LDA in 2003, three inference algorithms have been developed for it: Gibbs Sampling (GS), Variational Bayesian inference (VB), and Expectation Maximization (EM). Many variants of these three algorithms have emerged for different application contexts, such as batch LDA algorithms for small data, online LDA algorithms for big data, and accelerating algorithms for real-time processing. However, three problems remain open, and this thesis studies each of them:

1) A comparative study of the predictive abilities of the three LDA inference algorithms; the practical question is which inference algorithm to choose. Under the framework of entropy, this thesis re-examines the optimization objective of LDA, the optimization objectives of the three inference algorithms, and perplexity, a standard metric for LDA's predictive ability. The analysis finds that, compared with GS and VB, EM achieves better predictive perplexity because it directly minimizes the cross entropy between the observed word distribution and LDA's predictive word distribution (see the first code sketch after this abstract).

2) A study of how LDA's priors (the Dirichlet hyper-parameters and the number of topics) affect its predictive ability; the practical question is how to set the priors. Analyzing the priors from the perspective of entropy, the thesis finds that adjusting the Dirichlet hyper-parameters and the number of topics changes the entropy of the word distribution predicted by LDA, and thereby the model's predictive ability. Based on the observed rules of this influence, the thesis proposes a grid-search-based algorithm for finding next-best hyper-parameter values (see the second sketch below).

3) A convergence-speed study of LDA accelerating algorithms; the practical question is which accelerating algorithm converges fastest. To overcome the drawbacks of FEM, this thesis proposes a new EM-based accelerating algorithm, AEM (Adaptive EM). Its core idea is to shrink, in a self-adaptive way, the set of topics updated in each document as the model converges (see the third sketch below). On multiple datasets and across different numbers of topics, AEM converges 9% to 38.5%, 4.1% to 15.5%, and 11.7% to 43% faster than the comparatively advanced FEM, AliasLDA, and SparseLDA, respectively.
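The following sketch illustrates the entropy view in point 1: perplexity is the exponentiated cross entropy between the observed word distribution and LDA's predictive word distribution. It is a minimal illustration, not the thesis's implementation; the function name and the random matrices in the usage example are hypothetical.

```python
import numpy as np

def lda_perplexity(counts, theta, phi, eps=1e-12):
    """Perplexity as exponentiated cross entropy.

    counts : (D, V) observed document-word count matrix
    theta  : (D, K) document-topic proportions, rows sum to 1
    phi    : (K, V) topic-word distributions, rows sum to 1
    """
    # LDA's predictive word distribution per document:
    # p(w | d) = sum_k theta[d, k] * phi[k, w]
    pred = theta @ phi                                   # (D, V)
    # Cross entropy between the observed and predicted word
    # distributions, averaged over all observed tokens
    cross_entropy = -(counts * np.log(pred + eps)).sum() / counts.sum()
    return np.exp(cross_entropy)

# Hypothetical usage with random, properly normalized matrices
rng = np.random.default_rng(0)
D, K, V = 4, 3, 10
theta = rng.dirichlet(np.ones(K), size=D)
phi = rng.dirichlet(np.ones(V), size=K)
counts = rng.integers(1, 5, size=(D, V)).astype(float)
print(lda_perplexity(counts, theta, phi))
```

Since perplexity is a monotone function of this cross entropy, an algorithm that directly minimizes the cross entropy, as the thesis argues EM does, also minimizes predictive perplexity.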
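For point 2, the sketch below shows a plain exhaustive grid search over the Dirichlet hyper-parameters and the number of topics, scored by held-out perplexity. It uses scikit-learn's variational-Bayes LDA rather than the thesis's EM implementation, and it does not reproduce the thesis's "next-best" search rule; the function name and the grids are hypothetical.

```python
import numpy as np
from itertools import product
from sklearn.decomposition import LatentDirichletAllocation

def grid_search_priors(X_train, X_test, n_topics_grid, alpha_grid, beta_grid):
    """Return the (K, alpha, beta) combination with the lowest
    held-out perplexity, together with that perplexity."""
    best_combo, best_ppl = None, np.inf
    for K, alpha, beta in product(n_topics_grid, alpha_grid, beta_grid):
        lda = LatentDirichletAllocation(
            n_components=K,
            doc_topic_prior=alpha,     # Dirichlet prior on document-topic mix
            topic_word_prior=beta,     # Dirichlet prior on topic-word dists
            learning_method="batch",
            max_iter=20,
            random_state=0,
        )
        lda.fit(X_train)
        ppl = lda.perplexity(X_test)   # held-out predictive perplexity
        if ppl < best_ppl:
            best_combo, best_ppl = (K, alpha, beta), ppl
    return best_combo, best_ppl

# Hypothetical usage on random count data
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(60, 200))
print(grid_search_priors(X[:50], X[50:], [5, 10], [0.1, 0.5], [0.01, 0.1]))
```

Such a search probes empirically what the thesis derives through entropy: the priors reshape the entropy of the predicted word distribution, which shows up directly in held-out perplexity.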
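Point 3 states AEM's core idea only at a high level: adaptively shrink the set of topics updated per document as the model converges. The sketch below is one possible reading of that idea for a single document's EM update, assuming dense NumPy arrays; the pruning threshold `tol`, the function name, and the exact update rule are assumptions, not the thesis's algorithm.

```python
import numpy as np

def aem_document_update(n_dw, phi, theta_d, active, tol=1e-3):
    """One EM-style update for a single document, restricted to its
    currently active topics; topics with negligible mass are pruned.

    n_dw    : (V,)   word counts of the document
    phi     : (K, V) topic-word distributions
    theta_d : (K,)   the document's topic proportions
    active  : index array of topics still updated for this document
    """
    # E-step over active topics only: r[k, w] ∝ theta_d[k] * phi[k, w]
    r = theta_d[active][:, None] * phi[active, :]        # (|active|, V)
    r /= r.sum(axis=0, keepdims=True) + 1e-12
    # M-step for this document: expected token counts per active topic
    expected = r @ n_dw                                   # (|active|,)
    theta_d = np.zeros_like(theta_d)
    theta_d[active] = expected / expected.sum()
    # Adaptive pruning: as training converges, more topics fall below
    # tol and are excluded from later updates, saving computation
    return theta_d, active[theta_d[active] > tol]
```

Intuitively, the saving grows across iterations because most documents concentrate their mass on a few topics as the model converges, so each per-document update touches ever fewer topics.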
Keywords/Search Tags:Latent Dirichlet Allocation, Inference Algorithms, Priors, Convergence Speed