An Online Supervised Topic Model Based On Stochastic Variational Inference And MapReduce

Posted on:2018-09-06

Degree:Master

Type:Thesis

Country:China

Candidate:W Z Song

Full Text:PDF

GTID:2348330515978431

Subject:Computer software and theory

Abstract/Summary:

In machine learning, topic models and supervised topic models are general-purpose models for analyzing natural language. They reveal the structural features of language through probability distributions and present them as "topics" and "tags". Topic models have a wide range of everyday applications and have therefore become a hotspot in machine learning. However, sLDA, a commonly used supervised topic model, uses a learning algorithm based on variational EM and coordinate ascent. As the amount of data increases, stacking these two algorithms causes the training time of sLDA to grow exponentially. In addition, the sLDA learning algorithm is an offline training algorithm, which makes it unsuitable for applications with real-time requirements and large data volumes, such as text classification and public opinion analysis. These issues severely restrict the development of topic models. To solve these problems, this paper makes the following contributions:

1. An efficient online learning algorithm for the supervised topic model is proposed. This paper uses the idea of stochastic variational inference to improve the learning algorithm of sLDA. By using the natural gradient in Riemannian space, the supervised topic model can point more accurately toward the maximum likelihood. During learning, the natural gradient replaces the Euclidean gradient used in the sLDA learning algorithm, which speeds up convergence. In addition, following the idea of stochastic optimization, a subset of the training data is randomly sampled in each iteration to estimate the gradient of the global parameters, which reduces the computational burden of the model and gives sLDA the ability to learn online.

2. A parallel learning algorithm for the online supervised topic model is proposed and implemented with support for a variety of application scenarios. Because the number of documents sampled in each iteration of the online supervised topic model affects the label prediction results, the training algorithm must allow the sample size of each iteration to be set flexibly. This paper applies the popular MapReduce parallel computing framework to the online supervised topic model so that it can handle large-scale data. In addition, the algorithm is implemented with Python and mrjob, whose flexibility lets it run as a single process on a single machine, as multiple processes, on a distributed cluster, or in the cloud, further expanding the scope of application of sLDA.
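
The stochastic update of the global parameters described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it follows the generic stochastic variational inference recipe for a Dirichlet topic-word parameter, leaves out the supervised response part of sLDA, and assumes the per-document expected topic-word counts come from a local step that is not shown. All names (`step_size`, `sample_minibatch`, `global_step`, `tau0`, `kappa`, `eta`) are hypothetical.

```python
import random


def step_size(t, tau0=1.0, kappa=0.7):
    """Robbins-Monro step size rho_t = (tau0 + t) ** -kappa (assumed schedule)."""
    return (tau0 + t) ** (-kappa)


def sample_minibatch(num_docs, batch_size, rng):
    """Draw document indices uniformly at random, without replacement."""
    return rng.sample(range(num_docs), batch_size)


def global_step(lam, batch_stats, num_docs, eta, rho):
    """One natural-gradient step on the topic-word parameters.

    lam:         K x V list of lists, current variational Dirichlet parameters
    batch_stats: one K x V matrix per sampled document holding expected
                 topic-word counts (assumed to come from a local step)
    num_docs:    total number of documents in the corpus
    eta:         symmetric Dirichlet prior on the topics
    rho:         step size for this iteration
    """
    scale = num_docs / len(batch_stats)  # rescale the minibatch to corpus size
    new_lam = []
    for k, row in enumerate(lam):
        new_row = []
        for w, old in enumerate(row):
            total = sum(stats[k][w] for stats in batch_stats)
            lam_hat = eta + scale * total  # noisy estimate of the optimum
            # lam + rho * (lam_hat - lam): the natural gradient is lam_hat - lam
            new_row.append((1.0 - rho) * old + rho * lam_hat)
        new_lam.append(new_row)
    return new_lam


# Usage: one topic, two vocabulary words, a corpus of 10 documents.
rng = random.Random(0)
batch = sample_minibatch(num_docs=10, batch_size=1, rng=rng)
lam = global_step([[1.0, 1.0]], [[[2.0, 0.0]]], num_docs=10, eta=0.5, rho=0.5)
```

The sum over `batch_stats` is associative, so in a MapReduce setting it is the kind of aggregation that mappers could emit per document and a reducer could add up per (topic, word) key; how the thesis actually partitions this work is not stated in the abstract.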

Keywords/Search Tags:

Supervised Topic Model, MapReduce, Probability Generative Model, Stochastic Variational Inference

Related items

1	Research On Construction, Inference And Applications Of Deep Dynamic Probability Models
2	Research And Application Of Probabilistic Generative Model With Variational Learning And Inference
3	Research On Deep Generative Models Based On Variational Inference Of Flow Structure
4	Study On Variational Inference And Application Of Bayesian Method In Topic Model
5	Study Of Image Classification Method Based On Non-Gaussian Probability Model
6	The Research Of Scene And Place Recognition Using An Improvement LDA Topic Model
7	Variational Inference To Supervised Dirichlet Process Mixtures Of Principle Component Analysers
8	Citation Importance Classification Towards Scholarly Full-text Articles And Its Application In Topic Identification Of Scientific Literature
9	Topic Modeling Research Based On Word Embedding And Generative Neural Networks
10	Research On Topic Detection And Tracking Based On Probability Topic Model