
An Online Supervised Topic Model Based On Stochastic Variational Inference And MapReduce

Posted on: 2018-09-06
Degree: Master
Type: Thesis
Country: China
Candidate: W Z Song
Full Text: PDF
GTID: 2348330515978431
Subject: Computer software and theory
Abstract/Summary:
In machine learning, topic models and supervised topic models are general-purpose models for analyzing natural language. They reveal structural features of language through probability distributions and present them as "topics" and "tags". Topic models have a wide range of applications in daily life and have therefore become a research hotspot in machine learning. However, sLDA, a commonly used supervised topic model, is trained with a learning algorithm that combines variational EM with coordinate ascent. As the amount of data grows, nesting the two algorithms causes the training time of sLDA to grow exponentially. In addition, the sLDA learning algorithm is an offline training algorithm, which makes it unsuitable for applications with real-time requirements and large data volumes, such as text classification and public opinion analysis. These problems severely restrict the development of topic models. To address them, this thesis makes the following contributions:

1. An efficient online learning algorithm for the supervised topic model is proposed. We use the idea of stochastic variational inference to improve the learning algorithm of sLDA. By using the natural gradient in Riemannian space, the supervised topic model can point more accurately toward the maximum likelihood. In the learning process, the natural gradient replaces the Euclidean gradient used in the sLDA learning algorithm, which speeds up convergence. In addition, following the idea of stochastic optimization, a training subset is randomly sampled in each iteration to estimate the gradient with respect to the global parameters, which reduces the computational burden of the model and gives sLDA the ability to learn online.

2. A parallel learning algorithm for the online supervised topic model is proposed and implemented with support for a variety of application scenarios. Because the number of documents sampled in each iteration of the online supervised topic model affects the label prediction results, the training algorithm must allow the sample size per iteration to be set flexibly. In this thesis, the popular MapReduce parallel computing framework is applied to the online supervised topic model so that it can handle large-scale data. In addition, the algorithm is implemented with Python and mrjob, whose flexibility lets it run as a single process on a single machine, as multiple processes, on a distributed cluster, or in the cloud, further expanding the scope of application of sLDA.
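
The abstract does not give update equations, so the following is only a minimal sketch of the general idea, assuming the standard stochastic variational inference scheme for plain (unsupervised) LDA: sample a minibatch, compute per-document sufficient statistics (a map step), sum them (a reduce step), and move the global topic parameters along a noisy natural gradient with a decaying step size. The supervised response-parameter updates of sLDA are omitted, the map and reduce functions are ordinary Python stand-ins rather than mrjob jobs, and every name and default value here (`svi_fit`, `map_document`, `tau0`, `kappa`, the toy corpus) is hypothetical.

```python
from functools import reduce
import numpy as np

def digamma(x):
    """Digamma via the recurrence psi(x) = psi(x+1) - 1/x and an asymptotic series (x > 0)."""
    x = np.asarray(x, dtype=float).copy()
    result = np.zeros_like(x)
    for _ in range(6):  # shift the argument above 6 so the series is accurate
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + np.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))

def local_step(ids, counts, exp_elog_beta, alpha, n_iter=50):
    """Coordinate ascent on one document's topic proportions; returns K x len(ids) statistics."""
    beta_d = exp_elog_beta[:, ids]
    gamma = np.ones(exp_elog_beta.shape[0])
    for _ in range(n_iter):
        exp_elog_theta = np.exp(digamma(gamma) - digamma(gamma.sum()))
        phinorm = exp_elog_theta @ beta_d + 1e-100
        gamma = alpha + exp_elog_theta * (beta_d @ (counts / phinorm))
    exp_elog_theta = np.exp(digamma(gamma) - digamma(gamma.sum()))
    phinorm = exp_elog_theta @ beta_d + 1e-100
    return np.outer(exp_elog_theta, counts / phinorm) * beta_d

def map_document(doc, exp_elog_beta, alpha):
    """Map step: one document -> dense K x V matrix of expected topic-word counts."""
    ids, counts = doc
    stats = np.zeros_like(exp_elog_beta)
    stats[:, ids] = local_step(ids, counts, exp_elog_beta, alpha)
    return stats

def svi_fit(docs, vocab_size, n_topics=2, alpha=0.1, eta=0.01,
            batch_size=2, n_steps=20, tau0=1.0, kappa=0.7, seed=0):
    """Stochastic natural-gradient updates of the topic-word parameters lam."""
    rng = np.random.default_rng(seed)
    n_docs = len(docs)
    lam = rng.gamma(100.0, 0.01, size=(n_topics, vocab_size))
    for t in range(n_steps):
        exp_elog_beta = np.exp(digamma(lam) - digamma(lam.sum(axis=1, keepdims=True)))
        batch = rng.choice(n_docs, size=batch_size, replace=False)  # random training subset
        # Reduce step: sum the per-document statistics produced by the map step.
        stats = reduce(np.add, (map_document(docs[d], exp_elog_beta, alpha) for d in batch))
        # Estimate of the optimum as if the minibatch were the whole corpus.
        lam_hat = eta + (n_docs / batch_size) * stats
        rho = (tau0 + t) ** (-kappa)  # decaying step size, kappa in (0.5, 1]
        # Natural-gradient step: convex combination of old and estimated parameters.
        lam = (1.0 - rho) * lam + rho * lam_hat
    return lam

# Toy corpus: each document is (unique word ids, word counts).
toy_docs = [
    (np.array([0, 1]), np.array([3.0, 2.0])),
    (np.array([2, 3]), np.array([1.0, 4.0])),
    (np.array([0, 1, 2]), np.array([2.0, 2.0, 1.0])),
    (np.array([3]), np.array([5.0])),
]
lam = svi_fit(toy_docs, vocab_size=4)
print(lam.shape)
```

In this sketch `batch_size` plays the role of the per-iteration sample size that the abstract says must be configurable. Because the map step touches one document at a time and the reduce step is a plain sum, the same split could in principle be handed to a MapReduce runner, although how the thesis actually partitions the work is not stated here.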
Keywords/Search Tags: Supervised Topic Model, MapReduce, Probability Generative Model, Stochastic Variational Inference