Generalization Of Markov Topic Models And Its Variational Inference

Posted on: 2024-07-27  Degree: Master  Type: Thesis
Country: China  Candidate: L Y Sun  Full Text: PDF
GTID: 2557307103471444  Subject: Statistics
Abstract/Summary:
Markov topic models can analyze the relationships among multiple corpora and the transfer and evolution of topics between them. In the era of information explosion, multi-corpus databases are widespread, which gives Markov topic models a broad range of applications. This thesis therefore studies parameter estimation for Markov topic models, together with improvements and extensions of the model.

First, a detailed derivation of variational inference for Markov topic models is given. Considering the volume of text data and the complexity of the algorithm, a stochastic variational inference algorithm is introduced to speed up computation. Because the number of documents differs across corpora, a probabilistic batch stochastic variational inference method suited to multiple corpora is proposed to improve the robustness of the algorithm. The superior performance of this method is verified on the abstract data set of How Net. A topic correlation analysis across corpora shows that the contents of "Statistical and Decision" and "Statistics and Information Forum" are more similar.

Second, to relax the assumption that all corpora in a Markov topic model have the same number of topics, a Markov topic model with a variable number of topics is proposed. The improved model allows each corpus to find its own optimal number of topics while the number of topics for the overall collection of corpora is optimal. Experiments on the data set show that the improved model has a lower overall perplexity than the Markov topic model and that the topic correlation between different corpora is more significant. An analysis of the contribution rate of words to topics shows that, in the Markov topic model with a variable number of topics, the words within a topic are more concentrated and the topics are more representative.

Finally, to relax the assumption in the previous model that all corpora share the same vocabulary, a multi-vocabulary Markov topic model is proposed. The new model allows each corpus to use a different vocabulary, which significantly reduces the model's perplexity. Experiments show that when all the words appearing in a corpus are used as its vocabulary, the topics generated by the new model are more concentrated and better represent the content of the corpus.
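The abstract compares models by perplexity without stating how it is computed. As a minimal sketch, assuming the standard definition used for topic models (the exponential of the negative average per-token log-likelihood), it could be evaluated as below. The function name `perplexity` and the inputs `docs`, `theta`, and `beta` are hypothetical names chosen for illustration; they are not taken from the thesis, and the thesis may use a different estimator.

```python
import math


def perplexity(docs, theta, beta):
    """Perplexity under a fitted topic model (illustrative, not the thesis code).

    docs  : list of dicts mapping word id -> count, one dict per document
    theta : theta[d][k] is the proportion of topic k in document d
    beta  : beta[k][w] is the probability of word w under topic k
    """
    log_lik = 0.0
    n_tokens = 0
    n_topics = len(beta)
    for d, counts in enumerate(docs):
        for w, c in counts.items():
            # Probability of word w in document d: mixture over topics.
            p_w = sum(theta[d][k] * beta[k][w] for k in range(n_topics))
            log_lik += c * math.log(p_w)
            n_tokens += c
    # Lower values mean the model predicts the observed words better.
    return math.exp(-log_lik / n_tokens)


# Toy check: with uniform topics over a 4-word vocabulary, every word has
# probability 1/4, so the perplexity equals the vocabulary size.
toy_docs = [{0: 2, 1: 1}, {2: 3}]
toy_theta = [[0.5, 0.5], [0.9, 0.1]]
toy_beta = [[0.25] * 4, [0.25] * 4]
toy_value = perplexity(toy_docs, toy_theta, toy_beta)
```

Under this definition, a model that spreads probability over fewer plausible words per topic yields a lower value, which is consistent with the abstract's link between lower perplexity and more concentrated topics.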
Keywords/Search Tags: Markov topic models, stochastic variational inference, perplexity, optimal number of topics