
Research On Bayesian Nonparametric PCA And Its Application In Topic Models

Posted on: 2019-02-04 | Degree: Master | Type: Thesis
Country: China | Candidate: Z X Li
GTID: 2428330545477966 | Subject: Computer Science and Technology
Abstract/Summary:
Topic models reveal the hidden topic structure of a text collection and give each document a highly compressed representation in terms of a set of topics. If the topics are regarded as discrete semantic information in the document set, topic modeling maps the document set onto that discrete semantic information. However, such discretized topics do not accurately represent the semantic information of the documents. For example, they cannot measure the relationships between topics in a document set, and the number of topics cannot be determined directly. To address this, this thesis combines the PCA method with Bayesian nonparametric methods. First, a Bayesian nonparametric PCA (BNPP) model is proposed, which reduces the dimensionality of a set of high-dimensional samples while mining the hidden category information of the samples. Second, to better mine the topics of a document set, the document set is regarded as a combination of multiple groups of data, and a hierarchical framework is used to propose a topic model based on BNPP (BNPP-TM). BNPP-TM projects the document set from the original sample space into a semantic space, and uses a continuous semantic space to replace the discrete topics of traditional topic models.

The work of this thesis mainly covers four aspects:

1) Traditional unsupervised dimensionality reduction methods, when applied to high-dimensional samples, ignore the implicit category features of the samples. This thesis proposes a Bayesian nonparametric PCA model. Building on the PCA method, the BNPP model adds a Bayesian nonparametric prior to mine the implicit category features of the samples.
2) To verify the feasibility of the BNPP model, a Gibbs sampling algorithm for the model is proposed. The Chinese restaurant process (CRP) is used to construct the Bayesian nonparametric component of the model, and Gibbs sampling is used to estimate the model parameters. Experiments show that the algorithm not only solves for the model parameters effectively but also captures the category features of the samples.
3) In traditional topic models, the relationships between topics cannot be measured and the number of topics cannot be determined directly. This thesis proposes a topic model based on BNPP. BNPP-TM treats a set of documents as a combination of multiple groups of data. It uses the Hierarchical Dirichlet Process as the prior distribution of the latent variables in BNPP, building a hierarchical model that better mines the latent topic structure of the document set.
4) To verify the feasibility of the BNPP-TM model, a variational inference algorithm for the model is proposed. With the stick-breaking construction, the BNPP-TM model can use variational inference to solve for the model parameters efficiently. Experiments show that the algorithm can project the document set into the semantic space and extract the topics of the documents. On the one hand, it can measure the relationships between the topics of the document set. On the other hand, it can determine the number of topics more accurately.
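
The abstract names two standard Dirichlet process constructions, the CRP and the stick-breaking construction, without showing them. The following is a minimal Python sketch of both in their generic textbook form. It is not the thesis's BNPP or BNPP-TM implementation: the function names, the concentration parameter `alpha`, and the truncation level are assumptions chosen only for illustration.

```python
import random


def crp_assignments(n_customers, alpha, rng):
    """Seat n_customers sequentially under a Chinese restaurant process.

    Customer i (0-based) joins existing table k with probability
    counts[k] / (i + alpha) and opens a new table with probability
    alpha / (i + alpha). `alpha` is an assumed concentration parameter.
    """
    assignments = []
    counts = []  # counts[k] = number of customers currently at table k
    for _ in range(n_customers):
        # Unnormalised weights: one per existing table, plus one for a new table.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new table
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments, counts


def stick_breaking_weights(alpha, truncation, rng):
    """Truncated stick-breaking weights for a Dirichlet process.

    Draw b_k ~ Beta(1, alpha) and set pi_k = b_k * prod_{j<k} (1 - b_j).
    The last component takes the leftover stick so the weights sum to 1.
    """
    weights = []
    remaining = 1.0  # length of the stick not yet broken off
    for _ in range(truncation - 1):
        b = rng.betavariate(1.0, alpha)
        weights.append(b * remaining)
        remaining *= 1.0 - b
    weights.append(remaining)
    return weights


if __name__ == "__main__":
    rng = random.Random(0)
    assignments, counts = crp_assignments(20, alpha=1.0, rng=rng)
    print("tables used:", len(counts))
    print("weights:", stick_breaking_weights(alpha=1.0, truncation=5, rng=rng))
```

In both constructions the number of occupied components is not fixed in advance. This is the property that lets a Dirichlet process prior avoid specifying the number of categories or topics beforehand. The truncation in the second function is a common device for making variational inference tractable, and whether the thesis truncates in this way is not stated in the abstract.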
Keywords/Search Tags: Bayesian Nonparametric, Principal Component Analysis, Topic Models, Dirichlet Process, Hierarchical Models