Research On Neural Topic Model Based On Dirichlet’s Prior

Posted on: 2024-07-09
Degree: Master
Type: Thesis
Country: China
Candidate: T Wang
Full Text: PDF
GTID: 2568307067958379
Subject: Engineering
Abstract/Summary:
With the development of information technology, the Internet continuously generates massive amounts of text data. Discovering specific information in such data is an urgent problem in machine learning, and topic models are a key technology for solving it. Traditional topic models use the Dirichlet distribution as the prior over latent topics to guide the generation of topic words. The sparsity induced by the Dirichlet distribution helps the model produce sparse topic representations and improves its topic extraction ability. However, the complex inference algorithms of traditional topic models limit their generalization ability, and when faced with massive text data that is updated in real time, they cannot infer a document's topics quickly.

In recent years, with the development of deep learning, combining neural networks with topic models for deeper topic extraction and fast processing of large-scale data has become a research hotspot. Topic models based on the variational autoencoder (VAE) are representative of this line of work. This type of model requires the prior distribution over latent topics to have a reparameterizable form. The Dirichlet distribution used in traditional topic models does not admit a simple reparameterization, so it cannot be applied directly as the prior over latent topics in neural topic models, which leaves neural topic models with poor clustering ability. In response, some scholars have proposed methods for applying a Dirichlet prior to neural topic models, but the existing methods still suffer from problems such as redundant topic words and many repeated topics. Therefore, in order to obtain sparse topic representations and improve topic extraction ability, this thesis studies neural topic models and proposes two more effective neural topic models based on the Dirichlet prior. The specific work is as follows:

1. Approximating the Dirichlet prior using a stick-breaking process based on the Kumaraswamy distribution.
1) To address the problem that the Dirichlet distribution cannot be applied to neural topic models, a topic model based on stick-breaking sampling, KNTM, is proposed. KNTM builds on the Neural Variational Document Model (NVDM). It uses the reparameterizable Kumaraswamy distribution to approximate the non-reparameterizable Beta distribution, and then approximates the Dirichlet distribution through a stick-breaking process based on the Kumaraswamy distribution. In this way, KNTM brings the Dirichlet distribution into the neural topic modeling framework.
2) On the basis of KNTM, a model with a recurrent stick-breaking structure, KRNTM, is further proposed. KRNTM combines the stick-breaking generation process with an LSTM and uses the LSTM to model long stick-breaking sequences, which allows weights to be assigned dynamically during the stick-breaking process and improves the stability of the model.
3) The effectiveness of the models is verified by experiments. The results show that KNTM and KRNTM generate more coherent, higher-quality topics than other neural topic models, and that KRNTM performs more stably when extracting high-dimensional topics.

2. Approximating the Dirichlet prior using a stick-breaking process based on the Beta distribution.
1) To address the error that KNTM introduces when approximating the Beta distribution, the BNTM model is proposed. BNTM samples the stick-breaking variables directly from the Beta distribution to construct latent topics that follow a Dirichlet prior, which helps the model generate sparser topic representations. Because the Beta distribution cannot be reparameterized explicitly, BNTM introduces an implicit reparameterization method to obtain gradients with respect to the inference parameters. Compared with KNTM, BNTM can estimate the Dirichlet prior without bias.
2) On the basis of BNTM, a model with a recurrent stick-breaking structure, BRNTM, is further proposed. BRNTM combines the variables sampled from the Beta distribution with an LSTM to assign stick weights more evenly to the base distribution of each topic dimension. This effectively alleviates the performance degradation that occurs as the number of topics increases.
3) Finally, BNTM and BRNTM are compared experimentally with KNTM and KRNTM. The results show that BNTM and BRNTM have better topic extraction ability and also perform better on perplexity, topic coherence, and topic uniqueness.
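
The Kumaraswamy-based construction behind KNTM can be illustrated with a minimal NumPy sketch. This is not the thesis's implementation: the function names `sample_kumaraswamy` and `stick_breaking` are hypothetical, and the parameters `a` and `b` are fixed constants here, whereas in a VAE-style model they would presumably be produced by the encoder network. The sketch only shows why the Kumaraswamy distribution is convenient: its inverse CDF has a closed form, so a sample is a deterministic function of uniform noise and the parameters, and breaking a unit-length stick with those samples yields topic proportions on the simplex.

```python
import numpy as np

def sample_kumaraswamy(a, b, rng):
    """Draw from Kumaraswamy(a, b) through its closed-form inverse CDF.

    The CDF is F(x) = 1 - (1 - x**a)**b, so x = (1 - (1 - u)**(1/b))**(1/a)
    with u ~ Uniform(0, 1). All randomness sits in u, which is what makes
    the sample reparameterizable with respect to a and b.
    """
    u = rng.uniform(size=np.shape(a))
    return (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)

def stick_breaking(v):
    """Map K-1 stick fractions in [0, 1) to K proportions that sum to 1."""
    # Length of stick still unbroken before each break: 1, (1-v1), (1-v1)(1-v2), ...
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - v)])
    # The last topic takes whatever is left, hence the appended fraction of 1.
    return np.concatenate([v, [1.0]]) * remaining

rng = np.random.default_rng(0)
K = 5                                      # number of topics (illustrative)
a = np.ones(K - 1)                         # assumed fixed parameters
b = np.arange(K - 1, 0, -1, dtype=float)   # b_k = K - k
v = sample_kumaraswamy(a, b, rng)
theta = stick_breaking(v)                  # topic proportions on the simplex
```

With `a = 1` the Kumaraswamy distribution coincides with Beta(1, b), so this particular parameter choice reproduces the stick-breaking construction of a uniform Dirichlet exactly. For general parameters the two distributions differ, which is the approximation error that the Beta-based BNTM is described as removing.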
Keywords/Search Tags:Neural Network, Topic Model, Variational Auto-Encoder, Dirichlet prior, Reparameterization