Research On Neural Topic Modeling Method Based On Variational Auto-Encoder

Posted on: 2023-03-02

Degree: Master

Type: Thesis

Country: China

Candidate: H W Tang

GTID: 2558306905491034

Subject: Software engineering

Abstract/Summary:

Topic modeling is a common method for extracting knowledge from document collections. Topic modeling of conventional text is now fairly mature and is often used to reduce the dimensionality of text features, to cluster text by topic, or to build text recommendation systems based on user preferences. However, with the growth of the Internet, much online information now appears as short text, which poses new challenges for topic models. Most current topic models rely only on the word co-occurrence information of the text itself and do not introduce a topic sparsity constraint to improve the model's topic extraction ability. In addition, short text suffers from sparse word co-occurrence, which seriously degrades the accuracy of short-text topic modeling. To address these problems, this thesis performs topic modeling within the variational auto-encoder framework, parameterized by neural networks. The research contents include:

(1) A topic sparsity constraint is introduced. In the inference network, a specific Beta distribution is used to assign each topic a topic controller with a value of 1 or 0; topics whose controller is 1 are kept, and topics whose controller is 0 are filtered out.

(2) Contextual features are obtained from pre-trained models. Sentence embeddings and word embeddings are obtained with Sentence-BERT and Word2vec respectively, and the word embeddings are integrated into the Gaussian decoder to enrich the context information of short text.

(3) Construction of the VAENTM topic model. VAENTM applies the sparsity constraint to topics through the topic controller to filter out irrelevant topics. At the same time, the model input becomes the concatenation of the bag-of-words (BoW) vector and the Sentence-BERT sentence embedding. In the Gaussian decoder, each topic's distribution over words is modeled as a multivariate Gaussian distribution or a Gaussian mixture distribution in the embedding space, which explicitly enriches the limited context information of short text.

By introducing the topic sparsity constraint and rich context information, this thesis addresses the problem of sparse word co-occurrence in short text. Experimental results show that VAENTM outperforms the benchmark models in perplexity, topic coherence, and text classification accuracy, which demonstrates the effectiveness of introducing the topic sparsity constraint and rich context information into short-text topic modeling.
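
The three components described in the abstract can be illustrated with a small NumPy sketch. This is not the thesis implementation: all function names, shapes, and random values are hypothetical. The topic controller is a fixed 0/1 vector here rather than one inferred through a Beta distribution, the Gaussian is assumed diagonal, and normalizing the Gaussian density over the vocabulary with a softmax is one possible way to turn it into a topic-word distribution.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def build_encoder_input(bow, sentence_emb):
    """Concatenate the BoW vector with a sentence embedding (hypothetical helper)."""
    return np.concatenate([bow, sentence_emb], axis=-1)

def gate_topics(theta, controller):
    """Keep topics whose controller is 1, drop those with 0, then renormalize."""
    gated = theta * controller
    total = gated.sum(axis=-1, keepdims=True)
    return gated / np.clip(total, 1e-12, None)

def gaussian_topic_word(word_emb, topic_mean, topic_log_var):
    """Topic-word distribution from a diagonal Gaussian in embedding space.

    word_emb: (V, D), topic_mean: (K, D), topic_log_var: (K, D).
    Returns beta with shape (K, V); each row sums to 1 (assumed normalization).
    """
    diff = word_emb[None, :, :] - topic_mean[:, None, :]        # (K, V, D)
    var = np.exp(topic_log_var)[:, None, :]                     # (K, 1, D)
    log_density = -0.5 * np.sum(
        diff ** 2 / var + topic_log_var[:, None, :] + np.log(2 * np.pi),
        axis=-1,
    )                                                           # (K, V)
    return softmax(log_density, axis=-1)

# Toy sizes: vocabulary V, embedding dim D, topics K, sentence-embedding dim S.
rng = np.random.default_rng(0)
V, D, K, S = 6, 4, 3, 5

bow = rng.integers(0, 3, size=(2, V)).astype(float)   # two toy documents
sent = rng.normal(size=(2, S))                         # stand-in for Sentence-BERT output
x = build_encoder_input(bow, sent)                     # encoder input, shape (2, V + S)

theta = softmax(rng.normal(size=(2, K)))               # stand-in for encoder output
controller = np.array([1.0, 0.0, 1.0])                 # fixed here; inferred in the thesis
theta_sparse = gate_topics(theta, controller)

word_emb = rng.normal(size=(V, D))                     # stand-in for Word2vec vectors
beta = gaussian_topic_word(word_emb, rng.normal(size=(K, D)), np.zeros((K, D)))
word_probs = theta_sparse @ beta                       # reconstructed word distribution
```

In this sketch the second topic receives zero weight after gating, so it contributes nothing to `word_probs`, which mirrors the filtering role the abstract assigns to the topic controller.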

Keywords/Search Tags:

Neural Topic Model, Short Text, Variational Auto-Encoder, Sparse Constraint, Context Information

Related items

1	Mongolian Short Text Semantic Similarity Calculation Based On Deep VAE Integrated With Topic Information
2	Image And Text Joint Modeling Method Based On Multimodal Weibull Variational Auto-Encoder
3	Sparse Topic Models For Short Text
4	A Study Of Short Text Topic Models Based On Information Of Word Embeddings
5	The Building Method Of Auto-adapt Context Based Topic Model
6	Research On Neural Topic Model Based On Dirichlet’s Prior
7	Deep Auto-encoder Framework For SAR Images Change Detection
8	A Study Of Short Text Classification Based On Feature Enhancement
9	Variational Auto-encoders Based On Gaussian Mixture Model
10	Research On Emotional Conversation Generation Technology Based On Topic Model And Variational Auto-Encoder