Topic modeling is one of the common methods to extract knowledge from document sets.At present,the development of conventional text modeling has been quite perfect.It is often used to reduce the dimension of text features,cluster text according to topics,or establish text recommendation system according to user preferences.However,due to the development of the Internet in modern society,a lot of information on the network appears in the form of short text,which brings new challenges to the development of topic model.Most of the current topic models are based on the word co-occurrence information of their own text,and do not introduce the topic sparsity constraint to improve the topic extraction ability of the model.In addition,short text itself has the problem of word co-occurrence sparsity,which seriously affects the accuracy of short text topic modeling.To solve the above problems,this paper will carry out subject modeling based on variational auto-encoder framework and parameterize it by neural network structure.The research contents include:(1)The sparse constraint property of the topic is introduced.In the inference network,by introducing a specific Beta distribution,set a topic controller for each topic with a value of 1 or 0,keep the topics with a value of 1 in the topic controller,and filter out the topics with a value of 0.(2)The context information features are obtained based on the pre training model.The sentence embedding and word embedding vectors are obtained by Sentence-BERT and Word2vec respectively,and the word embedding is integrated into the Gaussian decoder to enrich the context information of short text.(3)VAENTM topic model construction.VAENTM performs sparsity constraints on topics based on the topic controller to filter out irrelevant topics.At the same time,the input of the model becomes the mosaic of bow vector and Sentence-BERT sentence embedding.In the Gaussian decoder,the topic distribution on words is processed into multivariate Gaussian distribution or Gaussian mixture distribution in the embedding space,which explicitly enriches the limited context information of short text.This paper solves the problem of sparse co-occurrence of text words by introducing topic sparse constraint and rich context information.Experimental results show that VAENTM outperforms the benchmark model in terms of confusion,topic consistency and text classification accuracy,and proves the effectiveness of introducing topic sparse constraint and rich context information into short text topic modeling. |