
Research On Short Text Topic Discovery Based On BTM Topic Model

Posted on: 2021-03-16
Degree: Master
Type: Thesis
Country: China
Candidate: W G Zhou
GTID: 2428330614471183
Subject: Software engineering
Abstract/Summary:
With the continued growth of social media and the major social platforms in recent years, modern society has seen an explosion of information, and mixed short text data of all kinds now surrounds us: Weibo posts, comments and status updates, movie reviews, and so on. The vast majority of this short text contains rich and valuable information. One family of methods frequently used to analyze it is the topic model, which learns the topic structure hidden in a document collection in order to automatically understand and analyze the semantics of the text. Traditional topic models are mostly designed around long text. When applied to short text, they do not account for the sparseness of user-generated short text data.

To overcome this data sparsity as far as possible, this thesis uses the BTM short text topic model to model collected Weibo short text data. BTM takes all co-occurring word pairs in the documents as its modeling objects, which overcomes the sparseness that arises in each short text from its brief content and insufficient co-occurrence information. After a closer study of the techniques BTM uses for modeling, learning, and topic discovery, and working with the collected microblog data sets, the thesis identifies shortcomings of BTM in latent topic discovery and implements improvements so that short text topic discovery yields more accurate results and higher-quality topics. The detailed work is as follows:

(1) The standard BTM topic model only models co-occurring word pairs, ignoring both the interactive attributes of microblog content and the semantic connection between the words in each pair. Building on the standard model, a BTM topic model with popularity and semantic relevance is proposed. The improved model makes use of the number of comments, reposts, and likes on each Weibo post. A popularity (heat) matrix is used as the weight on the probability distribution of words in a document, which improves that distribution to a certain extent. In addition, when Gibbs sampling is run in the BTM model, a Word2Vec model is used to compute the semantic similarity of the co-occurring word pairs in the document. If the similarity meets a predetermined threshold, the word pair is strengthened so that the modeled word pairs are semantically related. The experimental results show that the improved BTM model improves the probability distribution of words in the document, describes the topics more accurately, and produces higher-quality topics.

(2) In the short text topic discovery stage, the Single-Pass clustering algorithm is used to cluster the document vectors obtained from the improved model, with measures that alleviate the algorithm's sensitivity to the order of the input documents. The threshold parameter is optimized and the similarity calculation of the clustering algorithm is improved, so that the topics hidden in the short text are mined with higher accuracy. Comparative experiments show that this method improves precision (P), recall (R), and the F1 value to a certain extent, so the accuracy of short text topic discovery is improved.
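The two baseline building blocks named in the abstract, co-occurring word pairs (biterms) and Single-Pass clustering, can be illustrated with a minimal Python sketch. It is not the thesis's improved method: the function names `extract_biterms`, `cosine`, and `single_pass`, the cosine similarity measure, the running-mean centroid update, and the threshold of 0.8 are all assumptions chosen for illustration, and the popularity weighting and Word2Vec strengthening steps are not shown.

```python
from itertools import combinations
import math


def extract_biterms(tokens):
    """Return every unordered co-occurring word pair (biterm) in one short text."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def single_pass(vectors, threshold=0.8):
    """Plain Single-Pass clustering: each vector joins its most similar cluster
    if the similarity reaches `threshold` (an assumed value), else opens a new one.
    Returns a list of clusters, each a list of input indices."""
    centroids, members = [], []
    for idx, vec in enumerate(vectors):
        best, best_sim = None, -1.0
        for c, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            members[best].append(idx)
            n = len(members[best])
            # Update the centroid as the running mean of its members.
            centroids[best] = [(m * (n - 1) + x) / n
                               for m, x in zip(centroids[best], vec)]
        else:
            centroids.append(list(vec))
            members.append([idx])
    return members


if __name__ == "__main__":
    print(extract_biterms(["topic", "model", "short"]))
    # Hypothetical document-topic vectors, e.g. as produced by a topic model.
    print(single_pass([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]))
```

Because each document is compared only with the clusters that exist when it arrives, the result of this baseline depends on input order, which is the sensitivity the abstract says the thesis sets out to alleviate.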
Keywords/Search Tags: BTM topic model, Gibbs sampling, Single-Pass clustering, Short text, Topic discovery