Research On Short Text Topic Modeling Based On External Information

Posted on: 2022-03-02
Degree: Master
Type: Thesis
Country: China
Candidate: H M Zhang
GTID: 2518306353476884
Subject: Software engineering
Abstract/Summary:
As a text mining technology, the topic model was developed for conventional text modeling and is widely used in feature extraction, topic analysis, recommendation systems and other fields. However, with the development of social media, short texts are becoming the main form of text expression on the Internet, which brings new challenges to topic model research. Existing topic models usually rely on the word co-occurrence information of the text itself and do not introduce additional prior knowledge to supplement it. At the same time, word co-occurrence in short texts is relatively sparse, which reduces the accuracy of topic modeling and makes it harder for a topic model to extract high-quality topic features. In response to this problem, this thesis introduces external information into the topic model, focusing on the text sparsity and the lack of prior knowledge in short text topic modeling. The main contents include:

(1) Binarization of word features. According to the mean value of the word vector, the pre-trained word vector is converted into binary labels with a value of 0 or 1. The prominent part of the word vector is retained and the weaker part is discarded, yielding the word feature information.

(2) Bi-Concept pair construction based on ConceptNet. Based on the ConceptNet semantic network, the biterm construction idea of the BTM model is improved, and a set of Bi-Concept pairs is constructed to obtain conceptual information without introducing a large amount of noise.

(3) BCTM topic model construction. Using the binarized labels that represent word feature information, the model calculates the weight of each word under a specific topic as affected by that information, so that each word in the topic-word matrix receives its own Dirichlet prior. At the same time, sampling is based on Bi-Concept pairs, which supplements the text information and incorporates the prior knowledge of the conceptual network, improving the accuracy of the short text topic model.

In summary, this thesis introduces two kinds of external information, word feature information and conceptual information, into the topic modeling process to supplement the sparse word co-occurrence and the prior information of short texts. The constructed BCTM model is combined with different text units and compared with benchmark models in topic modeling experiments. The experimental results show that the BCTM model outperforms the benchmark models in perplexity, topic coherence and text classification accuracy, which supports the effectiveness of introducing external information into short text topic modeling.
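The binarization in step (1) can be illustrated with a minimal sketch. It assumes that the "mean value of the word vector" is the mean of that vector's own components and that components strictly above the mean are labeled 1; the abstract does not state either detail. The function name `binarize_word_vector` and the toy vectors are hypothetical and not taken from the thesis.

```python
def binarize_word_vector(vector):
    """Convert one word vector into 0/1 labels.

    Assumption: the threshold is the mean of the vector's own
    components, and a component is labeled 1 only if it is strictly
    greater than that mean (the "prominent part" of the vector).
    """
    threshold = sum(vector) / len(vector)
    return [1 if component > threshold else 0 for component in vector]


# Toy 4-dimensional "pre-trained" vectors, invented for illustration.
toy_embeddings = {
    "topic": [0.9, -0.2, 0.1, 0.4],
    "model": [-0.5, 0.3, 0.3, -0.1],
}

# One binary label vector per word.
binary_labels = {
    word: binarize_word_vector(vec) for word, vec in toy_embeddings.items()
}

for word, labels in binary_labels.items():
    print(word, labels)
```

In this sketch each word keeps a label vector of the same dimension as its embedding. How BCTM turns these labels into a word-specific Dirichlet prior is not described in the abstract and is not shown here.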
Keywords/Search Tags: Topic Model, Short Text, Concept Information, Bi-Concept, Word Feature