
Research On Topic Modeling For Short Texts Based On Intended Biterms

Posted on: 2022-08-09 | Degree: Master | Type: Thesis
Country: China | Candidate: Y B Guo | Full Text: PDF
GTID: 2518306569997439 | Subject: Computer technology
Abstract/Summary:
With the widespread use of social media, short texts have become an important information carrier. Studying the topics in short texts helps people quickly learn about social events. Topic models are designed to extract topics from large amounts of text data. However, traditional topic models focus on long texts and are not suitable for extracting topics from short texts. The main reason is data sparsity: a single short text contains only a few words and lacks sufficient contextual information. Various topic models for short texts have been developed to handle this sparsity problem, but most of them still perform poorly on short texts because they do not cope with sparsity properly.

To address this issue, this thesis proposes a strategy of intended biterms. The intention of biterms can be understood from two perspectives: the global intention of biterms and the local intention of biterms. The global intention of biterms holds that words carrying similar topic information share the same topics. Based on the global intention of biterms, this thesis proposes a topic model for short texts called the Global Intention based Topic Model (GITM). GITM introduces external auxiliary information and uses the intention of biterms with respect to topics to enrich the contextual information during topic inference. With the global intention of biterms, GITM smooths the topic probability distributions over similar words and thus obtains more coherent and accurate topics.

However, the auxiliary information that GITM introduces also brings in some noise. The local intention of biterms therefore holds that different biterms differ in importance and that local biterms deserve more attention. Based on the local intention of biterms, this thesis proposes a second topic model for short texts called the Local Intention based Topic Model (LITM). LITM introduces auxiliary information through biterm derivation and weights biterms by their local coherence. The local intention of biterms helps LITM obtain high-quality document-topic distributions.

To validate the two proposed topic models, this thesis conducts a multi-angle comparison on several real-world short text datasets. The experiments consist of three parts: (1) a topic coherence comparison that examines the quality of the topics; (2) a text classification experiment that examines the accuracy of the document-topic distributions; and (3) a parameter sensitivity experiment that examines the influence of several important parameters on GITM and LITM. The results show that both GITM and LITM perform significantly better than the comparison methods in both the topic coherence and the text classification experiments. The two topic models based on intended biterms, GITM and LITM, therefore extract topics more effectively.
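As background for the terms "biterm" and "Gibbs sampling", the sketch below implements the standard Biterm Topic Model (BTM) of Yan et al. (2013) in pure Python. A biterm is an unordered pair of words that co-occur in the same short text; BTM samples one topic per biterm over the whole corpus, which is how it sidesteps per-document sparsity. This is a minimal sketch under stated assumptions, not GITM or LITM: the abstract does not give enough detail to reproduce the global-intention smoothing, biterm derivation, or local-coherence weighting. The function names (`extract_biterms`, `btm_gibbs`, `doc_topics`), the toy corpus, and the hyperparameter defaults are hypothetical and chosen only for illustration.

```python
import itertools
import random


def extract_biterms(tokens):
    """All unordered word pairs co-occurring in one short text (BTM-style biterms)."""
    return [tuple(sorted(pair)) for pair in itertools.combinations(tokens, 2)]


def btm_gibbs(docs, num_topics, alpha=1.0, beta=0.01, iters=100, seed=0):
    """Collapsed Gibbs sampling for the standard Biterm Topic Model (not GITM/LITM).

    Returns (theta, phi, vocab): theta[k] is the corpus-level topic proportion
    P(z=k) and phi[k][w] is the topic-word probability P(w | z=k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    m = len(vocab)
    biterms = [(index[a], index[b]) for doc in docs for a, b in extract_biterms(doc)]

    n_z = [0] * num_topics                       # number of biterms per topic
    n_wz = [[0] * m for _ in range(num_topics)]  # word counts per topic

    def update(k, wi, wj, delta):
        n_z[k] += delta
        n_wz[k][wi] += delta
        n_wz[k][wj] += delta

    # Random initial topic for every biterm.
    z = [rng.randrange(num_topics) for _ in biterms]
    for (wi, wj), k in zip(biterms, z):
        update(k, wi, wj, 1)

    for _ in range(iters):
        for b, (wi, wj) in enumerate(biterms):
            update(z[b], wi, wj, -1)  # exclude this biterm's current assignment
            # P(z=k | rest) is proportional to
            # (n_k + alpha)(n_{wi|k} + beta)(n_{wj|k} + beta) / (sum_w n_{w|k} + M*beta)^2,
            # where sum_w n_{w|k} = 2 * n_k because each biterm contributes two words.
            weights = [
                (n_z[k] + alpha) * (n_wz[k][wi] + beta) * (n_wz[k][wj] + beta)
                / (2 * n_z[k] + m * beta) ** 2
                for k in range(num_topics)
            ]
            z[b] = rng.choices(range(num_topics), weights=weights)[0]
            update(z[b], wi, wj, 1)

    total = len(biterms)
    theta = [(n_z[k] + alpha) / (total + num_topics * alpha) for k in range(num_topics)]
    phi = [[(n_wz[k][w] + beta) / (2 * n_z[k] + m * beta) for w in range(m)]
           for k in range(num_topics)]
    return theta, phi, vocab


def doc_topics(tokens, theta, phi, vocab):
    """P(z | d) = sum_b P(z | b) P(b | d), with P(b | d) the biterm's frequency in d."""
    index = {w: i for i, w in enumerate(vocab)}
    pairs = [(index[a], index[b]) for a, b in extract_biterms(tokens)
             if a in index and b in index]
    k_count = len(theta)
    if not pairs:
        return [1.0 / k_count] * k_count  # no known biterms: fall back to uniform
    result = [0.0] * k_count
    for wi, wj in pairs:
        post = [theta[k] * phi[k][wi] * phi[k][wj] for k in range(k_count)]
        norm = sum(post)
        for k in range(k_count):
            result[k] += post[k] / norm / len(pairs)
    return result


if __name__ == "__main__":
    docs = [["apple", "iphone", "launch"], ["apple", "ios", "update"],
            ["football", "goal", "match"], ["match", "goal", "league"]]
    theta, phi, vocab = btm_gibbs(docs, num_topics=2, iters=50)
    print(doc_topics(["goal", "league"], theta, phi, vocab))
```

Because topics are sampled per biterm rather than per document, even a two-word text contributes one full co-occurrence observation. A document-topic distribution is then recovered afterwards from its biterms, which is the quantity the text classification experiment evaluates.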
Keywords/Search Tags: topic model, short texts, intended biterms, Gibbs sampling