
The Research And Application Of The Topic Model Of Fusion Knowledge

Posted on: 2018-07-27
Degree: Master
Type: Thesis
Country: China
Candidate: T Yin
Full Text: PDF
GTID: 2358330512476692
Subject: Computer technology
Abstract/Summary:
With the explosive growth of information on the Internet, resources and data have become increasingly extensive, making it difficult for people to make sense of massive, constantly changing text. To meet this challenge, key concepts need to be extracted from the text so that people can understand and process it quickly and visually. Topic models address this need. By analyzing the vocabulary of the original text, a topic model algorithm uncovers latent topics, the relationships between those topics, and their evolution over time. In recent years, however, researchers have found that these unsupervised models, which incorporate no human knowledge, tend to produce topics that are hard to interpret; that is, the models cannot generate semantically coherent topics. Moreover, traditional topic models often require a large amount of training data. To address these problems, this thesis studies knowledge-based topic models and explores their application to topic extraction on Microblog:

(1) We design PLTM, a topic model based on prior knowledge. It extends traditional topic models by improving one of their two key probability distributions, the topic-word distribution, through both supplied prior knowledge and automatic mining. In addition, the PLTM model is extended to an online setting, and two online methods are proposed for short text, which in practical applications often arrives as a data stream.

(2) For the task of hot topic discovery on Microblog, we design a hybrid method that combines the incremental PLTM model with a hybrid K-means and hierarchical clustering method. Based on the characteristics of the microblog corpus, the thesis adopts an elaborate text preprocessing method to reduce the size of the data and the interference from noise. The knowledge-based topic model addresses the sparsity problem of short text, and the hybrid clustering algorithm speeds up the aggregation of microblogs into their corresponding topics.

(3) Experiments are conducted on a review data set from Amazon.com and on a data set from Microblog, verifying the practicality and validity of the approach. In addition, the thesis designs an interactive system that lets users see intuitively how the model performs in practical applications.
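The abstract does not give PLTM's equations, so the following is only an illustrative sketch of the general idea behind point (1): injecting prior knowledge into the topic-word distribution. It swaps in a well-known technique, a seeded (asymmetric) Dirichlet prior, and is not the thesis's model. The function names, the `seed_words` format, and the `base_beta` and `boost` values are all assumptions made for illustration.

```python
from collections import Counter

def build_topic_word_prior(vocab, seed_words, num_topics, base_beta=0.01, boost=1.0):
    """Return prior[k][w], the Dirichlet pseudo-count of word w in topic k.

    seed_words maps a topic index to a set of words that prior knowledge
    associates with that topic (hypothetical format, not from the thesis).
    """
    prior = []
    for k in range(num_topics):
        seeds = seed_words.get(k, set())
        # Seed words get a larger pseudo-count, pulling them into topic k.
        prior.append({w: base_beta + (boost if w in seeds else 0.0) for w in vocab})
    return prior

def topic_word_distribution(counts, prior):
    """Smoothed estimate phi[k][w] = (n_kw + beta_kw) / (n_k + sum_w beta_kw)."""
    phi = []
    for k, beta_k in enumerate(prior):
        n_k = counts[k]
        denom = sum(n_k.values()) + sum(beta_k.values())
        phi.append({w: (n_k.get(w, 0) + b) / denom for w, b in beta_k.items()})
    return phi

# Toy example: two topics with a few observed word-topic assignments.
vocab = ["battery", "charge", "screen", "pixel"]
seed_words = {0: {"battery", "charge"}, 1: {"screen", "pixel"}}
prior = build_topic_word_prior(vocab, seed_words, num_topics=2)
counts = [Counter({"battery": 3, "screen": 1}), Counter({"pixel": 2})]
phi = topic_word_distribution(counts, prior)
```

In this toy setup, "charge" never appears in topic 0's counts, yet its seeded pseudo-count gives it more probability mass there than the unseeded "pixel". That is one way prior knowledge can compensate for sparse short-text data.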
Keywords/Search Tags: topic model, prior knowledge, PLTM, topic extraction, hybrid clustering