
Research On Topic Evolution With Seed Document & Statistical Model

Posted on: 2015-01-20
Degree: Master
Type: Thesis
Country: China
Candidate: S Z Qiao
GTID: 2268330431954528
Subject: Computer application technology
Abstract/Summary:
Topic evolution is the process by which topics emerge, rise, and vanish over time, traced through the similarity of topics in different time slices. This paper presents two improved methods for finding topic information in text, one based on the seed document and one based on the OLDA model. The methods use the cosine measure and relative entropy to compute topic similarity, use topic similarity and time sequences to analyze the correlation between topics, and use topic content and topic strength to describe how topics evolve. The study of topic evolution has both practical and theoretical significance.

First, by analyzing existing topic models, we present an improved method, based on the OLDA model, for finding semantic information, that is, hidden topic information. Word statistics capture the surface information of topic evolution, whereas the LDA and OLDA models can uncover the hidden information. Existing methods show that using the LDA model can improve the effectiveness and accuracy of topic evolution analysis. Building on LDA, the OLDA model takes the word-topic posterior probability of the previous time slice as the word-topic prior probability of the current time slice, which helps preserve the continuity of topics along the timeline and improves the extraction of topic information. Building on OLDA, the paper presents an improved method for extracting topic information. Comparative experiments show that the topic evolution model helps find topic information.

Next, by analyzing the relation between topic evolution and the seed document, we present an improved topic evolution method based on the seed document. A seed document is a representative document on the timeline, and topics can be regarded as a series of events related to it. Because topic evolution can change abruptly, a hot topic may suddenly turn cold during the otherwise gradual process of emerging, rising, and vanishing.
The paper links the seed documents of the previous time slice into the documents of the current time slice, which strengthens the topic in the current time slice, weakens the effect of noise, and smooths abrupt changes.

In summary, by analyzing existing topic models and the seed document, the paper presents two improved topic evolution methods, based on the seed document and the OLDA model respectively. The seed document is used to preserve the continuity of topic content and to weaken the effect of noisy information. The OLDA model is used to preserve the continuity of topics along the timeline. Comparative experiments show that the methods improve topic evolution results to some extent.
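
The two similarity measures named in the abstract, the cosine measure and relative entropy, can be illustrated with a minimal sketch. It assumes each topic is a word-probability vector over a shared vocabulary. The function names, the smoothing constant `eps`, the similarity threshold, and the best-match linking rule are all hypothetical choices made for illustration; the abstract does not specify how the thesis applies the measures.

```python
import math


def cosine_similarity(p, q):
    """Cosine of the angle between two topic-word vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


def relative_entropy(p, q, eps=1e-12):
    """Relative entropy (KL divergence) KL(p || q) in nats.

    eps is an assumed smoothing constant that avoids division by zero;
    lower values mean the two topics are more alike.
    """
    return sum(
        a * math.log((a + eps) / (b + eps))
        for a, b in zip(p, q)
        if a > 0.0
    )


def link_topics(prev_topics, curr_topics, threshold=0.5):
    """Match each current-slice topic to its most similar previous-slice topic.

    Returns (prev_index, curr_index, similarity) tuples. prev_index is None
    when no previous topic reaches the (assumed) threshold, which this
    sketch treats as a newly emerging topic.
    """
    links = []
    for j, curr in enumerate(curr_topics):
        scores = [cosine_similarity(prev, curr) for prev in prev_topics]
        best = max(range(len(scores)), key=scores.__getitem__) if scores else None
        if best is not None and scores[best] >= threshold:
            links.append((best, j, scores[best]))
        else:
            links.append((None, j, 0.0))
    return links


if __name__ == "__main__":
    # Toy topic-word distributions over a four-word vocabulary.
    previous_slice = [[0.7, 0.2, 0.1, 0.0], [0.0, 0.1, 0.2, 0.7]]
    current_slice = [[0.6, 0.3, 0.1, 0.0], [0.25, 0.25, 0.25, 0.25]]
    for prev_idx, curr_idx, sim in link_topics(previous_slice, current_slice, 0.8):
        print(prev_idx, curr_idx, round(sim, 3))
```

Cosine similarity is symmetric and bounded, which makes it convenient for thresholding, whereas relative entropy is asymmetric, so the direction of comparison (previous slice to current slice or the reverse) has to be chosen explicitly.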
Keywords/Search Tags:Topic Evolution, LDA model, OLDA model, Seed Document