Hierarchical Topic Model Based On Document Frequency

Posted on: 2012-05-30
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Zhang
GTID: 2178330335460291
Subject: Signal and Information Processing
Abstract/Summary:
To extract the semantic structure of text collections, a variety of unsupervised approaches have been proposed. Under the general "bag of words" assumption, documents are represented as vectors containing the counts of the terms in them. On top of this representation, topic models have established a sophisticated statistical framework, and a line of work has continuously improved the model structure.

Statistical topic models are attractive because they allow rapid analysis and understanding of new collections of text. However, this framework does not provide sufficient information for learning a topic hierarchy from data. It has recently been shown that data-driven learning approaches combined with some structure and prior knowledge can be a satisfactory solution. In this work, we present a new probabilistic framework that incorporates the hierarchical information contained in document frequency (DF) into topics in order to recover a more semantic structure. The hierarchical topics produced by the DF topic model have natural relationships that go beyond a tree structure. We illustrate our approach on 20 Newsgroups to show how well the model extracts a hierarchy of topics.

From a cognitive science perspective, background knowledge is an important supplementary means of obtaining hierarchical topics, and much previous work has added side information when analyzing text data. We follow this idea in a different way: document frequency comes from the data itself, so our approach remains unsupervised. Finally, by combining DF with statistical learning, we aim to make this human-interpretable decomposition of the texts more semantic.
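As a rough illustration of the two quantities the abstract relies on, the sketch below builds bag-of-words count vectors and computes the document frequency of each term. It is not the DF topic model itself: the function names `bag_of_words` and `document_frequency`, the whitespace tokenization, and the three toy documents are all assumptions made for this example and do not come from the thesis.

```python
from collections import Counter


def bag_of_words(docs):
    """Return (vocab, vectors): a sorted vocabulary and one count vector per document.

    Tokenization is a plain lowercase whitespace split (an assumption for this sketch).
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    index = {term: i for i, term in enumerate(vocab)}
    vectors = []
    for tokens in tokenized:
        vec = [0] * len(vocab)
        for term, count in Counter(tokens).items():
            vec[index[term]] = count
        vectors.append(vec)
    return vocab, vectors


def document_frequency(vocab, vectors):
    """Map each term to the number of documents in which it occurs at least once."""
    return {
        term: sum(1 for vec in vectors if vec[i] > 0)
        for i, term in enumerate(vocab)
    }


# Hypothetical toy corpus, not data from the thesis.
docs = [
    "the team won the game",
    "the game was played on ice",
    "the rocket reached orbit",
]
vocab, vectors = bag_of_words(docs)
df = document_frequency(vocab, vectors)
```

Note that the count vectors record how often a term appears inside each document, while document frequency only records in how many documents it appears at all, so both can be derived from the raw text without any labels.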
Keywords/Search Tags: Graphical Model, Topic Model, Hierarchy, Semantic