Supervised Topic Model

Posted on: 2011-09-23
Degree: Master
Type: Thesis
Country: China
Candidate: J C Guo
GTID: 2178360308952439
Subject: Computer software and theory
Abstract/Summary:
Topic models are a class of graphical models that are popular among academics. They strictly follow the Bayesian probabilistic framework and are fully Bayesian models. Compared with other models, topic models, being generative, can make use of existing Internet data, learn topics that humans can interpret, and uncover the latent semantic meaning of documents; they are also a good dimension-reduction tool. However, when applied to classification, the topics they learn may not be well suited to the task, because topic modeling is an unsupervised learning method. How to effectively incorporate supervisory information into topic models is therefore an active research topic.

In this thesis, we study various methods of integrating label information into topic models. We first develop an "upstream" supervised topic model for multi-class text categorization, which performs document modeling and categorization simultaneously. Compared with existing supervised topic models, this model has three advantages: 1) categories are explicitly modeled as distributions over topics, which is equivalent to imposing a strong category-specific prior on documents; 2) each document is explicitly decomposed into three parts with different roles in categorization; 3) the inferred document labels are sparse, which is necessary for categorization. We apply the model to both text classification and image categorization.

In a later chapter, to address the problem that the "upstream" model cannot effectively exploit inter-class information among samples, we propose a new model called "LogisticLDA", which integrates a generative model and a discriminative model in a mathematically principled way. By maximizing the posterior of document labels using logistic normal distributions, the model effectively incorporates supervisory information to maximize the inter-class distance in the topic space, while documents retain the exchangeability property that eases inference.
Experimental results on three benchmark datasets demonstrate that the model outperforms state-of-the-art supervised topic models. Compared with support vector machines, the model achieves comparable performance, while also discovering a topic space that is valuable for dimension reduction, topic mining, and document retrieval.
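The abstract does not give the equations of the "upstream" model. As a rough illustration of advantage 1), where a category acts as a distribution over topics and therefore as a category-specific prior on documents, here is a minimal Python sketch under our own assumptions: each category c owns a Dirichlet parameter vector alpha_c over K topics, a document's topic proportions theta are drawn from Dir(alpha_y) for its label y, and words are then drawn LDA-style. Labels are scored by Bayes' rule, p(y | theta) ∝ p(y) Dir(theta; alpha_y). The function names, the toy parameters, and the Dirichlet form of the category prior are hypothetical; the sketch does not model the three-part document decomposition or LogisticLDA.

```python
import math
import random

# Hypothetical toy parameters: 2 categories, K = 3 topics, vocabulary of 4 words.
# Each row of CATEGORY_ALPHA is one category's Dirichlet prior over topics (assumption).
CATEGORY_ALPHA = [[5.0, 1.0, 1.0], [1.0, 1.0, 5.0]]
# Each row of TOPIC_WORD is one topic's distribution over the vocabulary.
TOPIC_WORD = [
    [0.70, 0.20, 0.05, 0.05],
    [0.25, 0.25, 0.25, 0.25],
    [0.05, 0.05, 0.20, 0.70],
]


def sample_dirichlet(alpha, rng):
    """Draw theta ~ Dirichlet(alpha) by normalizing independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def dirichlet_log_pdf(theta, alpha):
    """Log-density of Dirichlet(alpha) evaluated at a point theta on the simplex."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    # max(...) guards against log(0) if a component underflowed to zero.
    return log_norm + sum((a - 1.0) * math.log(max(t, 1e-300)) for a, t in zip(alpha, theta))


def generate_document(label, category_alpha, topic_word, n_words, rng):
    """Hypothetical upstream generative story: label -> theta -> topic z -> word w."""
    theta = sample_dirichlet(category_alpha[label], rng)
    words = []
    for _ in range(n_words):
        z = rng.choices(range(len(theta)), weights=theta)[0]
        w = rng.choices(range(len(topic_word[z])), weights=topic_word[z])[0]
        words.append(w)
    return theta, words


def label_posterior(theta, category_alpha, class_prior):
    """p(y | theta) proportional to p(y) * Dir(theta; alpha_y), normalized stably."""
    log_scores = [
        math.log(class_prior[c]) + dirichlet_log_pdf(theta, category_alpha[c])
        for c in range(len(category_alpha))
    ]
    top = max(log_scores)  # subtract the max before exponentiating (log-sum-exp trick)
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    rng = random.Random(0)
    theta, words = generate_document(0, CATEGORY_ALPHA, TOPIC_WORD, 10, rng)
    print("theta:", [round(t, 3) for t in theta])
    print("words:", words)
    print("p(y | theta):", label_posterior(theta, CATEGORY_ALPHA, [0.5, 0.5]))
```

In an actual model theta is latent and would have to be inferred from the observed words (for instance by Gibbs sampling or variational inference) before labels are scored; the sketch passes theta in directly so that the label-scoring step stays visible.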
Keywords/Search Tags: Supervised Learning, Image Categorization, Text Classification, Graph Model