
Probabilistic Text Modeling With Orthogonalized Topics

Posted on: 2017-02-01
Degree: Master
Type: Thesis
Country: China
Candidate: E P Yao
Full Text: PDF
GTID: 2428330590991522
Subject: Computer Science and Technology
Abstract/Summary:
Topic models have been widely used for text analysis. Previous topic models, such as Probabilistic Latent Semantic Analysis (PLSA) and many of its variants, have enjoyed great success in mining the latent topic structure of text documents. However, while much effort has gone into endowing the resulting document-topic distributions with different motivations, none of these models imposes any structural intuition on the resulting topic-word distributions. Since the topic-word distributions also play an important role in modeling performance, topic models that emphasize only the document-topic representations while paying little attention to the topic-term distributions are limited. In this thesis, we address this issue by making the resulting topic-term distributions diverse. Specifically, we propose the Orthogonalized Topic Model (OTM), which imposes an orthogonality constraint on the topic-term distributions. We also propose a novel model-fitting algorithm based on the generalized EM (Expectation-Maximization) algorithm and the Newton-Raphson method. Empirical results on two real-world text corpora show that OTM mines more diversified, reasonable, and duplicate-free topics than several other topic models. Quantitative evaluation on text classification also demonstrates that OTM significantly outperforms the baseline models, indicating the important role played by topic orthogonalization.
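To make the setup concrete, the sketch below fits plain PLSA with standard EM on a small term-document count matrix and computes the mean pairwise overlap of the topic-word rows, i.e. the quantity that OTM's orthogonality constraint drives toward zero. This is a minimal illustration under stated assumptions: the function names and toy corpus are invented for this example, and the thesis's actual OTM fitting (generalized EM with Newton-Raphson updates enforcing the constraint) is not reproduced here.

```python
import numpy as np

def plsa_em(X, K, iters=50, seed=0):
    """Fit plain PLSA by EM on a document-term count matrix X (D x V).

    Returns theta (D x K, rows are P(z|d)) and beta (K x V, rows are P(w|z)).
    """
    rng = np.random.default_rng(seed)
    D, V = X.shape
    theta = rng.random((D, K)); theta /= theta.sum(1, keepdims=True)
    beta = rng.random((K, V));  beta /= beta.sum(1, keepdims=True)
    for _ in range(iters):
        # E-step: responsibilities P(z|d,w) for every (d, w) pair.
        p = theta[:, :, None] * beta[None, :, :]           # D x K x V
        p /= p.sum(1, keepdims=True) + 1e-12
        # M-step: re-estimate both distributions from expected counts.
        n = X[:, None, :] * p                              # expected counts, D x K x V
        theta = n.sum(2); theta /= theta.sum(1, keepdims=True)
        beta = n.sum(0);  beta /= beta.sum(1, keepdims=True)
    return theta, beta

def topic_overlap(beta):
    """Mean pairwise inner product of topic-word rows.

    OTM's orthogonality constraint pushes these off-diagonal inner
    products of beta @ beta.T toward zero, yielding diverse topics.
    """
    G = beta @ beta.T
    K = beta.shape[0]
    return (G.sum() - np.trace(G)) / (K * (K - 1))

# Toy corpus: two clearly separated word clusters, two topics.
X = np.array([[4, 2, 0, 0],
              [3, 3, 0, 0],
              [0, 0, 5, 1],
              [0, 0, 2, 4]], dtype=float)
theta, beta = plsa_em(X, K=2)
print(topic_overlap(beta))
```

On such a separable corpus, plain PLSA already recovers nearly disjoint topics, so the overlap is small; OTM's contribution is to enforce low overlap even on real corpora where topics would otherwise duplicate each other.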
Keywords/Search Tags: Probabilistic Text Modeling, Latent Semantic Analysis, Text Classification