Font Size: a A A

Probabilistic Latent Semantic Analysis And Applications

Posted on:2012-08-30Degree:MasterType:Thesis
Country:ChinaCandidate:S LiuFull Text:PDF
GTID:2178330332978385Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Many of the applications related to information retrieval rely on discovering the hidden meanings behind the text itself. However, due to the existence of polysemy and synonym, the match of queries may not be accurate on literal terms. Probabilistic Latent Semantic Analysis is a topic modeling technique to discover the hidden structure by building the relation between observed data and the assumed hidden variables, which is "document-topic-term" for text corpus. It uses a statistical learning technique to estimate the model parameters, including the multinomial distribution of the terms belonging to a topic, and the multinomial distribution of the topics given a document. The documents are represented in a semantic space instead of the term space, so that matching, ranking and relevance can be done more accurately. This paper contributes on the following aspects:We present an efficient approach that provides direct control over sparsity during the expectation maximization process. Which resolved the problem that PLSA can not produce local features and the over fitting problem. Experiments on face databases are reported to show visual representations on obtaining local features, and detailed improvements in clustering tasks compared with the original processWe designed the multithread PLSA training process in distributed systems under the MPI and the MapReduce framework, many details have been discussed for implementations, and evaluations have been analyzed for pros and cons.We proposed a method for RSS document ranking problem, using implicit feedback of reading time for user preference modeling.
Keywords/Search Tags:Topic Model, Sparse Representation, PLSA, Distributed System, Matrix Factorization
PDF Full Text Request
Related items