Improved latent semantic analysis model

Posted on: 2006-12-25
Degree: M.S.
Type: Thesis
University: State University of New York Institute of Technology
Candidate: Kadam, Pradnya
Full Text: PDF
GTID: 2458390008472675
Subject: Computer Science
Abstract/Summary:
This thesis focuses on a recent advance in Information Retrieval (IR): Latent Semantic Analysis (LSA). Literal term-matching schemes suffer from synonyms and from noise in documents. LSA addresses these problems by using statistically derived concepts, instead of terms, for retrieval. It uses Singular Value Decomposition (SVD) to transform a high-dimensional document vector into a lower-dimensional semantic vector by projecting the former into a semantic space. Although SVD has usually been applied in LSA, this thesis is based on Semi-Discrete matrix Decomposition (SDD), which requires significantly less storage and is faster at query processing than SVD. Using the Java programming language and Kolda and O'Leary's SDDPACK software, an implementation of SDD LSA is built and tested against the MEDLINE collection of biomedical abstracts. The results are compared to SVD LSA MEDLINE studies and discussed. A web interface is provided, hosted on Tomcat 4.1.24 (a web container).
Keywords/Search Tags: LSA, Semantic, SVD
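
To make the query-processing step in the abstract concrete, here is a minimal sketch of how a query might be scored against documents using a rank-k SDD. It assumes the usual SDD form, in which the term-document matrix A is approximated by X D Y^T, where X and Y have entries restricted to -1, 0, and 1 and D is a diagonal matrix of positive weights. The class name, method name, and toy matrices are illustrative assumptions; this is not the thesis code and it does not call SDDPACK.

```java
import java.util.Arrays;

public class SddQuerySketch {

    /**
     * Scores every document against a query using a rank-k SDD, A ~ X * D * Y^T.
     *
     * x     m-by-k term factors, entries in {-1, 0, 1}
     * d     k positive weights (the diagonal of D)
     * y     n-by-k document factors, entries in {-1, 0, 1}
     * query length-m vector of term weights
     *
     * Returns n scores, where score[j] = query^T * (X D Y^T)[:, j].
     */
    public static double[] score(byte[][] x, double[] d, byte[][] y, double[] query) {
        int k = d.length;

        // Step 1: project the query into the k-dimensional space, p = D * X^T * q.
        double[] projected = new double[k];
        for (int term = 0; term < x.length; term++) {
            for (int c = 0; c < k; c++) {
                projected[c] += x[term][c] * query[term];
            }
        }
        for (int c = 0; c < k; c++) {
            projected[c] *= d[c];
        }

        // Step 2: compare the projected query with each document row of Y.
        double[] scores = new double[y.length];
        for (int doc = 0; doc < y.length; doc++) {
            for (int c = 0; c < k; c++) {
                scores[doc] += y[doc][c] * projected[c];
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        // Toy data (hypothetical): 3 terms, 2 documents, rank k = 2.
        byte[][] x = {{1, 0}, {1, 1}, {0, -1}};
        double[] d = {2.0, 0.5};
        byte[][] y = {{1, 0}, {0, 1}};
        double[] query = {1.0, 1.0, 0.0};

        System.out.println(Arrays.toString(score(x, d, y, query))); // prints [4.0, 0.5]
    }
}
```

Because the entries of X and Y take only three values, the sketch stores them as bytes (a real implementation could pack them into two bits each), which hints at the storage saving the abstract mentions; the SVD factors, by contrast, are dense floating-point matrices.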