Font Size: a A A

Folding-up: A hybrid method for updating the partial singular value decomposition in latent semantic indexing

Posted on:2007-11-03Degree:M.C.ScType:Thesis
University:Dalhousie University (Canada)Candidate:Tougas, Jane Elizabeth BaileyFull Text:PDF
GTID:2448390005466542Subject:Computer Science
Abstract/Summary:
The tremendous size of the Internet and modern databases has made efficient searching and information retrieval (IR) an area of utmost importance. Latent semantic indexing (LSI) is an IR method that represents a dataset as a term-document matrix. LSI uses a matrix factorization method known as the partial singular value decomposition (PSVD) to reduce noise in the data by projecting the term-document matrix into a lower-dimensional vector space. Calculating the PSVD of a large term-document matrix is computationally expensive. In a rapidly expanding environment, a term-document matrix is altered often as new documents and terms are added. Recomputing the PSVD of the term-document matrix each time these slight alterations occur can be prohibitively expensive.; Folding-in is one method of adding new documents or terms to a term-document matrix; updating the PSVD of the existing term-document matrix is another method. The folding-in method is computationally inexpensive, but it may cause deterioration in the accuracy of the PSVD. The PSVD-updating method is computationally more expensive than the folding-in method, but it better maintains the accuracy of the PSVD. This thesis introduces folding-up, a new method which combines folding-in and PSVD-updating. Folding-up offers a significant improvement in computation time when compared with either recomputing the PSVD, or PSVD-updating, while avoiding the degradation in the PSVD that can occur when the folding-in method is used on its own.
Keywords/Search Tags:Method, PSVD, Term-document matrix, Folding-up
Related items