Text clustering using latent semantic indexing

Posted on: 2002-04-08
Degree: M.S.
Type: Thesis
University: The University of Texas at Arlington
Candidate: Gee, Kevin Randall
Full Text: PDF
GTID: 2468390011499266
Subject: Computer Science
Abstract/Summary:
One hindrance to text processing is the inherent semantic relationship between synonymous or polysemous words. Many text processing tasks do not account for these relationships and might not properly associate linguistically similar words and documents that are vastly different when individual features are directly compared. Latent semantic indexing (LSI) converts documents and terms into multi-dimensional vectors that reflect unseen, inherent relationships between those elements. The dot product of those vectors serves as an effective mechanism to measure relative distances between documents and terms.

This work combines LSI with k-medoids text clustering to accurately group documents with common conceptual meaning and semantic similarity, demonstrating the effectiveness of clustering text with features beyond simple word matching. In addition, mechanisms inherent to the LSI algorithm are exploited to effectively select relevant terms to describe the resultant text clusters.
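
The pipeline the abstract describes can be illustrated with a minimal sketch. It is not the thesis implementation: the toy corpus, the rank of 2, the unit-length normalization before taking dot products, the farthest-first seeding, and the helper names (`lsi_embed`, `unit_rows`, `k_medoids`, `top_terms`) are all assumptions made for illustration. The term-selection step in particular is only one plausible reading of "mechanisms inherent to the LSI algorithm": it ranks terms by their dot product with a cluster medoid in the latent space.

```python
import numpy as np

TERMS = ["car", "automobile", "engine", "apple", "orange", "fruit"]
# Hypothetical toy corpus (not from the thesis); rows are terms, columns are documents:
# d0={car}, d1={automobile}, d2={car, automobile, engine},
# d3={apple}, d4={orange}, d5={apple, orange, fruit}
TERM_DOC = np.array([
    [1, 0, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 1],
], dtype=float)


def lsi_embed(term_doc, rank):
    """Truncated SVD; returns term and document vectors in the latent space."""
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    return u[:, :rank] * s[:rank], vt[:rank].T * s[:rank]


def unit_rows(m):
    """Scale each row to unit length so dot products become cosines."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    return m / np.where(norms == 0.0, 1.0, norms)


def k_medoids(sim, k, max_iter=50):
    """Alternating k-medoids on a similarity matrix (higher means closer)."""
    medoids = [0]
    while len(medoids) < k:
        # Farthest-first seeding: add the item least similar to current medoids.
        medoids.append(int(np.argmin(sim[:, medoids].max(axis=1))))
    for _ in range(max_iter):
        labels = np.argmax(sim[:, medoids], axis=1)
        updated = []
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                updated.append(medoids[c])  # keep the old medoid for an empty cluster
                continue
            # New medoid: the member with the highest total similarity to its cluster.
            totals = sim[np.ix_(members, members)].sum(axis=1)
            updated.append(int(members[np.argmax(totals)]))
        if updated == medoids:
            break
        medoids = updated
    return np.argmax(sim[:, medoids], axis=1), medoids


def top_terms(term_vecs, doc_vec, terms, n):
    """Rank terms by dot product with a document vector in the latent space."""
    scores = term_vecs @ doc_vec
    return [terms[i] for i in np.argsort(scores)[::-1][:n]]


term_vecs, doc_vecs = lsi_embed(TERM_DOC, rank=2)
docs = unit_rows(doc_vecs)
similarity = docs @ docs.T
labels, medoids = k_medoids(similarity, k=2)

for c, m in enumerate(medoids):
    members = np.flatnonzero(labels == c).tolist()
    print(c, members, top_terms(term_vecs, doc_vecs[m], TERMS, 2))
```

In this toy corpus, documents d0 and d1 share no words, so their raw term vectors have a dot product of zero, yet both co-occur with d2 and therefore land close together in the latent space. That is the kind of association beyond simple word matching that the abstract refers to.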
Keywords/Search Tags:Latent semantic indexing, Text clustering, Inherent, Text processing