Evaluating the performance of latent semantic indexing

Posted on: 2006-12-04

Degree: Ph.D.

Type: Dissertation

University: University of Colorado at Boulder

Candidate: Suwannajan, Pakinee

GTID: 1458390005993080

Subject: Computer Science

Abstract/Summary:

Information Retrieval (IR) is now used in many settings, including the Web, bibliographic systems, and digital libraries. Data indexing and retrieval are core parts of IR and have attracted the interest of computer and information scientists in recent years. One of the most popular IR models is the vector space model, which was developed to solve many of the problems associated with exact lexical matching. The vector space model uses linear algebra to compute the similarity between a document and a query. Latent Semantic Indexing (LSI), a widely used variant of the vector space model, was designed to overcome problems arising from synonymy and polysemy. The literature often claims that LSI outperforms the vector space model. We found that LSI performs better than the vector space model only in some cases, specifically when a query shares more information with the relevant documents than with the non-relevant documents. We also studied how well LSI handles synonymy and polysemy. Synonyms are different words that have the same meaning, whereas a polyseme is a single word that has multiple meanings. We found that LSI can distinguish between two synonymous words only when both appear in the same or similar contexts. For polysemy, LSI outperforms the vector space model only when the two contexts that use different meanings of a polyseme share at least some information.
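
To make the comparison in the abstract concrete, the following is a minimal sketch of the two scoring schemes: cosine similarity on raw term vectors for the vector space model, and cosine similarity in a rank-k space obtained from a truncated singular value decomposition for LSI. It is not taken from the dissertation. The toy term-document matrix, the query, the choice k = 2, and the function names `cosine`, `vsm_scores`, and `lsi_scores` are all assumptions made for illustration, and the query projection used here is only one of several common LSI conventions.

```python
import numpy as np

# Hypothetical toy term-document matrix (rows = terms, columns = documents).
# Documents 0 and 1 use the synonyms "car" and "automobile" in a shared
# context ("engine"); documents 2 and 3 are about an unrelated topic.
terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # flower
    [0, 0, 1, 0],  # petal
], dtype=float)


def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is zero)."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return float(u @ v / (nu * nv))


def vsm_scores(A, q):
    """Vector space model: compare the query with each raw document column."""
    return [cosine(A[:, j], q) for j in range(A.shape[1])]


def lsi_scores(A, q, k):
    """LSI sketch: compare query and documents in a rank-k latent space.

    Assumed convention: documents are the columns of diag(s_k) @ Vt_k and
    the query is projected as U_k.T @ q, so both live in the same space.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    docs_k = np.diag(s_k) @ Vt_k   # k x n document coordinates
    q_k = U_k.T @ q                # k-dimensional query coordinates
    return [cosine(docs_k[:, j], q_k) for j in range(A.shape[1])]


# Query containing only the term "car".
q = np.array([1, 0, 0, 0, 0], dtype=float)

vsm = vsm_scores(A, q)
lsi = lsi_scores(A, q, k=2)

for j, (v, l) in enumerate(zip(vsm, lsi)):
    print(f"doc {j}: VSM = {v:.3f}, LSI(k=2) = {l:.3f}")
```

In this toy matrix, the vector space model gives the "automobile" document (document 1) a score of 0 because it shares no term with the query, whereas the rank-2 LSI score for that document is close to 1 because "car" and "automobile" both co-occur with "engine". This mirrors the condition stated above for synonyms, namely that they appear in the same or similar contexts. The result depends on the assumed data and on the choice of k.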

Keywords/Search Tags:

Vector space model, Information, LSI

Related items

1	Web Page Information Filtering Method Research Based On Vector Space Model
2	Research And Implementation On Chinese Information Retrieval System Based On Structured Vector Space Model
3	Study Of An Information Retrieval Technology Based On Improved Vector Space Model
4	The Semantic Information Retrieval Research Based On Multilayer Vector Space Model
5	Improved Vector Space Model And Its Application To Document Classification System
6	Research, Key Technology For Information Filtering Based On Vector Space
7	Correlation Retrieval Of Criminal Transcript Information Based On Vector Space Model
8	Research On The Chinese Science And Technology Document Information Retrieval System Based On The Vector Space
9	The Research Of Information Retrieval Algorithm In Vector Space
10	Combining Vector Space Model And Language Model To Information Retrieval