A Latent Semantic Indexing Differences Model And Its Application

Posted on: 2008-11-20
Degree: Master
Type: Thesis
Country: China
Candidate: X F Mi
Full Text: PDF
GTID: 2208360272957533
Subject: Computer application technology
Abstract/Summary:
Automatic text categorization (TC) is currently an active research topic in information retrieval (IR). Traditional TC methods use a bag-of-words representation to form feature vectors. Because documents contain a huge number of distinct terms, the resulting term space is high-dimensional, and dimensionality reduction becomes necessary. Feature selection and feature extraction are two alternative approaches to dimensionality reduction. Latent Semantic Indexing (LSI) is a feature extraction method that projects the term space onto a low-dimensional latent semantic space, reducing the dimension significantly.

LSI is a popular dimensionality reduction method in IR, but it is unsupervised: it ignores class label information when reducing the dimension. Recent improvements on LSI, such as Local Latent Semantic Indexing (LLSI) and Supervised Latent Semantic Indexing (SLSI), add class information to the latent semantic space and improve the precision of TC.

This thesis proposes a new latent semantic difference model, Difference Latent Semantic Indexing (DLSI), built on the existing LSI models. It describes the principle of DLSI and compares it with several LSI models, using an SVM as the classifier. The experiments show that DLSI achieves higher precision than the other common LSI models in English text classification.

The innovations of this thesis are:
1) It proposes a new latent semantic indexing model, DLSI, based on various common LSI models.
2) It compares the performance of various LSI models in English text classification and verifies the effectiveness of DLSI.
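
The projection step that the abstract attributes to plain, unsupervised LSI can be sketched with a truncated SVD of a term-document matrix. This is only an illustration under stated assumptions: the abstract does not specify how DLSI, LLSI, or SLSI are constructed, so none of them is reproduced here, and the SVM classification step is omitted. The function names (`build_term_document_matrix`, `fit_lsi`, `project`), the raw-count weighting, and the toy documents are hypothetical choices, not taken from the thesis.

```python
import numpy as np


def build_term_document_matrix(docs):
    """Return (vocab, X) where X[i, j] is the count of term i in document j."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            X[index[w], j] += 1.0
    return vocab, X


def fit_lsi(X, k):
    """Truncated SVD of X: keep the k largest singular values and their left vectors."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], s[:k]


def project(U_k, s_k, X):
    """Fold documents (columns of X) into the k-dimensional latent space.

    Each column x is mapped to S_k^{-1} U_k^T x, the standard LSI fold-in.
    """
    return (U_k.T @ X) / s_k[:, None]


# Toy corpus (assumed for illustration): two finance and two football documents.
docs = [
    "stock market trading price",
    "market price stock investor",
    "football match goal team",
    "team goal football player",
]

vocab, X = build_term_document_matrix(docs)  # 10 terms x 4 documents
U_k, s_k = fit_lsi(X, k=2)
Z = project(U_k, s_k, X)                     # 2 latent dimensions x 4 documents
```

Each column of `Z` is a low-dimensional document vector that could then be passed to a classifier such as an SVM. Because the projection is computed without class labels, this is the unsupervised baseline that the supervised variants named in the abstract aim to improve on.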
Keywords/Search Tags: Latent semantic indexing, Difference model, Text categorization, SVM algorithm