
The Research On Latent Semantic Classification Model

Posted on: 2006-02-10
Degree: Master
Type: Thesis
Country: China
Candidate: X Q Ceng
GTID: 2168360152982873
Subject: Computer application technology
Abstract/Summary:
As one of the most effective methods for managing textual information, Automated Text Categorization (TC) helps people mine electronic text more quickly and easily. TC is currently a research hotspot in Information Retrieval (IR), and a growing number of researchers are working on the TC problem.

The Latent Semantic Indexing (LSI) model is a popular feature-reduction method in IR. Because texts are represented by latent semantic variables instead of the original terms, the LSI model significantly outperforms the Vector Space Model (VSM). However, as the feature dimension is reduced, the LSI model loses some information that is crucial for classification. The LSI representation captures the most important global principal components of a text collection, but when it is used for classification, some important features may be discarded because their corresponding eigenvalues are small.

To solve this problem, we propose a new text classification model, the Latent Semantic Classification (LSC) model, which extends the LSI model. This thesis describes the principle of the model and reports how the feature dimension affects the performance of the LSC model. In addition, we compare the LSC model with several common classification models; the experiments show that the LSC model performs better than existing classification methods such as kNN and SVM.

The main contributions of this thesis are:
1) By extending the LSI model, we propose a new text classification model, the Latent Semantic Classification (LSC) model;
2) We study the performance of the LSC model on English and Chinese corpora separately, analyze its stability, and compare it with several common classification models.
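The information loss described above can be reproduced on a toy example. The sketch below assumes plain truncated SVD as the LSI step and uses a hand-made two-term, six-document matrix (illustrative only, not data from the thesis). It projects the documents onto the top singular vector and checks whether a single threshold still separates the two classes. It does not implement the LSC model; the helper names `lsi_project` and `threshold_separable` are hypothetical.

```python
import numpy as np


def lsi_project(A, k):
    """Project documents (columns of the term-document matrix A) onto the
    top-k left singular vectors, i.e. the LSI representation U_k^T A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k].T @ A, s


def threshold_separable(x, y):
    """True if a single threshold on the 1-D feature x splits classes 0 and 1."""
    a, b = x[y == 0], x[y == 1]
    return a.max() < b.min() or b.max() < a.min()


# Hypothetical toy term-document matrix (rows = terms, columns = documents).
# Term 0 has large counts but carries no class information;
# term 1 has small counts but separates the two classes perfectly.
A = np.array([
    [10.0, 0.0, 5.0, 10.0, 0.0, 5.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
])
labels = np.array([0, 0, 0, 1, 1, 1])

# Keep only the component with the largest singular value (rank-1 LSI).
Z, singular_values = lsi_project(A, k=1)

print(singular_values)                      # the second value is much smaller
print(threshold_separable(A[1], labels))    # raw term 1 separates the classes
print(threshold_separable(Z[0], labels))    # the rank-1 LSI feature does not
```

The discriminative term lives almost entirely in the second singular direction, so truncating to the dominant component keeps the high-variance but class-irrelevant term and drops the one that matters for classification.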
Keywords/Search Tags: Text Classification, Latent Semantic Indexing, Latent Semantic Classification, Partial Least Squares Regression