Research And Improvement Of Latent Semantic Indexing Classification Model

Posted on: 2009-07-01
Degree: Master
Type: Thesis
Country: China
Candidate: H B Mo
Full Text: PDF
GTID: 2178360272470838
Subject: Computer system architecture
Abstract/Summary:
In recent years, a text representation model called Latent Semantic Indexing (LSI) has been proposed to address the semantic shortcomings of the traditional Vector Space Model (VSM). LSI indexes documents by semantics rather than by individual words.

This thesis studies the LSI model. Because the VSM is high-dimensional, sparse, and word-based, we apply LSI to reduce the initial word-document matrix and obtain a richer, more precise, and more compact semantic space, which serves as the basis for classification with the K-Nearest Neighbor (KNN) and Support Vector Machine (SVM) algorithms. Experiments show that the LSI-based model needs only 1/20 of the feature dimensions of the VSM-based model, with the feature dimension dropping from 1000 to 50, while the F1 measure stays at the same level. Experiments also show that LSI-KNN and LSI-SVM classification remains stable and effective even though the selected features change considerably across different feature selection algorithms.

Chapter 5 focuses on the limitations of the KNN classification algorithm and proposes an improvement that takes center distance into account. Traditional KNN works well when the training data have short intra-class distances and long inter-class distances. In practice, however, training data are often loosely distributed, lie near class boundaries, and are unevenly distributed across classes, and KNN then performs poorly. The proposed center-distance-based improvement takes the distribution of the training data fully into account and thereby avoids these defects. In the experiments, the Macro-F1 measure rose from 83.6 to 88.5.
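
The LSI reduction step followed by KNN classification can be illustrated with a minimal sketch. It is not the thesis implementation: the function names (`lsi_fit`, `lsi_project`, `knn_predict`), the toy 6-term by 6-document matrix, the choice of cosine similarity, and the standard folding-in formula d_hat = S_k^{-1} U_k^T d are all assumptions made for illustration. The sketch also covers only plain KNN, not the center-distance variant.

```python
import numpy as np

def lsi_fit(term_doc, k):
    """Truncated SVD of a term-by-document matrix; returns (U_k, s_k)."""
    U, s, _ = np.linalg.svd(term_doc, full_matrices=False)
    return U[:, :k], s[:k]

def lsi_project(term_vectors, U_k, s_k):
    """Fold term-space column vectors into the k-dimensional latent space."""
    # d_hat = S_k^{-1} U_k^T d, applied to every column at once
    return (U_k.T @ term_vectors) / s_k[:, None]

def knn_predict(train_latent, train_labels, query_latent, n_neighbors=3):
    """Majority vote among the most cosine-similar training documents."""
    def unit(m):
        # Normalise each column (one document) to unit length
        norms = np.linalg.norm(m, axis=0, keepdims=True)
        return m / np.maximum(norms, 1e-12)

    sims = unit(train_latent).T @ unit(query_latent)  # (n_train, n_query)
    preds = []
    for j in range(sims.shape[1]):
        nearest = np.argsort(-sims[:, j])[:n_neighbors]
        labels, counts = np.unique(train_labels[nearest], return_counts=True)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)

# Toy term-by-document matrix (rows: terms, columns: documents).
# Documents 0-2 use only terms 0-2 (class 0); documents 3-5 use terms 3-5 (class 1).
term_doc = np.array([
    [2, 1, 1, 0, 0, 0],
    [1, 2, 0, 0, 0, 0],
    [1, 0, 2, 0, 0, 0],
    [0, 0, 0, 3, 1, 0],
    [0, 0, 0, 1, 2, 1],
    [0, 0, 0, 0, 1, 2],
], dtype=float)
train_labels = np.array([0, 0, 0, 1, 1, 1])

# Reduce from 6 term dimensions to 2 latent dimensions
U_k, s_k = lsi_fit(term_doc, k=2)
train_latent = lsi_project(term_doc, U_k, s_k)

# Two unseen documents, one per topic, given as term-count columns
queries = np.array([
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
], dtype=float).T
query_latent = lsi_project(queries, U_k, s_k)

predictions = knn_predict(train_latent, train_labels, query_latent)
print(predictions)
```

In the thesis setting the same projection would take the feature dimension from 1000 to 50, which is the 1/20 reduction reported above.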
Keywords/Search Tags:Feature Space, Latent Semantic Indexing, Vector Space Model, KNN, SVM