
Application Of Locally Linear Embedding In Text Classification

Posted on: 2008-09-23
Degree: Master
Type: Thesis
Country: China
Candidate: C E Li
Full Text: PDF
GTID: 2178360245478296
Subject: Computer application technology
Abstract/Summary:
Real-world data are usually high-dimensional, which makes them difficult to understand, represent, and process. High-dimensional data processing therefore faces two sides of the same issue. The first is the curse of dimensionality, which challenges pattern recognition and rule discovery on high-dimensional data. The second is the blessing of dimensionality: the abundant information in a high-dimensional data set opens up new possibilities. How to represent high-dimensional data in a low-dimensional space and discover its intrinsic structure is the pivotal problem of high-dimensional information processing.

Text classification faces the same problem: there are thousands of features, sometimes more than the number of documents. With so many dimensions, it is very difficult to estimate the statistical characteristics of the samples, which leads to overfitting ("over-learning") and reduces classifier performance. Selecting features that represent the documents well is therefore essential. Effective dimensionality reduction can make the learning task in text classification more efficient and more accurate.

In this thesis, the procedure of the locally linear embedding (LLE) algorithm is studied and applied to text classification. Texts are represented as vectors using the vector space model (VSM). Feature reduction first yields a lower-dimensional data set, and LLE then reduces the dimensionality much further. A classifier is trained on the training samples and then used to classify the test samples; a classifier based on the support vector machine (SVM) is chosen.

LLE does not require an iterative algorithm and has only a few parameters to set; moreover, it performs very well on high-dimensional face data sets. However, the algorithm is sensitive to two parameters that must be set manually, an issue that has seldom been studied; in particular, obtaining a reliable estimate of the embedding dimension remains an open problem. This thesis therefore compares the results obtained with different numbers of neighbor points and different intrinsic dimensionalities to find the best setting.
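To illustrate where the neighbor count enters LLE, the following is a minimal plain-Python sketch of the algorithm's first two stages: finding the k nearest neighbors of each point and solving for the weights that best reconstruct the point from those neighbors. It is not the thesis's implementation. The function names, the regularization constant `reg`, and the toy points are assumptions made for illustration, and the final eigenvector step that yields the low-dimensional coordinates (where the embedding dimension is chosen) is omitted.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def knn_indices(points, i, k):
    """Indices of the k nearest neighbors of points[i], excluding i itself."""
    ranked = sorted((sq_dist(points[i], p), j)
                    for j, p in enumerate(points) if j != i)
    return [j for _, j in ranked[:k]]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[r]] for r, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def lle_weights(points, k, reg=1e-3):
    """Row i holds the weights reconstructing points[i] from its k neighbors."""
    n = len(points)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nbrs = knn_indices(points, i, k)
        # Shift the neighbors so that points[i] sits at the origin.
        z = [[a - b for a, b in zip(points[j], points[i])] for j in nbrs]
        # Local Gram matrix: gram[a][b] = z_a . z_b
        gram = [[sum(u * v for u, v in zip(za, zb)) for zb in z] for za in z]
        # Regularize the diagonal; the Gram matrix is singular when k
        # exceeds the input dimension (assumed constant `reg`).
        trace = sum(gram[a][a] for a in range(k))
        eps = reg * trace if trace > 0 else reg
        for a in range(k):
            gram[a][a] += eps
        w = solve(gram, [1.0] * k)
        total = sum(w)
        for j, wj in zip(nbrs, w):
            weights[i][j] = wj / total  # enforce the sum-to-one constraint
    return weights

# Toy data (hypothetical): five points on a line in a 2-D input space.
toy_points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
toy_weights = lle_weights(toy_points, k=2)
```

Changing `k` changes which points are treated as local neighbors and therefore the whole weight matrix, which is one reason the embedding is sensitive to this parameter.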
Keywords/Search Tags: locally linear embedding, text classification, intrinsic dimensionality, vector space model