Research On Manifold Learning Based On The Text Classification

Posted on: 2013-03-04
Degree: Master
Type: Thesis
Country: China
Candidate: S J Lian
Full Text: PDF
GTID: 2248330392450549
Subject: Computer technology
Abstract/Summary:
Manifold learning is an important method in pattern recognition. High-dimensional data in this field is often too large and sparse for traditional classifiers to handle. Effective dimensionality reduction addresses this problem: it extracts the intrinsic low-dimensional features, and classification is then performed on the low-dimensional data. Dimensionality reduction for high-dimensional data has developed from linear to nonlinear methods. Traditional linear methods are effective for data with linear structure but ineffective for data distributed on a nonlinear manifold. Manifold learning algorithms are effective for reducing the dimensionality of such nonlinear data: they yield a low-dimensional representation in which classification can then be performed.

The first step of a manifold learning algorithm, the selection of the neighborhood set, has an important impact on the result. Traditional manifold learning algorithms use the Euclidean distance to construct the neighborhood of each sample, which often fails to find an accurate neighborhood set for large, sparse text data. To address this shortcoming, this thesis uses the category information available in text classification and proposes a weighted Euclidean distance metric for constructing the neighborhoods of text data. With this metric, data of the same category are drawn closer together and data of different categories are pushed apart, which better reveals the intrinsic information in the data and improves text classification results.

Semi-supervised manifold learning has been an active research direction in recent years. It combines traditional unsupervised manifold learning with semi-supervised machine learning.
This method uses samples whose low-dimensional coordinates are known as supervisory information and obtains the low-dimensional coordinates of the test data through a linear mapping. Turning the manifold learning algorithm into a classifier retains the advantages of the original algorithm, such as few parameters and fast computation. It also suits text classification, where the class information of the training samples is easy to obtain: this class information serves as the supervisory information, and the linear mapping yields low-dimensional coordinates for the test data in which data of the same category lie as close together as possible and data of different categories lie as far apart as possible, which improves the classification rate on text data.

Based on the characteristics of text classification data and the prior information in the training data, we therefore propose a manifold learning algorithm based on neighborhood selection. The method uses the weighted Euclidean distance to select each sample's neighborhood. It also draws on the advantages of semi-supervised manifold learning, using the category information of the training data to construct a classifier for text data. Finally, a series of experiments illustrates the effectiveness of the improved manifold learning algorithm for text classification.
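
The neighborhood-selection step described above can be illustrated with a small sketch. The abstract does not give the weighting formula, so the version below rests on an assumption: the Euclidean distance between two training samples is multiplied by a factor below 1 when they share a category and by a factor above 1 when they do not. The function name `class_aware_neighbors` and the parameters `same_scale` and `diff_scale` are hypothetical and are not taken from the thesis.

```python
import math

def class_aware_neighbors(points, labels, k=1, same_scale=0.5, diff_scale=2.0):
    """Return, for each sample, the indices of its k nearest neighbors
    under a label-scaled Euclidean distance (assumed weighting scheme)."""
    neighbors = []
    for i, (p, label_p) in enumerate(zip(points, labels)):
        scored = []
        for j, (q, label_q) in enumerate(zip(points, labels)):
            if i == j:
                continue  # a sample is not its own neighbor
            d = math.dist(p, q)  # plain Euclidean distance
            # Assumption: shrink same-category distances, stretch the others.
            scale = same_scale if label_p == label_q else diff_scale
            scored.append((scale * d, j))
        scored.sort()
        neighbors.append([j for _, j in scored[:k]])
    return neighbors

# Toy data: sample 2 lies closest to sample 0 but belongs to another category.
points = [(0.0, 0.0), (1.0, 0.0), (0.4, 0.0), (3.0, 0.0)]
labels = [0, 0, 1, 0]
neighbors = class_aware_neighbors(points, labels, k=1)
print(neighbors)  # [[1], [0], [0], [1]]
```

With both scale factors set to 1.0 the function reduces to ordinary Euclidean nearest neighbors, and sample 0 would pick sample 2 instead of sample 1. Labels are needed for every sample here, so this applies to the training data only.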
Keywords/Search Tags: text classification, classificatory, information reconstruction weights, common words, attribute words