Research On Manifold Learning Based On The Text Classification

Posted on:2013-03-04

Degree:Master

Type:Thesis

Country:China

Candidate:S J Lian

GTID:2248330392450549

Subject:Computer technology

Abstract/Summary:

Manifold learning is an important method in pattern recognition. In pattern recognition, high-dimensional data is often too large and sparse for traditional classifiers to handle. We can therefore apply effective dimensionality reduction to obtain the intrinsic low-dimensional features and then classify the low-dimensional data. Dimensionality reduction for high-dimensional data has developed from linear to nonlinear methods. Traditional linear dimensionality reduction is effective for data with linear structure but powerless for data distributed on a nonlinear manifold. Manifold learning algorithms are effective for reducing the dimensionality of nonlinear data: with manifold learning, we obtain low-dimensional data and then perform recognition in the low-dimensional space.

As the first step of a manifold learning algorithm, the selection of the neighborhood set has an important impact on the algorithm. Traditional manifold learning algorithms use the Euclidean distance to construct the neighborhood of each sample, which often fails to find an accurate neighborhood set for large, sparse text data. To address this shortcoming, this paper uses the category information available in text classification and proposes a weighted Euclidean distance metric for constructing the neighborhoods of text data. With this method, data of the same category are drawn closer together and data of different categories are pushed apart, which better uncovers the intrinsic information and improves text classification results.

Semi-supervised manifold learning has been a hot research direction in recent years. It combines traditional unsupervised manifold learning with semi-supervised machine learning. The method uses samples whose low-dimensional information is known as supervisory information and obtains the low-dimensional information of the test data through a linear mapping. Turning the manifold learning algorithm into a classifier retains the advantages of the original algorithm, such as few parameters to compute and fast computation. Moreover, in text classification the class information of the training samples is easy to obtain. Using this category information as supervision, the low-dimensional coordinates of the test data are obtained through a linear mapping so that data of the same category are as close as possible and data of different categories are as far apart as possible, thereby improving the classification rate for text data.

Therefore, we exploit the characteristics of text classification data and the prior information of the training data to propose a manifold learning algorithm based on neighborhood selection. This method uses the weighted Euclidean distance to select each sample's neighborhood. At the same time, we take advantage of semi-supervised manifold learning and use the category information of the training data to construct a classifier for text data. Finally, a series of experiments illustrates the effectiveness of the improved manifold learning algorithm for text classification.
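The abstract does not give the exact form of the weighted Euclidean distance, so the following is only a minimal sketch of the general idea of label-aware neighborhood selection. It assumes a simple rescaling in which same-category distances are multiplied by a factor below 1 and different-category distances by a factor above 1. The function name `class_weighted_neighbors` and the parameters `same_scale` and `diff_scale` are hypothetical and are not taken from the thesis.

```python
import numpy as np

def class_weighted_neighbors(X, labels, k, same_scale=0.5, diff_scale=2.0):
    """Return, for each row of X, the indices of its k nearest neighbors
    under a label-aware rescaling of the Euclidean distance.

    Assumption (not from the thesis): same-class distances are multiplied
    by same_scale and different-class distances by diff_scale.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Shrink same-class distances, stretch different-class distances.
    same = labels[:, None] == labels[None, :]
    weighted = np.where(same, same_scale * dist, diff_scale * dist)
    # A sample must not be selected as its own neighbor.
    np.fill_diagonal(weighted, np.inf)
    return np.argsort(weighted, axis=1)[:, :k]

# Four points on a line with alternating categories.
X = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [4.0, 0.0]]
labels = [0, 1, 0, 1]
print(class_weighted_neighbors(X, labels, k=1))
```

With both scales set to 1.0 the function reduces to ordinary Euclidean k-nearest-neighbor selection, which makes it easy to compare the two neighborhood graphs on the same data. In this toy example the plain distance pairs each point with its spatially closest point, while the rescaled distance pairs it with the nearest point of its own category.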

Keywords/Search Tags:

text classification, classificatory information, reconstruction weights, common words, attribute words

Related items

1	Research On Extraction Methods Of Kazakh Common-used Words And Investigation Of Elementary School Textbooks' Words
2	Research On Stop Words And Feature Selection For Text Classification
3	Research On Dataless Text Classification With Seed Words: A Supervised Topic Modeling Approach
4	Research On The Visual Word Reduction Method Based On Binary Discernibility Matrix
5	Research On Classification Method On Chinese Short Texts With Few Words Based On Feature Representation
6	The Implementation Regarding The Location And Recognition Of The Words In Scene Images Based On Embedded Platform
7	The Text Categorization And Structure Of Theme Words Network Based On Topic Models
8	Research On Fine-grained Sentiment Polarity Classification Of Chinese Network Consumption Review
9	Studies On Text Content Indexing: Based On Key Phrase
10	Research On Objects Classification Method In Using Saliency Detection And Bag Of Words Model