
Automated Text Classification Model Based On Projection Pursuit Regression

Posted on: 2006-03-17   Degree: Master   Type: Thesis
Country: China   Candidate: H B Liao
GTID: 2168360152482864   Subject: Computer application technology
Abstract/Summary:
Text is usually represented with the vector space model, which yields a high-dimensional feature space. Using such high-dimensional vectors directly in text classification leads to the curse of dimensionality, so dimensionality reduction is needed to avoid this problem. Dimensionality-reduction techniques are among the difficult and interesting research problems in text classification.

Most text classification methods reduce dimensionality through feature selection or feature extraction, techniques that assume the data are normally distributed. Text data do not satisfy this normality assumption, so a robust or nonparametric method is needed. Projection pursuit is an emerging statistical technique for analyzing high-dimensional data, particularly data that are non-normal or nonlinear. Projection pursuit regression (PPR) does not require the data to be normally distributed and can ignore irrelevant (i.e., noisy or information-poor) variables, so it can make full use of the information in high-dimensional data.

We propose a text classification model based on projection pursuit regression. The main idea is to take texts already represented by the vector space model, find projection directions that reflect the structure and features of the high-dimensional data, and project the texts onto these directions, mapping them from the high-dimensional space into a lower-dimensional subspace. The data are fitted with ridge functions: the best projection direction is selected repeatedly, one ridge function per direction, so the dimensionality of the data is reduced to the number of ridge functions. Finally, a conventional text classification algorithm performs the classification.

We also ran experiments with SVM, KNN, and other methods. The experimental results show that projection pursuit regression achieves good recall and precision.
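The forward-stagewise idea described above can be sketched in plain Python, assuming the standard PPR form ŷ(x) = ȳ + Σ_{m=1}^{M} g_m(a_mᵀx), with M ridge functions g_m along unit directions a_m. This is an illustration, not the thesis's implementation. Two pieces are hypothetical stand-ins: the direction search is a crude random search over unit vectors (the keywords mention a genetic algorithm, but the abstract does not say how it is used), and each ridge function is a piecewise-constant bin smoother rather than a Hermite-polynomial fit. All function names are hypothetical. A document's reduced representation is its M projections a_mᵀx, which would then go to an ordinary classifier.

```python
import math
import random


def bin_smoother(z, r, bins=5):
    """Hypothetical ridge-function fit: piecewise-constant mean of the
    residual r over equal-count bins of the projected values z."""
    order = sorted(range(len(z)), key=lambda i: z[i])
    size = math.ceil(len(z) / bins)
    edges, means = [], []
    for start in range(0, len(order), size):
        chunk = order[start:start + size]
        edges.append(z[chunk[-1]])  # largest z in this bin = its upper edge
        means.append(sum(r[i] for i in chunk) / len(chunk))
    return edges, means


def eval_smoother(g, zi):
    """Evaluate a fitted piecewise-constant ridge function at zi."""
    edges, means = g
    for edge, mean in zip(edges, means):
        if zi <= edge:
            return mean
    return means[-1]  # beyond the largest training value


def project(a, x):
    """Projection a^T x of one document vector x onto direction a."""
    return sum(ai * xi for ai, xi in zip(a, x))


def random_unit(d, rng):
    """Random direction on the unit sphere in R^d."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(t * t for t in v))
    return [t / norm for t in v]


def fit_ppr(X, y, n_terms=2, n_candidates=200, bins=5, seed=0):
    """Forward-stagewise PPR: y ~ base + sum_m g_m(a_m . x).

    Each stage tries the coordinate axes plus random unit vectors, keeps the
    direction whose fitted ridge function most reduces the residual sum of
    squares, then subtracts that ridge function from the residual."""
    rng = random.Random(seed)
    d = len(X[0])
    base = sum(y) / len(y)
    residual = [yi - base for yi in y]
    terms = []
    for _ in range(n_terms):
        axes = [[1.0 if j == k else 0.0 for j in range(d)] for k in range(d)]
        candidates = axes + [random_unit(d, rng) for _ in range(n_candidates)]
        best = None
        for a in candidates:
            z = [project(a, x) for x in X]
            g = bin_smoother(z, residual, bins)
            sse = sum((ri - eval_smoother(g, zi)) ** 2
                      for ri, zi in zip(residual, z))
            if best is None or sse < best[0]:
                best = (sse, a, g)
        _, a, g = best
        residual = [ri - eval_smoother(g, project(a, x))
                    for ri, x in zip(residual, X)]
        terms.append((a, g))
    return base, terms


def predict(model, x):
    """PPR prediction: base plus the sum of all fitted ridge functions."""
    base, terms = model
    return base + sum(eval_smoother(g, project(a, x)) for a, g in terms)


def reduce_dim(model, x):
    """Reduced representation of x: one coordinate a_m . x per ridge function."""
    _, terms = model
    return [project(a, x) for a, _ in terms]
```

For example, on 20 three-dimensional points whose 0/1 label depends only on the first coordinate, `fit_ppr(X, y, n_terms=2)` should give training predictions on the correct side of 0.5, and `reduce_dim(model, x)` returns two coordinates per document, one per ridge function. For multi-class text categorization, fitting one such model per class (one-vs-rest) is a natural extension, but that too is an assumption.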
Projection pursuit regression is a feasible and effective method for text classification. The main contributions of this thesis are:
(1) a PPR model for automated text classification;
(2) among the many ways of fitting ridge functions, the use of Hermite orthogonal polynomials to fit them.
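The abstract does not specify which Hermite family or fitting criterion is used. As an illustration only, assuming physicists' Hermite polynomials and an ordinary least-squares fit, one ridge function g(z) = Σ_k c_k H_k(z) over projected values z = aᵀx can be fitted to the current residuals as follows. The function names and the default degree are hypothetical.

```python
def hermite_basis(z, degree):
    """Physicists' Hermite polynomials H_0(z), ..., H_degree(z), built with the
    recurrence H_{k+1}(z) = 2z H_k(z) - 2k H_{k-1}(z)."""
    values = [1.0]
    if degree >= 1:
        values.append(2.0 * z)
    for k in range(1, degree):
        values.append(2.0 * z * values[k] - 2.0 * k * values[k - 1])
    return values


def solve(A, b):
    """Solve the square system A c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    c = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * c[j] for j in range(i + 1, n))
        c[i] = (m[i][n] - s) / m[i][i]
    return c


def fit_hermite_ridge_function(z, r, degree=3):
    """Least-squares coefficients c_k so that g(z) = sum_k c_k H_k(z) fits the
    (projected value, residual) pairs (z_i, r_i), via the normal equations."""
    rows = [hermite_basis(zi, degree) for zi in z]
    p = degree + 1
    gram = [[sum(row[i] * row[j] for row in rows) for j in range(p)]
            for i in range(p)]
    rhs = [sum(row[i] * ri for row, ri in zip(rows, r)) for i in range(p)]
    return solve(gram, rhs)


def eval_ridge_function(coeffs, z):
    """Evaluate g(z) = sum_k c_k H_k(z)."""
    basis = hermite_basis(z, len(coeffs) - 1)
    return sum(ck * hk for ck, hk in zip(coeffs, basis))
```

As a check, z² = H_0(z)/2 + H_2(z)/4, so fitting r = z² on the grid z = -2.0, -1.8, ..., 2.0 with degree 3 should recover coefficients close to [0.5, 0, 0.25, 0]. In a full PPR loop, a function like this would replace the bin smoother as the per-stage ridge-function fit.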
Keywords/Search Tags: Projection Pursuit Regression, Text Classification, Dimension Reduction, Genetic Algorithm