
Application Of Sparse Linear Discriminant Analysis On Text Classification

Posted on: 2012-12-27  Degree: Master  Type: Thesis
Country: China  Candidate: L H Ceng  Full Text: PDF
GTID: 2218330362453076  Subject: Computer software and theory
Abstract/Summary:
Text documents are multi-category, high-dimensional data whose dimensionality often reaches hundreds or thousands of features. By contrast, the number of available samples is small, and in undersampled problems it is much smaller than the data dimensionality. Effective dimensionality reduction can make the learning task in text classification both more efficient and more accurate. Applied to multi-category data, linear discriminant analysis (LDA) makes the data within each class more compact and pushes different classes farther apart, which tends to give good classification results. However, the classical LDA formulation fails when the within-class scatter matrix is singular, which is common in undersampled problems, so traditional linear discriminant analysis needs to be improved.

We propose a method that performs linear discriminant analysis with a sparseness criterion imposed, so that classification, feature selection and dimension reduction are merged into one analysis. Sparse linear discriminant analysis copes well with undersampled problems because it removes uninformative features. It is faster than traditional feature selection methods based on computationally heavy criteria, and it gives better results in terms of both classification rate and sparseness.

To verify the effectiveness and advantages of sparse linear discriminant analysis, we also run experiments with local linear discriminant analysis, kernel local linear discriminant analysis and semi-supervised local linear discriminant analysis. On the same text document database, sparse linear discriminant analysis reduces the dimensionality directly, whereas each of the other approaches is followed by kernel principal component analysis to reduce the dimensionality; the same text classifier is then used to compare the experimental results.

The experimental results show that sparse linear discriminant analysis reduces the dimensionality effectively and that its classification results are better than those of the other methods.
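The singularity that the abstract attributes to undersampled problems can be illustrated with a small numerical sketch. The code below is not taken from the thesis; the helper name `within_class_scatter`, the synthetic Gaussian data and the chosen sizes (30 samples, 200 features, 3 classes) are assumptions made purely for illustration. It builds the within-class scatter matrix and checks that its rank is bounded by the number of samples minus the number of classes, so the matrix cannot be inverted when there are more features than samples.

```python
import numpy as np


def within_class_scatter(X, y):
    """Hypothetical helper: sum over classes of (X_c - mean_c)^T (X_c - mean_c)."""
    n_features = X.shape[1]
    S_w = np.zeros((n_features, n_features))
    for label in np.unique(y):
        X_c = X[y == label]                # samples belonging to this class
        centered = X_c - X_c.mean(axis=0)  # subtract the class mean
        S_w += centered.T @ centered
    return S_w


# Synthetic stand-in for an undersampled text data set (assumed sizes).
rng = np.random.default_rng(0)
n_samples, n_features, n_classes = 30, 200, 3
X = rng.normal(size=(n_samples, n_features))
y = np.arange(n_samples) % n_classes       # balanced class labels 0, 1, 2

S_w = within_class_scatter(X, y)
rank = np.linalg.matrix_rank(S_w)

# The rank is at most n_samples - n_classes, far below n_features,
# so S_w is singular and classical LDA cannot invert it.
print("rank of S_w:", rank, "| number of features:", n_features)
```

Because each class contributes at most (class size minus one) independent directions after centering, the bound holds for any data with these sizes, which is why methods such as the sparse formulation described above are needed in this regime.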
Keywords/Search Tags: linear discriminant analysis, sparse linear discriminant analysis, undersampled problems, kernel principal component analysis