
A Dimension Reduction Method For Large-scale Text Categorization

Posted on: 2011-09-28
Degree: Master
Type: Thesis
Country: China
Candidate: H B He
GTID: 2198330338484733
Subject: Computer application technology
Abstract/Summary:
With the rapidly growing volume of online information, large-scale text categorization has been studied extensively and has progressed quickly. The large numbers of documents and categories make the feature space high-dimensional, which gives classification algorithms high time and space complexity and limits their scalability. Effective dimension reduction of the feature space not only improves the efficiency and effectiveness of classification but also improves the generalization ability of the classifier, so dimension reduction is essential.

In this thesis, we study the key techniques of text categorization, discuss why the feature space needs dimension reduction, and analyze the classical dimension reduction methods. Because classical feature extraction algorithms cannot handle feature spaces of such high dimensionality, we present an approach to reducing the dimensionality of the feature space for large-scale text categorization based on the candid covariance-free incremental principal component analysis (CCIPCA) and independent component analysis (ICA) algorithms. Building on this, we adopt a method that combines independent component analysis with information gain (ICA-IG). The effectiveness and feasibility of these methods are evaluated through comparative classification experiments.

The experimental results show that the iterative CCIPCA and ICA algorithms require less computational space, can effectively handle large-scale text categorization problems, and improve classification performance. On the same data set, ICA achieves the best performance among CCIPCA, ICA, and ICA-IG.
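The abstract names CCIPCA as the incremental feature extraction step and credits it with low space requirements, but does not spell out the update. The following is a minimal pure-Python sketch of the standard CCIPCA update rule, written for illustration only; it is not the thesis's implementation. It assumes the input vectors are already mean-centred (the incremental mean update is omitted), and the names `ccipca`, `dot`, `norm` and the `amnesic` parameter are hypothetical choices made for this example.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    """Euclidean length of a vector."""
    return math.sqrt(dot(a, a))


def ccipca(samples, k, amnesic=0.0):
    """Incrementally estimate the top-k principal directions (sketch).

    Assumes every sample is already mean-centred. Only k vectors of the
    input dimension are kept; no covariance matrix is ever formed.
    Returns a list of unnormalised vectors v_i: norm(v_i) estimates the
    i-th eigenvalue and v_i / norm(v_i) the i-th eigenvector.
    """
    vs = []
    for n, x in enumerate(samples, start=1):
        u = list(x)  # residual used to update the first component
        for i in range(min(k, n)):
            if i == n - 1:
                # The i-th estimate is initialised with the current residual.
                vs.append(list(u))
            else:
                v = vs[i]
                vn = norm(v)
                proj = dot(u, v) / vn if vn > 0 else 0.0
                # Weighted average of the old estimate and u * (u . v / |v|);
                # amnesic > 0 gives recent samples more weight.
                w_old = (n - 1 - amnesic) / n
                w_new = (1 + amnesic) / n
                vs[i] = [w_old * vi + w_new * ui * proj for vi, ui in zip(v, u)]
            v = vs[i]
            vv = dot(v, v)
            if vv > 0:
                # Deflation: remove the component of u along v_i so the
                # next estimate converges to an orthogonal direction.
                coef = dot(u, v) / vv
                u = [ui - coef * vi for ui, vi in zip(u, v)]
    return vs


if __name__ == "__main__":
    # Toy zero-mean data whose dominant direction is (1, 1, 0) / sqrt(2).
    random.seed(0)
    data = []
    for _ in range(2000):
        a, b, c = [random.gauss(0.0, 1.0) for _ in range(3)]
        data.append([a + 0.1 * b, a - 0.1 * b, 0.05 * c])
    v1, v2 = ccipca(data, k=2)
    print("eigenvalue estimates:", round(norm(v1), 3), round(norm(v2), 3))
    print("first direction:", [round(t / norm(v1), 3) for t in v1])
```

Memory use in this sketch is proportional to k times the feature dimension, because only the k estimate vectors are stored and the full covariance matrix is never built; that is the property that makes an incremental method attractive for high-dimensional text feature spaces. A practical version for text would also track the running mean and operate on sparse document vectors.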
Keywords/Search Tags: large-scale text categorization, dimension reduction, feature extraction, candid covariance-free incremental principal component analysis, independent component analysis