With the ever-increasing volume and rapid growth of online information, large-scale text categorization has been studied extensively and has progressed quickly. The large quantity of text and the large number of categories make the feature space highly dimensional, which raises the computational and space complexity of classification algorithms and limits their scalability. Effective dimension reduction of the feature space not only improves the efficiency and effectiveness of classification but also improves the generalization ability of the classifier; implementing dimension reduction is therefore essential.

In this paper, we study the key techniques of text categorization, discuss the necessity of dimension reduction for the feature space, and analyze the classical dimension-reduction methods. Because classical feature extraction algorithms cannot handle feature spaces of such high dimensionality, an approach to reducing the dimensionality of the feature space for large-scale text categorization is presented, using candid covariance-free incremental principal component analysis (CCIPCA) and independent component analysis (ICA). On this basis, a combined method of independent component analysis and information gain (ICA-IG) is adopted. Their effectiveness and feasibility are evaluated in comparative classification experiments.

The experimental results show that the iterative CCIPCA and ICA algorithms require less computational space, can effectively handle large-scale text categorization problems, and improve classification performance; ICA achieves the best performance among CCIPCA, ICA, and ICA-IG on the same data set.
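The appeal of CCIPCA for large-scale text data is that it estimates the leading principal components from a stream of samples without ever forming the covariance matrix, so memory stays proportional to the number of components times the feature dimension. The following is a minimal sketch of the standard CCIPCA update (incremental power iteration with deflation); the function name, the optional amnesic parameter, and the synthetic data in the usage are illustrative choices, not the paper's implementation.

```python
import numpy as np

def ccipca(samples, k, amnesia=0.0):
    """Sketch of candid covariance-free incremental PCA (CCIPCA).

    Estimates the top-k principal components of a stream of
    (mean-centered) sample vectors one sample at a time, keeping only
    a k x d matrix in memory. `amnesia` is the optional amnesic weight
    that down-weights old estimates (0.0 = plain averaging).
    """
    d = samples.shape[1]
    V = np.zeros((k, d))  # unnormalized eigenvector estimates
    for n, sample in enumerate(samples, start=1):
        u = sample.astype(float).copy()
        for i in range(min(k, n)):
            if n == i + 1:
                V[i] = u  # initialize the i-th component from the residual
            else:
                w_old = (n - 1 - amnesia) / n
                w_new = (1 + amnesia) / n
                norm = np.linalg.norm(V[i])
                # incremental power-iteration step toward the i-th eigenvector
                V[i] = w_old * V[i] + w_new * (u @ V[i] / norm) * u
            # deflate: remove the i-th direction before estimating the next
            norm = np.linalg.norm(V[i])
            u = u - (u @ V[i] / norm**2) * V[i]
    # return unit-length component estimates
    return V / np.linalg.norm(V, axis=1, keepdims=True)
```

On synthetic data whose first coordinate dominates the variance, the first returned component aligns closely with that axis after a few hundred samples, which is the behavior the streaming setting relies on.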