Research On Text Categorization Based On PCA And Multi-view Learning

Posted on: 2011-02-08
Degree: Master
Type: Thesis
Country: China
Candidate: X Q Meng
GTID: 2178360308954199
Subject: Communication and Information System
Abstract/Summary:
With the arrival of the information age, the volume of electronic text is expanding rapidly. Organizing and managing this information effectively, and helping users find what they need accurately, comprehensively, and quickly, has become a major challenge in information science. Text categorization is one of the key technologies for addressing this information disorder. It is applied in information retrieval, search engines, text databases, text information filtering, digital libraries, and other fields.

The major problem in text categorization is that the dimension of the vector space model is too high, which makes classification algorithms computationally expensive. Feature selection methods are therefore needed to reduce the feature set. Commonly used feature selection methods include Information Gain, Mutual Information, the χ² statistic, Expected Cross Entropy, Word Frequency, Document Frequency, Text Evidence, and so on. This thesis studies text categorization across a variety of these feature selection functions.

The major work of this thesis includes:

1. PCA (Principal Component Analysis) is applied after the feature selection functions. This further reduces the feature dimension and yields more representative features. Experiments show that the performance of each classifier improves markedly after PCA is applied.
2. To address the differences among the PCA-based feature subsets, this thesis proposes an improved multi-view learning strategy. Combining PCA with the multi-view learning strategy in text categorization demonstrates the feasibility of the approach.
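
The two-stage reduction described in the abstract (a feature selection function followed by PCA) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The toy corpus, the function names (chi_square, select_terms, term_frequency_rows, pca_project), the choice of the χ² statistic as the selection function, the max-over-categories scoring rule, and the use of power iteration with deflation to find principal components are all assumptions made for this example.

```python
import random
from math import sqrt


def chi_square(docs, labels, term, category):
    """Chi-square statistic of `term` for `category` from a 2x2 document table."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term, in_cat = term in tokens, label == category
        if has_term and in_cat:
            a += 1  # term present, document in category
        elif has_term:
            b += 1  # term present, document in another category
        elif in_cat:
            c += 1  # term absent, document in category
        else:
            d += 1  # term absent, document in another category
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def select_terms(docs, labels, k):
    """Keep the k terms with the highest chi-square score (max over categories)."""
    vocab = sorted({t for tokens in docs for t in tokens})
    cats = sorted(set(labels))
    score = {t: max(chi_square(docs, labels, t, c) for c in cats) for t in vocab}
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(vocab, key=lambda t: (-score[t], t))[:k]


def term_frequency_rows(docs, terms):
    """One row per document: raw counts of the selected terms."""
    return [[float(tokens.count(t)) for t in terms] for tokens in docs]


def pca_project(rows, n_components, iters=100, seed=0):
    """Project rows onto the leading principal components (power iteration)."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(x[p] * x[q] for x in centered) / (n - 1) for q in range(m)]
           for p in range(m)]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = [rng.uniform(-1.0, 1.0) for _ in range(m)]
        norm = 0.0
        for _ in range(iters):
            w = [sum(cov[p][q] * v[q] for q in range(m)) for p in range(m)]
            norm = sqrt(sum(t * t for t in w))
            if norm < 1e-12:
                break  # no variance left to explain
            v = [t / norm for t in w]
        if norm < 1e-12:
            break
        components.append(v)
        # Deflation: remove the component just found (norm approximates its eigenvalue).
        for p in range(m):
            for q in range(m):
                cov[p][q] -= norm * v[p] * v[q]
    return [[sum(x[j] * comp[j] for j in range(m)) for comp in components]
            for x in centered]


# Hypothetical toy corpus: two "sport" and two "finance" documents.
docs = [
    ["goal", "match", "team"],
    ["team", "goal", "score"],
    ["stock", "market", "team"],
    ["market", "stock", "price"],
]
labels = ["sport", "sport", "finance", "finance"]

selected = select_terms(docs, labels, k=3)
projected = pca_project(term_frequency_rows(docs, selected), n_components=1)
print(selected)
print(projected)
```

In this toy corpus the selection step drops terms such as "team" that occur in both categories, and the single principal component then separates the two categories by sign. A practical system would typically use a numerical library for the eigen-decomposition rather than hand-written power iteration.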
Keywords/Search Tags: Feature selection, Principal Component Analysis, Multi-view learning strategy, Ensemble, Feature dimension