
Research On Feature Selection And Classification Methods For Text Categorization

Posted on: 2012-06-27
Degree: Master
Type: Thesis
Country: China
Candidate: F B Wang
GTID: 2218330338462894
Subject: Computer system architecture
Abstract/Summary:
As the amount of information on the network grows rapidly, handling and organizing text data effectively has become an important research subject. Text categorization, which assigns labels to texts according to their content, is one of its core problems, and it is widely used in natural language processing, information retrieval, text filtering, and other fields.

Great progress has been made in text categorization in recent years. However, some problems still need further study, notably feature dimension reduction and the choice of classification method, both of which are important issues in this field. Motivated by this, this thesis focuses on building effective feature selection and classification methods from a machine learning perspective.

For feature selection, this thesis proposes a new method based on information entropy. The method defines two feature measures for filtering: Simple Entropy and Class Entropy. It can be used in conjunction with existing methods such as mutual information. Simple Entropy and Class Entropy are applied iteratively to filter features, yielding low-dimensional feature subsets that still achieve good classification performance.

For classification, considering the effectiveness of SVM and the good generalization ability of ensemble methods, this thesis improves an existing SVM ensemble method by adapting SVM training and ensemble combination to the characteristics of text categorization. The method trains several SVMs in different feature spaces and, for each sample, selects one SVM to classify it according to the sample's separability in that feature space. Experiments show that this ensemble method obtains better results than an individual SVM.
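The abstract does not define Simple Entropy or Class Entropy, so the following is only a minimal sketch of how an entropy-based filter might be combined with mutual information, not the thesis's method. It assumes that Class Entropy means the entropy of the class distribution over the documents containing a term (0 bits when the term occurs in one class only, so lower is more discriminative). Simple Entropy and the iterative alternation of the two measures are omitted. The function names, the `max_class_entropy` threshold, and the two-stage "filter, then rank by mutual information" pipeline are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(counts):
    """Shannon entropy (in bits) of a list of non-negative counts."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log2(p)
    return h


def class_entropy(docs, labels):
    """ASSUMED definition of Class Entropy: entropy of the class
    distribution over the documents that contain each term.
    0.0 means the term occurs in a single class only."""
    per_term = defaultdict(Counter)
    for words, y in zip(docs, labels):
        for w in set(words):
            per_term[w][y] += 1
    return {w: entropy(list(cnt.values())) for w, cnt in per_term.items()}


def mutual_information(docs, labels):
    """Mutual information (bits) between term presence and class label."""
    n = len(docs)
    class_counts = Counter(labels)
    df = Counter()                 # number of documents containing each term
    joint = defaultdict(Counter)   # joint[term][class] = co-occurrence count
    for words, y in zip(docs, labels):
        for w in set(words):
            df[w] += 1
            joint[w][y] += 1
    mi = {}
    for w in df:
        score = 0.0
        for c, n_c in class_counts.items():
            present = joint[w][c]      # docs in class c that contain w
            absent = n_c - present     # docs in class c that lack w
            for n_tc, n_t in ((present, df[w]), (absent, n - df[w])):
                if n_tc > 0:
                    # P(t,c) * log2(P(t,c) / (P(t) * P(c)))
                    score += (n_tc / n) * math.log2(n * n_tc / (n_t * n_c))
        mi[w] = score
    return mi


def select_features(docs, labels, max_class_entropy, k):
    """HYPOTHETICAL two-stage filter: drop terms whose class entropy exceeds
    max_class_entropy, then keep the k terms with the highest mutual information."""
    ce = class_entropy(docs, labels)
    mi = mutual_information(docs, labels)
    kept = [w for w, h in ce.items() if h <= max_class_entropy]
    kept.sort(key=lambda w: (-mi[w], w))   # ties broken alphabetically
    return kept[:k]


docs = [["ball", "goal", "the"], ["goal", "team", "the"],
        ["vote", "party", "the"], ["party", "law", "the"]]
labels = ["sport", "sport", "politics", "politics"]
print(select_features(docs, labels, max_class_entropy=0.5, k=2))  # → ['goal', 'party']
```

In this toy corpus, "the" occurs equally in both classes (class entropy of 1 bit) and is filtered out, while "goal" and "party" each predict the class perfectly and carry 1 bit of mutual information. Filtering by an entropy threshold before ranking keeps the expensive scoring step on a smaller candidate set, which is one plausible reading of how the abstract's measures "can be used in conjunction with" mutual information.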
Keywords: Text Categorization, Feature Selection, Ensemble Learning, SVM