
Research And Implementation Of Feature Selection Methods For Chinese Text Classification

Posted on: 2011-01-18
Degree: Master
Type: Thesis
Country: China
Candidate: Y B Pei
GTID: 2208360305459489
Subject: Computer application technology
Abstract/Summary:
Feature selection plays an important role in Chinese text categorization. Existing domestic research on feature selection has focused mainly on corpora whose classes are evenly distributed. For some feature selection methods, however, classification performance declines significantly on corpora whose classes are unevenly distributed. Building on existing domestic research, this thesis analyzes the factors that affect classification performance and proposes improved methods. On this basis, it designs and implements a classification system for Chinese text. The work accomplished in this thesis is as follows:

1) The traditional information gain feature selection method performs worse on corpora with unevenly distributed classes. This thesis analyzes and identifies the factors that affect the classification performance of that method. Starting from the traditional information gain method, it removes the contribution made by terms that do not appear and adds concentration and dispersion to the feature selection, which improves the effectiveness of text classification. To analyze the improved method further, the improved method is in turn applied to term weight adjustment.

2) The traditional Chi-square statistic feature selection method relies heavily on low-frequency words, and this thesis analyzes the reasons for that weakness. Starting from the traditional method, it removes the case of negative correlation between terms and categories. To analyze the improved method further, the improved method is in turn applied to term weight adjustment. In addition, drawing on existing domestic Chi-square statistic feature selection methods, it introduces concentration, dispersion, and frequency into the improved method, which improves the method's classification performance.

3) To test and validate the performance of the improved methods, and to provide a platform for further research on Chinese text classification, this thesis designs and develops a Chinese text classification system.

4) To further identify and explore the problems and regularities of feature-word weight adjustment in Chinese text classification, this thesis runs experiments in the developed system on the weight adjustment method with different classifiers and corpora, and draws conclusions from the classification results.
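
For readers unfamiliar with the two baseline measures named in items 1) and 2), the following is a minimal Python sketch of the textbook information gain and Chi-square scores computed from document counts. It is not the thesis's improved formulas, which the abstract does not state. The function names, the `include_absent` flag, and `chi_square_positive_only` are hypothetical, and the two variants are only one possible reading of "removing the contribution of terms that do not appear" and "removing the negative correlation case". Concentration, dispersion, and frequency are not modeled here.

```python
import math


def _plogp_sum(counts):
    """Return sum of p * log2(p) over the distribution implied by counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum((k / total) * math.log2(k / total) for k in counts if k > 0)


def information_gain(with_term, without_term, include_absent=True):
    """Textbook information gain of a term over all classes.

    with_term[i]    -- number of class-i documents that contain the term
    without_term[i] -- number of class-i documents that do not contain it
    include_absent  -- hypothetical flag: False drops the term-absent part,
                       an assumed reading of the abstract, not its formula
    """
    n_present = sum(with_term)
    n_absent = sum(without_term)
    n = n_present + n_absent
    if n == 0:
        return 0.0
    class_totals = [p + a for p, a in zip(with_term, without_term)]
    score = -_plogp_sum(class_totals)                      # class entropy
    score += (n_present / n) * _plogp_sum(with_term)       # term present
    if include_absent:
        score += (n_absent / n) * _plogp_sum(without_term)  # term absent
    return score


def chi_square(a, b, c, d):
    """Textbook Chi-square score of a term for one class.

    a -- documents in the class that contain the term
    b -- documents outside the class that contain the term
    c -- documents in the class that do not contain the term
    d -- documents outside the class that do not contain the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def chi_square_positive_only(a, b, c, d):
    """Assumed variant: score 0 when term and class are negatively correlated."""
    if a * d - b * c <= 0:
        return 0.0
    return chi_square(a, b, c, d)
```

The plain Chi-square score squares the difference `a * d - b * c`, so a term that mostly occurs outside a class scores as high as one that mostly occurs inside it; the positive-only variant separates the two cases.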
Keywords/Search Tags:Text Categorization, Feature Selection, Weight Adjustment, Information Gain, Chi-square statistic