The Study Of Chinese Text Categorization Based On Naive Bayes

Posted on: 2012-09-04; Degree: Master; Type: Thesis
Country: China; Candidate: D Li; Full Text: PDF
GTID: 2178330338994929; Subject: Computer application technology
Abstract/Summary:
Computer and network technology has developed rapidly since it first appeared, and the network has become one of the most widely used sources of information. Because most information on the network is text, automatic text categorization, which underpins the effective organization and management of text data, has become an important field of study. The Naive Bayes classification method is based on Bayesian theory. It is accepted as a simple and effective probabilistic classification method and has become an important topic in text categorization.

First, the paper studies the key technologies of text categorization, including Chinese text segmentation, text vector representation, and feature weighting. It then studies the Naive Bayes text classification model and the effect of feature selection methods on its performance. Finally, a Chinese text categorization system based on the Naive Bayes method is implemented in Java using MyEclipse.

The paper mainly analyzes the Multi-variate Bernoulli Model and the Multinomial Model. Experiments show that the Multinomial Model performs better than the Multi-variate Bernoulli Model in Chinese text categorization. To increase classification accuracy, the smoothing factor of the Multinomial Model is improved, and the experiments show excellent classification performance. Because the Naive Bayes text classification model rests on the assumption that attributes are conditionally independent, feature selection is important to classification accuracy. The experiments show that information gain and the χ² statistic are the preferable feature selection methods for Naive Bayes text classification.
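The two event models and the χ² statistic named in the abstract can be illustrated with a minimal sketch. It is written in Python for brevity rather than the Java of the thesis system, and it is not the author's implementation. The function names, the toy pre-segmented corpus, and the generic additive smoothing parameter `alpha` are all assumptions made for illustration; in particular, `alpha` is ordinary add-alpha smoothing, not the improved smoothing factor the thesis proposes, which the abstract does not specify.

```python
import math
from collections import Counter, defaultdict


def train(docs, labels):
    """Collect the counts both event models need from pre-segmented documents."""
    vocab = {t for tokens in docs for t in tokens}
    class_docs = Counter(labels)          # number of documents per class
    term_counts = defaultdict(Counter)    # class -> term occurrences (multinomial)
    doc_counts = defaultdict(Counter)     # class -> documents containing term (Bernoulli)
    for tokens, c in zip(docs, labels):
        term_counts[c].update(tokens)
        doc_counts[c].update(set(tokens))
    return vocab, class_docs, term_counts, doc_counts


def log_prior(c, class_docs):
    return math.log(class_docs[c] / sum(class_docs.values()))


def multinomial_score(tokens, c, model, alpha=1.0):
    """log P(c) plus log P(t|c) for every token occurrence (alpha must be > 0)."""
    vocab, class_docs, term_counts, _ = model
    total = sum(term_counts[c].values())
    score = log_prior(c, class_docs)
    for t in tokens:
        if t in vocab:  # tokens never seen in training are ignored
            score += math.log((term_counts[c][t] + alpha) / (total + alpha * len(vocab)))
    return score


def bernoulli_score(tokens, c, model, alpha=1.0):
    """log P(c) plus a presence or absence term for every vocabulary entry."""
    vocab, class_docs, _, doc_counts = model
    present = set(tokens)
    score = log_prior(c, class_docs)
    for t in vocab:
        p = (doc_counts[c][t] + alpha) / (class_docs[c] + 2 * alpha)
        score += math.log(p if t in present else 1.0 - p)
    return score


def classify(tokens, model, score_fn, alpha=1.0):
    """Return the class with the highest log score under the chosen event model."""
    return max(model[1], key=lambda c: score_fn(tokens, c, model, alpha))


def chi_square(term, c, docs, labels):
    """Chi-square statistic of a term for one class, from a 2x2 document table."""
    a = b = c_ = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        if label == c and has_term:
            a += 1          # in class, contains term
        elif label != c and has_term:
            b += 1          # other class, contains term
        elif label == c:
            c_ += 1         # in class, lacks term
        else:
            d += 1          # other class, lacks term
    n = a + b + c_ + d
    denom = (a + c_) * (b + d) * (a + b) * (c_ + d)
    return n * (a * d - c_ * b) ** 2 / denom if denom else 0.0


# Hypothetical toy corpus, already segmented into words.
docs = [
    ["足球", "比赛", "进球"],
    ["篮球", "比赛", "球员"],
    ["股票", "市场", "上涨"],
    ["银行", "市场", "利率"],
]
labels = ["sports", "sports", "finance", "finance"]
model = train(docs, labels)

print(classify(["比赛", "球员"], model, multinomial_score))
print(classify(["比赛", "球员"], model, bernoulli_score))
print(chi_square("比赛", "sports", docs, labels))
```

The multinomial model scores each token occurrence, so term frequency matters. The multi-variate Bernoulli model scores every vocabulary entry as present or absent, so terms missing from a document also contribute. Ranking terms by `chi_square` and keeping the highest-scoring ones is one common way to apply the statistic for feature selection.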
Keywords/Search Tags: Text Categorization, Naive Bayes Classification, Multi-variate Bernoulli Model, Multinomial Model, Feature Selection