Research On The Methods Of Chinese Text Classification Using Bayes And Language Model

Posted on:2009-08-09

Degree:Master

Type:Thesis

Country:China

Candidate:T Yan

GTID:2178360245486673

Subject:Computer application technology

Abstract/Summary:

As the amount of information on the Internet grows, obtaining useful information quickly and effectively has become an important task, and automatic information classification has emerged in response. The Bayesian method is one of many classification methods and has been used in many fields. This paper explores using language models to improve the Bayes classifier. First, we analyze and discuss text classification and the Bayes model. Second, we introduce the data sparseness problem of the Bayes classifier, discuss the shortcomings of Laplace smoothing, and propose using statistical language models to alleviate data sparseness. Third, we introduce three language-model smoothing methods: Jelinek-Mercer, Dirichlet, and Absolute Discounting. Finally, we propose an improved smoothing method based on Jelinek-Mercer. The main work of this paper is applying these four statistical language-model smoothing methods to the Bayes classifier and determining appropriate smoothing parameters through experiments. The experiments show that the classifiers using the four smoothing methods perform better than the Naive Bayes classifier. In particular, the Jelinek-Mercer method with a modified smoothing scale improves classifier performance substantially.
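The abstract names the smoothing methods but does not give their formulas. The following is a minimal sketch, assuming the standard forms used in language-model smoothing, of how each one can replace Laplace smoothing in the word-probability estimate P(w | c) of a multinomial Naive Bayes classifier. It assumes a background collection model P(w | C) estimated from all training text, and that Chinese text has already been segmented into words. The class name `SmoothedNaiveBayes`, the parameter names `lam`, `mu`, `delta` and their default values are hypothetical illustrations, not the tuned values from this thesis, and the thesis's improved Jelinek-Mercer variant (modified smoothing scale) is not reproduced because its details are not given here.

```python
import math
from collections import Counter, defaultdict


class SmoothedNaiveBayes:
    """Multinomial Naive Bayes with a pluggable estimate of P(w | class).

    method: "laplace", "jm" (Jelinek-Mercer), "dirichlet", or "absolute".
    Hypothetical sketch: default parameter values are illustrative, not tuned.
    """

    def __init__(self, method="jm", lam=0.5, mu=1000.0, delta=0.7):
        self.method, self.lam, self.mu, self.delta = method, lam, mu, delta

    def fit(self, docs, labels):
        # docs: list of token lists (assumes Chinese text is already segmented)
        self.class_docs = Counter(labels)
        self.counts = defaultdict(Counter)  # counts[c][w] = count of w in class c
        for tokens, c in zip(docs, labels):
            self.counts[c].update(tokens)
        self.totals = {c: sum(cnt.values()) for c, cnt in self.counts.items()}
        self.collection = Counter()  # word counts over the whole training set
        for cnt in self.counts.values():
            self.collection.update(cnt)
        self.coll_total = sum(self.collection.values())
        return self

    def p_collection(self, w):
        # Background (collection) language model P(w | C), maximum likelihood
        return self.collection[w] / self.coll_total

    def p_word(self, w, c):
        n, total = self.counts[c][w], self.totals[c]
        if self.method == "laplace":
            # Add-one: every vocabulary word gets the same extra pseudo-count
            return (n + 1) / (total + len(self.collection))
        pc = self.p_collection(w)
        if self.method == "jm":
            # Linear interpolation of class model and collection model
            return (1 - self.lam) * n / total + self.lam * pc
        if self.method == "dirichlet":
            # Pseudo-counts proportional to the collection model
            return (n + self.mu * pc) / (total + self.mu)
        if self.method == "absolute":
            # Subtract delta from each seen count, redistribute via P(w | C)
            unique = len(self.counts[c])  # distinct words seen in class c
            return (max(n - self.delta, 0) + self.delta * unique * pc) / total
        raise ValueError(f"unknown smoothing method: {self.method}")

    def predict(self, tokens):
        n_docs = sum(self.class_docs.values())
        best, best_score = None, -math.inf
        for c in self.counts:
            score = math.log(self.class_docs[c] / n_docs)  # log prior
            for w in tokens:
                if w in self.collection:  # skip words never seen in training
                    score += math.log(self.p_word(w, c))
            if score > best_score:
                best, best_score = c, score
        return best


# Toy usage with hypothetical, pre-segmented training documents
docs = [["股票", "市场", "上涨"], ["银行", "利率", "股票"],
        ["比赛", "球队", "进球"], ["球员", "比赛", "冠军"]]
labels = ["finance", "finance", "sports", "sports"]
for method in ("laplace", "jm", "dirichlet", "absolute"):
    model = SmoothedNaiveBayes(method=method).fit(docs, labels)
    # each method prints: <method> finance sports
    print(method, model.predict(["股票", "利率"]), model.predict(["比赛", "进球"]))
```

The design difference is visible in `p_word`: Laplace smoothing gives every unseen word in a class the same pseudo-count regardless of how common it is overall, whereas the three language-model methods give unseen words probability in proportion to P(w | C), so a frequent word unseen in a class is penalized less than a rare one.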

Keywords/Search Tags:

Automatic Text Classification, Naive Bayes, Data Sparseness, Laplace Smoothing Method, Statistical Language Model

Related items

1	Research And Improvement On Naïve Bayes Test Classifier
2	Research On Text Classification Algorithm Based On Naive Bayes Method
3	Research Of Chinese Text Classification Based On Naive Bayesian Method And Application Of Microblogging Data Classification
4	Text Categorization Based On Naive Bayes Method
5	Chinese Text Data Classification
6	Text Classification Method Based On Unsupervised Clustering And Naive Bayesian Classifier
7	Design And Implementation Of Text Classification System Based On K-neighborhood And Naive Bayesian
8	Research On Automatic Recognition Of Uncivilized Micro-Blog Posts
9	A Text Classifier About High Blood Pressure Based On Naive Bayes
10	Research On Unbalanced Text Data Set Classification Algorithm