Research On The Methods Of Chinese Text Classification Using Bayes And Language Model

Posted on: 2009-08-09
Degree: Master
Type: Thesis
Country: China
Candidate: T Yan
Full Text: PDF
GTID: 2178360245486673
Subject: Computer application technology
Abstract/Summary:
With the growth of information on the Internet, obtaining useful information quickly and effectively has become an important task, and automatic text classification has emerged in response. Bayesian classification is one of the methods used for this purpose in many fields. This paper explores the use of statistical language models to improve the Bayes classifier. First, we analyze and discuss text classification and the Bayes model. Second, we introduce the data sparseness problem of the Bayes classifier, discuss the shortcomings of Laplace smoothing, and propose using a statistical language model to alleviate data sparseness. Third, we introduce three language-model smoothing methods: Jelinek-Mercer, Dirichlet, and absolute discounting. Finally, we propose an improved smoothing method based on Jelinek-Mercer. The main contribution of this paper is the application of these four smoothing methods from statistical language modeling to the Bayes classifier. We determine appropriate smoothing parameters experimentally. Our experiments show that the classifiers using the four smoothing methods outperform the Naive Bayes classifier. In particular, the Jelinek-Mercer method with the modified smoothing scale improves classifier performance considerably.
Keywords/Search Tags: Automatic Text Classification, Naive Bayes, Data Sparseness, Laplace Smoothing Method, Statistical Language Model
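
The abstract names Laplace smoothing and three language-model smoothing methods without giving formulas. As a rough illustration only, the sketch below plugs the commonly cited textbook forms of Jelinek-Mercer, Dirichlet, and absolute discounting into a multinomial Bayes classifier. It is not taken from the thesis: the function names (`train`, `word_prob`, `classify`), the add-one smoothed background model, the parameter values, and the toy English tokens (standing in for segmented Chinese text) are all assumptions made for this example. The improved Jelinek-Mercer variant is not described in the abstract, so it is not shown.

```python
import math
from collections import Counter

def train(docs):
    """docs: list of (label, tokens). Returns (class_counts, collection, priors)."""
    class_counts, doc_counts = {}, Counter()
    for label, tokens in docs:
        class_counts.setdefault(label, Counter()).update(tokens)
        doc_counts[label] += 1
    collection = Counter()
    for counts in class_counts.values():
        collection.update(counts)
    total_docs = sum(doc_counts.values())
    priors = {c: n / total_docs for c, n in doc_counts.items()}
    return class_counts, collection, priors

def background_prob(word, collection):
    # Collection-wide model p(w|C). Add-one smoothing here is an assumption
    # of this sketch so that words unseen in training keep non-zero mass.
    total = sum(collection.values())
    return (collection[word] + 1) / (total + len(collection) + 1)

def word_prob(word, counts, collection, method, param=None):
    """Smoothed p(word | class) under one of four smoothing methods."""
    n = sum(counts.values())   # total tokens in the class
    c = counts[word]           # count of this word in the class
    bg = background_prob(word, collection)
    if method == "laplace":
        return (c + 1) / (n + len(collection))
    if method == "jelinek-mercer":   # param = interpolation weight lambda
        return (1 - param) * c / n + param * bg
    if method == "dirichlet":        # param = prior strength mu
        return (c + param * bg) / (n + param)
    if method == "absolute":         # param = discount delta in (0, 1)
        unique = len(counts)         # distinct words seen in the class
        return max(c - param, 0) / n + param * unique / n * bg
    raise ValueError(f"unknown smoothing method: {method}")

def classify(tokens, model, method, param=None):
    """Return the class with the highest log posterior score."""
    class_counts, collection, priors = model
    scores = {}
    for label, counts in class_counts.items():
        score = math.log(priors[label])
        for w in tokens:
            score += math.log(word_prob(w, counts, collection, method, param))
        scores[label] = score
    return max(scores, key=scores.get)

# Toy training data (hypothetical, for illustration only).
TOY_DOCS = [
    ("sports", ["ball", "team", "win"]),
    ("sports", ["team", "score"]),
    ("finance", ["stock", "market", "price"]),
    ("finance", ["market", "bank"]),
]

if __name__ == "__main__":
    model = train(TOY_DOCS)
    for method, param in [("laplace", None), ("jelinek-mercer", 0.5),
                          ("dirichlet", 10.0), ("absolute", 0.7)]:
        print(method, classify(["team", "win", "market"], model, method, param))
```

In this sketch the three language-model methods differ from Laplace smoothing in that the mass given to a word unseen in a class depends on how common the word is in the whole collection, rather than being the same for every unseen word. The smoothing parameter (`param`) would be tuned on held-out data, which matches the abstract's statement that appropriate parameters were found experimentally.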