Study On Text Classification Based On Finite Mixture Model

Posted on:2006-03-16

Degree:Master

Type:Thesis

Country:China

Candidate:C J Wang

Full Text:PDF

GTID:2168360155458066

Subject:Computer application technology

Abstract/Summary:

With the rapid development of the Internet, the information resources available online are increasing exponentially, and it has become almost impossible to process this mass of information manually. In recent years, more and more researchers have turned their attention to organizing and managing information efficiently and effectively. As one of the key technologies toward this goal, text classification has attracted wide research interest.

Traditional probabilistic text classification algorithms, such as Naive Bayes (NB), assume that a document is generated from only one component. Under this assumption, a large number of training samples is needed to estimate the model parameters and to model the class characteristics accurately. In practice, many factors influence class characteristics, including topic, writing background, writing conventions, document style and the writing habits of the authors.

To characterize the class model comprehensively and accurately, we propose a finite mixture model that combines a topic model with a general model. The EM (Expectation Maximization) method is applied to estimate the parameters of the mixture model, and a text classifier is implemented on top of it. Experimental results show that the text classification algorithm based on the finite mixture model is stable and outperforms NB. In addition, the classifier performs well even when the training sample is small.

Moreover, the thesis presents two applications of the proposed mixture model. (1) User profile. The user profile is one of the most important modules in SmartWeb, our prototype personalized recommendation system. A good user profile can improve the recommendation results of SmartWeb. (2) Focused crawler. Text classification is the key component of a focused crawler. In this thesis, we suggest applying the mixture-model-based text classifier to these two applications...
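
The abstract does not give the exact form of the mixture, so the following is only a minimal sketch under stated assumptions: each class is modelled as a two-component word mixture, p(w | class) = lam * topic(w) + (1 - lam) * general(w), where the general model is a smoothed unigram distribution shared by all classes and the class-specific topic distribution is estimated with EM. The fixed mixing weight `lam`, the smoothing constants, and all function names are hypothetical choices made for illustration, not details taken from the thesis.

```python
import math
from collections import Counter

def fit_general(all_docs):
    """Shared 'general' unigram model over all classes (add-one smoothing)."""
    counts = Counter(w for doc in all_docs for w in doc)
    total = sum(counts.values()) + len(counts)
    return {w: (c + 1) / total for w, c in counts.items()}

def fit_topic(class_docs, general, lam=0.5, n_iter=50, smooth=0.01):
    """EM for the topic component of lam * topic + (1 - lam) * general.

    lam is held fixed here (an assumption); only the topic distribution
    is re-estimated.
    """
    counts = Counter(w for doc in class_docs for w in doc if w in general)
    topic = {w: 1.0 / len(general) for w in general}
    for _ in range(n_iter):
        # E-step: expected number of occurrences of each word that were
        # generated by the topic component rather than the general one.
        expected = {}
        for w, c in counts.items():
            from_topic = lam * topic[w]
            resp = from_topic / (from_topic + (1 - lam) * general[w])
            expected[w] = c * resp
        # M-step: renormalise the expected counts (lightly smoothed).
        denom = sum(expected.values()) + smooth * len(general)
        topic = {w: (expected.get(w, 0.0) + smooth) / denom for w in general}
    return topic

def train(labelled_docs, lam=0.5):
    """labelled_docs maps a class label to a list of tokenised documents."""
    all_docs = [d for docs in labelled_docs.values() for d in docs]
    general = fit_general(all_docs)
    model = {}
    for label, docs in labelled_docs.items():
        log_prior = math.log(len(docs) / len(all_docs))
        model[label] = (log_prior, fit_topic(docs, general, lam))
    return general, model

def classify(doc, general, model, lam=0.5):
    """Return the label with the highest log posterior under the mixture."""
    def score(label):
        log_prior, topic = model[label]
        return log_prior + sum(
            math.log(lam * topic[w] + (1 - lam) * general[w])
            for w in doc if w in general)  # unseen words are skipped
    return max(model, key=score)

# Toy data, invented purely for illustration.
toy = {
    "sports": [["the", "team", "scored", "a", "goal"],
               ["the", "goal", "won", "the", "match"]],
    "tech": [["the", "code", "has", "a", "bug"],
             ["fix", "the", "bug", "in", "the", "code"]],
}
general, model = train(toy)
print(classify(["team", "goal"], general, model))
```

In this sketch the general component absorbs words that are common to every class, so the topic component concentrates its probability mass on class-specific words; this is one plausible reading of why such a mixture might need fewer training samples than a single-component model.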

Keywords/Search Tags:

Data mining, text classification, mixture model, Expectation Maximization, Naive Bayes

Related items

1	Data Mining Systems And Their Applications - Improve The Performance Of The Naive Bayes Text Classifier, Associated Characteristics
2	Research Of Tuberculosis Detection In Sputum Smear Images Based On Color And Shape
3	Research And Application On Naive Bayes Classification Algorithm
4	Text Classification Method Based On Unsupervised Clustering And Naive Bayesian Classifier
5	Research On The Methods Of Chinese Text Classification Using Bayes And Language Model
6	Research On Text Classification Algorithm Based On Naive Bayes Method
7	Research And Application On The Technology Of Web Text Mining
8	Research Of Chinese Text Classification Based On Naive Bayesian Method And Application Of Microblogging Data Classification
9	Text Categorization Based On Naive Bayes Method
10	Research On Text Mining Based On MapReduce