Study On Text Classification Based On Finite Mixture Model

Posted on: 2006-03-16
Degree: Master
Type: Thesis
Country: China
Candidate: C J Wang
Full Text: PDF
GTID: 2168360155458066
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, the information resources available online are growing exponentially, and it has become almost impossible to process this mass of information manually. In recent years, more and more researchers have turned their attention to organizing and managing information efficiently and effectively. As one of the key technologies toward this goal, text classification has received wide attention.

Traditional probabilistic text classification algorithms, such as Naive Bayes (NB), assume that a document is generated from only one component. Under this assumption, a large number of training samples is required to estimate the model parameters and to model the class characteristics accurately. In fact, many factors influence class characteristics, such as topic, writing background, writing conventions, document style, and the writing habits of authors.

To characterize the class model comprehensively and accurately, we propose a finite mixture model that combines a topic model and a general model. The EM (Expectation Maximization) method is applied to estimate the parameters of the mixture model, and a text classifier is implemented on the basis of the proposed model. Experimental results show that the text classification algorithm based on the finite mixture model is stable and outperforms NB. In addition, the classifier performs well even when the training sample is small.

Moreover, the thesis presents two applications of the proposed mixture model. (1) User profile. The user profile is one of the most important modules in SmartWeb, our personalized recommendation prototype system; a good user profile can improve SmartWeb's recommendation results. (2) Focused crawler. Text classification is the key component of a focused crawler. In this thesis, we suggest applying the mixture-model-based text classifier to these two applications...
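
The abstract does not give the model's equations, so the following is only a minimal sketch of one common way to combine a per-class topic model with a shared general model and fit it with EM. It assumes that each word of a document in class c is drawn from lam * p_topic(w | c) + (1 - lam) * p_general(w), that the general model is a smoothed unigram distribution over all training documents, and that the mixing weight lam is a fixed hyperparameter. The names LAM, fit_general, fit_topic_em, train and classify, as well as the toy data, are hypothetical and do not come from the thesis.

```python
import math
from collections import Counter

LAM = 0.5  # assumed fixed weight of the topic component (hypothetical hyperparameter)


def fit_general(all_docs, alpha=1.0):
    """Shared general model: add-alpha smoothed unigram over all training docs."""
    counts = Counter(w for doc in all_docs for w in doc)
    total = sum(counts.values()) + alpha * len(counts)
    return {w: (c + alpha) / total for w, c in counts.items()}


def fit_topic_em(class_docs, general, lam=LAM, iters=50):
    """Estimate one class's topic distribution by EM, keeping the general model fixed."""
    counts = Counter(w for doc in class_docs for w in doc)
    vocab = list(general)
    topic = {w: 1.0 / len(vocab) for w in vocab}  # uniform start
    for _ in range(iters):
        # E-step: probability that an occurrence of w came from the topic component.
        resp = {w: lam * topic[w] / (lam * topic[w] + (1 - lam) * general[w])
                for w in vocab}
        # M-step: re-estimate the topic model from responsibility-weighted counts.
        weighted = {w: counts[w] * resp[w] for w in vocab}
        norm = sum(weighted.values())
        topic = {w: weighted[w] / norm for w in vocab}
    return topic


def train(labeled_docs, lam=LAM):
    """labeled_docs maps a class label to a list of tokenized documents."""
    all_docs = [doc for docs in labeled_docs.values() for doc in docs]
    general = fit_general(all_docs)
    topics = {c: fit_topic_em(docs, general, lam) for c, docs in labeled_docs.items()}
    priors = {c: len(docs) / len(all_docs) for c, docs in labeled_docs.items()}
    return general, topics, priors


def classify(doc, general, topics, priors, lam=LAM):
    """Return the class with the highest log posterior under the mixture."""
    def score(c):
        return math.log(priors[c]) + sum(
            math.log(lam * topics[c][w] + (1 - lam) * general[w])
            for w in doc if w in general)  # out-of-vocabulary words are skipped
    return max(topics, key=score)


# Toy data, invented purely for illustration.
TRAIN = {
    "sports": [["ball", "team", "win", "the"], ["team", "score", "the", "game"]],
    "tech": [["code", "bug", "the", "software"], ["software", "code", "the", "release"]],
}
general, topics, priors = train(TRAIN)
print(classify(["team", "win", "the"], general, topics, priors))
```

In this sketch the general component absorbs words that are frequent in every class, so the topic component concentrates on class-specific vocabulary. Fixing lam is a design choice made here: if lam and the topic model were both left free while the general model stayed fixed, maximum likelihood would tend toward lam = 1, which reduces to a single-component model.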
Keywords/Search Tags:Data mining, text classification, mixture model, Expectation Maximization, Naive Bayes